Project Title:

Q(σ) learning in multi-policy multi-objective Reinforcement Learning

Supervisor(s):

Assoc Prof Peter Vamplew, Dr Cameron Foale; Co-supervisor: Assoc Prof Richard Dazeley (Deakin)

Contact person and email address:

Assoc Prof Peter Vamplew (p.vamplew@federation.edu.au)

A brief description of the project:

Multi-objective reinforcement learning (MORL) aims to find a policy, or a set of Pareto-optimal policies, that trades off multiple conflicting objectives. MORL approaches fall into two classes: single-policy and multi-policy. Multi-policy approaches aim to learn the full convex coverage set, usually in parallel. In online environments this parallelisation is often achieved with off-policy learning methods such as Q(λ), since off-policy updates allow many policies to learn from a single stream of experience.
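To illustrate the parallel multi-policy idea, the sketch below updates one tabular Q-table per scalarisation weight vector from a single shared transition. It is a minimal illustration only; the function and variable names are our own, and linear scalarisation with a one-step Q-learning backup is assumed rather than taken from any particular MORL implementation.

    import numpy as np

    def parallel_scalarised_updates(Qs, weights, s, a, rewards, s_next,
                                    alpha=0.1, gamma=0.99):
        """Update one Q-table per scalarisation weight from one shared
        transition (illustrative sketch of parallel multi-policy MORL).

        Qs      : list of tabular Q-tables, each shape (n_states, n_actions)
        weights : list of weight vectors over the objectives
        rewards : vector reward for this transition, one entry per objective
        """
        for Q, w in zip(Qs, weights):
            r = np.dot(w, rewards)                  # scalarise the vector reward
            target = r + gamma * np.max(Q[s_next])  # off-policy (greedy) backup
            Q[s, a] += alpha * (target - Q[s, a])

Because each backup bootstraps from its own greedy action rather than the behaviour policy's action, every policy in the set can learn from the same behaviour trajectory.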

Recently, De Asis et al. (2018) presented an algorithm, called Q(σ), that unifies the disparate Q(λ) and Sarsa reinforcement learning methods. Q(σ) offers a new possible approach to performing multi-policy learning. This project will implement both the Q(λ) and Q(σ) algorithms and compare their learning speed and the stability of their solutions across a range of benchmark problem environments.
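For concreteness, in the one-step case the Q(σ) target blends the sampled (Sarsa-style) successor value with its expectation under the target policy: σ = 1 recovers Sarsa, and σ = 0 recovers a pure-expectation backup. The tabular sketch below assumes an ε-greedy target policy; the function name and parameter choices are illustrative, not drawn from the De Asis et al. paper's code.

    import numpy as np

    def q_sigma_update(Q, s, a, r, s_next, a_next, sigma,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
        """One-step Q(sigma) update: interpolate between the sampled
        successor value (Sarsa) and its expectation under the policy."""
        n_actions = Q.shape[1]
        # Epsilon-greedy target-policy probabilities at the next state.
        pi = np.full(n_actions, epsilon / n_actions)
        pi[np.argmax(Q[s_next])] += 1.0 - epsilon
        # Expected value of the next state under pi.
        expected_v = np.dot(pi, Q[s_next])
        # Blend sample and expectation according to sigma.
        target = r + gamma * (sigma * Q[s_next, a_next]
                              + (1.0 - sigma) * expected_v)
        Q[s, a] += alpha * (target - Q[s, a])
        return Q

Varying σ (including annealing it over training) is exactly the degree of freedom this project would explore when comparing against Q(λ) on the benchmark environments.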