Project Title:

Q(σ) learning in multi-policy multi-objective Reinforcement Learning

Supervisor(s):

Assoc Prof Peter Vamplew, Dr Cameron Foale; Co-supervisor: Assoc Prof Richard Dazeley (Deakin)

Contact person and email address:

Assoc Prof Peter Vamplew (p.vamplew@federation.edu.au)

A brief description of the project:

Multi-objective reinforcement learning (MORL) aims to find a policy, or a set of Pareto-optimal policies, that trades off multiple conflicting objectives. MORL approaches fall into two classes: single-policy and multi-policy. Multi-policy approaches aim to learn the full convex coverage set, usually in parallel. In online environments this parallelisation is often achieved with off-policy learning methods such as Q(λ), since off-policy updates allow many policies to learn from a single stream of experience.
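To illustrate the parallel multi-policy idea, the sketch below updates one tabular Q-table per scalarisation weight vector from a single shared transition. It is a minimal illustration only; the function and variable names are our own, and linear scalarisation with a one-step Q-learning backup is assumed rather than taken from any particular MORL implementation.

    import numpy as np

    def parallel_scalarised_updates(Qs, weights, s, a, rewards, s_next,
                                    alpha=0.1, gamma=0.99):
        """Update one Q-table per scalarisation weight from one shared
        transition (illustrative sketch of parallel multi-policy MORL).

        Qs      : list of tabular Q-tables, each shape (n_states, n_actions)
        weights : list of weight vectors over the objectives
        rewards : vector reward for this transition, one entry per objective
        """
        for Q, w in zip(Qs, weights):
            r = np.dot(w, rewards)                  # scalarise the vector reward
            target = r + gamma * np.max(Q[s_next])  # off-policy (greedy) backup
            Q[s, a] += alpha * (target - Q[s, a])

Because each backup bootstraps from its own greedy action rather than the behaviour policy's action, every policy in the set can learn from the same behaviour trajectory.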

Recently, De Asis et al. (2018) presented an algorithm, called Q(σ), that unifies the disparate Q(λ) and Sarsa reinforcement learning methods. Q(σ) offers a new possible approach to performing multi-policy learning. This project will implement both the Q(λ) and Q(σ) algorithms and compare their learning speed and the stability of their solutions across a range of benchmark problem environments.
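For concreteness, in the one-step case the Q(σ) target blends the sampled (Sarsa-style) successor value with its expectation under the target policy: σ = 1 recovers Sarsa, and σ = 0 recovers a pure-expectation backup. The tabular sketch below assumes an ε-greedy target policy; the function name and parameter choices are illustrative, not drawn from the De Asis et al. paper's code.

    import numpy as np

    def q_sigma_update(Q, s, a, r, s_next, a_next, sigma,
                       alpha=0.1, gamma=0.99, epsilon=0.1):
        """One-step Q(sigma) update: interpolate between the sampled
        successor value (Sarsa) and its expectation under the policy."""
        n_actions = Q.shape[1]
        # Epsilon-greedy target-policy probabilities at the next state.
        pi = np.full(n_actions, epsilon / n_actions)
        pi[np.argmax(Q[s_next])] += 1.0 - epsilon
        # Expected value of the next state under pi.
        expected_v = np.dot(pi, Q[s_next])
        # Blend sample and expectation according to sigma.
        target = r + gamma * (sigma * Q[s_next, a_next]
                              + (1.0 - sigma) * expected_v)
        Q[s, a] += alpha * (target - Q[s, a])
        return Q

Varying σ (including annealing it over training) is exactly the degree of freedom this project would explore when comparing against Q(λ) on the benchmark environments.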