  • Time: Wednesday 09/27/2023 from 11:30 AM to 12:20 PM
  • Location: BLOC 448
  • Pizza and drinks provided

Topic

Model-free Full Reinforcement Learning for Learning the Optimal Treatment Policy

Abstract

During the treatment of chronic diseases such as cancer, sepsis, and diabetes, patients may receive treatments multiple times. Our aim is to learn the best sequence of treatments, also called a treatment policy or dynamic treatment regime (DTR), from already available patient data. Although DTR learning is an offline reinforcement learning (RL) problem, most standard offline RL methods are unsuitable for the DTR setting because they cannot leverage the whole patient history. A recent direction of DTR research has shown that efficient DTR learning is possible via direct policy search, but computationally feasible algorithms for direct policy search are currently available only for the case of two treatment options. This project aims to develop a direct policy search framework for general DTR problems with more than two treatment options. The proposed methods are either model-free or robust to model misspecification, and they reduce to non-convex but smooth optimization problems. I will introduce the proposed method, present some initial experimental results and theoretical guarantees, and discuss open questions.

Presentation

Recording
