Safe Reinforcement Learning with Contextual Information: Theory and Application to Personalized Comorbidity Management
40 Pages Posted: 2 Oct 2023
Date Written: September 25, 2023
Optimizing a treatment regimen is a crucial sequential medical decision-making problem. Motivated by this problem, we study the setting of learning a personalized, safe control policy that maximizes an objective function subject to safety constraints that must be satisfied during the learning process. We formulate this setting as a contextual constrained Markov decision process with unknown transition kernel, reward, and constraint functions. We develop a model-based reinforcement learning (RL) algorithm that accounts for (i) personalization, (ii) safe exploration, and (iii) general statistical models for modeling uncertainty. We conduct a rigorous regret analysis of this algorithm by combining online RL theory with statistical machine learning and optimization techniques, and prove that it achieves sub-linear regret without violating safety constraints during its learning phase. To corroborate our theoretical findings, we use a granular clinical dataset of patients with comorbid type 2 diabetes and hypertension who are at elevated risk for atherosclerotic cardiovascular disease (ASCVD). Our analysis indicates that our approach has the potential to outperform current practice. We further benchmark several policies to underscore the advantage of our method and provide several insights. Notably, while mitigating ASCVD risk for all patients, our algorithm is particularly effective for patients with complex health profiles, including those with a history of ASCVD events or smoking. Beyond the medical domain, our work applies to a range of domains where safety and personalization are essential considerations.
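For readers unfamiliar with the setting, the following is a minimal sketch of what a contextual constrained Markov decision process and its regret typically look like. All notation here is an illustrative assumption and is not taken from the paper: a finite horizon H, a context c (for example, a patient profile), a context-dependent transition kernel P_c, a reward function r, a single safety cost g with threshold \tau, and K learning episodes. The paper's own formulation may differ, for instance in the number of constraints, the direction of the inequality, or the use of discounting.

```latex
% Illustrative notation only; not taken from the paper.
\begin{align*}
\pi^{*}_{c} &\in \arg\max_{\pi} \; V_{r}^{\pi}(c)
  \quad \text{s.t.} \quad V_{g}^{\pi}(c) \le \tau, \\
V_{f}^{\pi}(c) &= \mathbb{E}\Big[\sum_{h=1}^{H} f(s_h, a_h, c) \,\Big|\, \pi, P_c\Big],
  \quad f \in \{r, g\}, \\
\mathrm{Reg}(K) &= \sum_{k=1}^{K} \Big( V_{r}^{\pi^{*}_{c_k}}(c_k) - V_{r}^{\pi_k}(c_k) \Big) = o(K), \\
V_{g}^{\pi_k}(c_k) &\le \tau \quad \text{for all } k = 1, \dots, K .
\end{align*}
```

Under these assumptions, the third line expresses sub-linear regret and the fourth line expresses safety during learning: every policy \pi_k deployed while learning must itself satisfy the constraint, not only the final policy. Guarantees of this kind are usually stated to hold with high probability.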
Funding Information: There is no funding for this research.
Conflict of Interests: The authors declare no conflicts of interest.
Ethical Approval: The data used in this research was approved under IRB 2022-1303 by Texas A&M University.
Keywords: online learning algorithms, safe reinforcement learning, personalized comorbidity disease management, regret analysis, optimism and pessimism