Safe Reinforcement Learning with Contextual Information: Theory and Applications
34 Pages
Posted: 2 Oct 2023
Date Written: September 25, 2023
Abstract
Motivated by a critical medical decision-making problem, we study a sequential decision-making setting in which a personalized, safe control policy must be learned to maximize an objective function subject to safety constraints that must be satisfied throughout the learning process. We formulate this setting as a contextual constrained Markov decision process with unknown transition probabilities, reward, and constraint functions. We develop a practical and intuitive reinforcement learning (RL) algorithm that accounts for (i) personalization, (ii) safety guarantees, and (iii) general statistical models for handling uncertainty. We conduct a rigorous regret analysis of this framework, synthesizing RL theory with statistical machine learning and optimization techniques, and prove that it admits sublinear regret without violating safety constraints during the learning phase. Our analysis yields a significant regret-bound improvement over existing theoretical results in both safe and contextual RL. To validate our theoretical findings, we use both synthetic data and a granular clinical dataset of patients with comorbid type 2 diabetes and hypertension, who are at elevated risk for atherosclerotic cardiovascular disease. Through extensive analyses, we demonstrate that our methodology outperforms benchmark policies and current practice. Our framework applies broadly across domains where safety and personalization matter.
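For readers unfamiliar with the model class named in the abstract, the display below is a generic sketch of a contextual constrained MDP objective; the notation (context x, horizon H, reward r, constraint cost c, threshold tau) is illustrative and not necessarily the paper's own.

\[
\max_{\pi_x}\; \mathbb{E}_{\pi_x}\!\left[\sum_{t=1}^{H} r(s_t, a_t; x)\right]
\quad \text{s.t.} \quad
\mathbb{E}_{\pi_x}\!\left[\sum_{t=1}^{H} c(s_t, a_t; x)\right] \le \tau(x),
\]

where, per the abstract's setting, the transition probabilities, r, and c are unknown and the constraint must hold throughout learning (safe exploration), with a separate policy pi_x learned for each context (e.g., patient profile) x.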
Note:
Funding Information: This research received no funding.
Conflicts of Interest: The authors declare no conflicts of interest.
Ethical Approval: Use of the data in this research was approved under IRB 2022-1303 by Texas A&M University.
Keywords: safe reinforcement learning, personalized medicine, regret analysis, contextual optimization