Safe Reinforcement Learning with Contextual Information: Theory and Application to Personalized Comorbidity Management

40 Pages Posted: 2 Oct 2023

Junyu Cao

University of Texas at Austin - Red McCombs School of Business

Esmaeil Keyvanshokooh

Department of Information & Operations Management, Mays Business School, Texas A&M University, College Station, TX

Tian Liu

Texas A&M University

Date Written: September 25, 2023

Abstract

Optimizing a treatment regimen is a crucial sequential medical decision-making problem. Inspired by this problem, we study the sequential decision-making setting of learning a personalized, safe control policy that maximizes an objective function subject to safety constraints that must be satisfied during the learning process. We formulate this setting as a contextual constrained Markov decision process with unknown transition kernel, reward, and constraint functions. We develop a model-based reinforcement learning (RL) algorithm that accounts for (i) personalization, (ii) safe exploration, and (iii) general statistical models of uncertainty. We conduct a rigorous regret analysis of this algorithm by synthesizing online RL theory with statistical machine learning and optimization techniques, and prove that it admits sub-linear regret without violating safety constraints during its learning phase. To corroborate our theoretical findings, we use a granular clinical dataset of patients with comorbid type 2 diabetes and hypertension who are at elevated risk for atherosclerotic cardiovascular disease (ASCVD). Our analysis indicates that our approach has the potential to outperform current practice. We further benchmark several policies to underscore the advantages of our method and provide several insights. Notably, while mitigating ASCVD risk for all patients, our algorithm is particularly effective for patients with complex health profiles, including those with a history of ASCVD events or smoking. Beyond the medical domain, our work applies to a broad range of settings where safety and personalization are essential considerations.
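The "optimism and pessimism" principle named in the keywords can be illustrated with a minimal sketch: be optimistic about uncertain rewards to drive exploration, but pessimistic about uncertain constraint costs to preserve safety, falling back to a known-safe baseline action when nothing is provably feasible. All function and variable names below are illustrative assumptions, not the authors' algorithm or implementation.

```python
import numpy as np

def safe_action(reward_est, reward_bonus, cost_est, cost_bonus, budget, safe_fallback=0):
    """Illustrative optimism-pessimism action selection for one decision epoch.

    reward_est / cost_est  : point estimates per action
    reward_bonus / cost_bonus : confidence widths per action
    budget : safety threshold the (pessimistic) cost must not exceed
    """
    optimistic_reward = reward_est + reward_bonus   # upper confidence bound: explore
    pessimistic_cost = cost_est + cost_bonus        # upper confidence bound on cost: stay safe
    feasible = np.where(pessimistic_cost <= budget)[0]
    if feasible.size == 0:
        return safe_fallback                        # revert to a known-safe baseline action
    # among provably safe actions, pick the one with the best optimistic reward
    return int(feasible[np.argmax(optimistic_reward[feasible])])
```

For example, with reward estimates [0.5, 0.9], cost estimates [0.2, 0.8], and bonuses of 0.1 and 0.05 respectively, a budget of 0.5 rules out the high-reward action (pessimistic cost 0.85) and selects action 0, while a budget of 1.0 makes both actions feasible and selects action 1.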

Note:

Funding Information: There is no funding for this research.

Conflicts of Interest: The authors declare that there are no conflicts of interest.

Ethical Approval: Use of the data in this research was approved under IRB 2022-1303 by Texas A&M University.

Keywords: online learning algorithms, safe reinforcement learning, personalized comorbidity disease management, regret analysis, optimism and pessimism

Suggested Citation

Cao, Junyu and Keyvanshokooh, Esmaeil and Liu, Tian, Safe Reinforcement Learning with Contextual Information: Theory and Application to Personalized Comorbidity Management (September 25, 2023). Available at SSRN: https://ssrn.com/abstract=4583667 or http://dx.doi.org/10.2139/ssrn.4583667

Junyu Cao (Contact Author)

University of Texas at Austin - Red McCombs School of Business ( email )

Austin, TX
United States

Esmaeil Keyvanshokooh

Department of Information & Operations Management, Mays Business School, Texas A&M University, College Station, TX ( email )

210 Olsen Blvd
College Station, TX 77843
United States

Tian Liu

Texas A&M University ( email )

College Station, TX
United States
