Testing Alternative Regression Frameworks for Predictive Modeling of Healthcare Costs
31 Pages Posted: 6 Mar 2015
Date Written: March 3, 2015
Predictive models of healthcare costs have become mainstream in healthcare actuarial work. The Affordable Care Act requires the use of predictive modeling-based risk-adjuster models to transfer revenue between different health exchange participants. While the predictive accuracy of these models has been investigated in a number of studies, the accuracy and use of models for applications other than risk adjustment have received little investigation. We investigate predictive modeling of future healthcare costs using a number of different statistical techniques. Our analysis is based on a dataset of 30,000 insureds containing claims information from two contiguous years. The dataset contains over a hundred covariates for each insured, including a detailed breakdown of past costs and causes encoded via coexisting condition (CC) flags. We discuss statistical models for the relationship between next-year costs and medical and cost information, with the aims of predicting the mean and quantiles of future cost, ranking risks, and identifying the most predictive covariates. A comparison of multiple models is presented, including (in addition to the traditional linear regression model underlying risk adjusters) Lasso GLM, multivariate adaptive regression splines, random forests, decision trees, and boosted trees. A detailed performance analysis shows that the traditional regression approach does not perform well and that more accurate models are possible.
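As a rough illustration of the kind of held-out comparison the abstract describes, the sketch below fits a one-covariate linear regression and a one-split regression tree (a "stump", standing in here for the tree-based methods) and scores both by mean absolute error on separate data. It is not the paper's method or data: the synthetic cost generator, the single covariate, the choice of MAE, and every function name (`fit_ols`, `fit_stump`, `mae`, `simulate`) are assumptions made for illustration.

```python
import random
from statistics import mean


def fit_ols(x, y):
    """Closed-form simple linear regression; returns a prediction function."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    intercept = my - slope * mx
    return lambda v: intercept + slope * v


def fit_stump(x, y):
    """One-split regression tree: pick the threshold minimizing squared error."""
    pairs = sorted(zip(x, y))
    best = None
    for i in range(1, len(pairs)):
        if pairs[i][0] == pairs[i - 1][0]:
            continue  # cannot split between identical covariate values
        left = [p[1] for p in pairs[:i]]
        right = [p[1] for p in pairs[i:]]
        ml, mr = mean(left), mean(right)
        sse = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if best is None or sse < best[0]:
            threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best = (sse, threshold, ml, mr)
    if best is None:  # all covariate values identical: predict the overall mean
        overall = mean(y)
        return lambda v: overall
    _, threshold, ml, mr = best
    return lambda v: ml if v <= threshold else mr


def mae(model, x, y):
    """Mean absolute error of a fitted model on (x, y)."""
    return mean(abs(model(xi) - yi) for xi, yi in zip(x, y))


def simulate(n, seed):
    """Hypothetical data: skewed prior-year cost, threshold effect on next-year cost."""
    rng = random.Random(seed)
    prior = [rng.lognormvariate(7.0, 1.0) for _ in range(n)]
    nxt = [(500.0 if p < 3000.0 else 6000.0) + rng.expovariate(1 / 400.0)
           for p in prior]
    return prior, nxt


if __name__ == "__main__":
    x_train, y_train = simulate(400, seed=1)
    x_test, y_test = simulate(400, seed=2)  # held-out sample
    for name, fit in [("linear regression", fit_ols),
                      ("regression stump", fit_stump)]:
        model = fit(x_train, y_train)
        print(f"{name}: held-out MAE = {mae(model, x_test, y_test):.1f}")
```

The synthetic data are built with a threshold effect that a single split can capture, so the comparison only demonstrates the evaluation pattern (fit on one sample, score on another) and says nothing about the paper's actual results.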