Every Bit Counts: Using Deep Learning and Vectorization to Analyze Healthcare Big Data

24 Pages. Posted: 30 Apr 2019

Weiguang Wang

University of Maryland, Robert H. Smith School of Business

Min (Michelle) Chen

Florida International University (FIU)

Guodong (Gordon) Gao

University of Maryland - R.H. Smith School of Business

Jeffrey McCullough

University of Michigan

Date Written: April 27, 2019

Abstract

The rapid digitization of healthcare has generated large volumes of rich and complex data from sources such as claims and electronic health records. Traditional analytic approaches, however, use only small subsets of these data and often require deep domain knowledge. New methods are needed to handle both the scale and the complexity of big data in healthcare. We employ a recent breakthrough in computer science to develop a new Deep Learning-based Vectorization (DLV) approach for more comprehensive and efficient analysis of healthcare data. This approach automatically converts data elements into standardized numeric vectors, enabling new types of computation and improving performance in traditional data analysis. We demonstrate the potential of DLV by predicting 30-day readmission using discharge records that cover all emergency department visits and inpatient hospitalizations in Florida. We find that while traditional approaches struggle even to load large amounts of clinical information (including non-numeric variables), DLV handles this information easily. Furthermore, DLV significantly improves the accuracy of 30-day readmission prediction in the presence of high-dimensional data. While traditional logistic regression yields an AUC of 0.61, DLV yields 0.79, a 163% improvement in lift over the chance baseline of 0.50. In addition, we demonstrate that the vector representations produced by DLV afford easy visualization for better understanding of the clinical data. Overall, the DLV approach shows great potential for facilitating the analysis of big healthcare data and can complement traditional methods in high-dimensional environments.
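The 163% figure can be checked from the two AUC values reported in the abstract. The sketch below assumes that "lift" means the amount by which an AUC exceeds the chance level of 0.50; the function name `lift_improvement` and its parameters are illustrative and do not come from the paper.

```python
def lift_improvement(baseline_auc: float, new_auc: float, chance: float = 0.5) -> float:
    """Relative improvement in lift over chance (hypothetical helper).

    Lift is taken here to be AUC minus the chance level, which is an
    assumption about how the abstract defines the term.
    """
    baseline_lift = baseline_auc - chance  # 0.61 - 0.50 = 0.11
    new_lift = new_auc - chance            # 0.79 - 0.50 = 0.29
    return new_lift / baseline_lift - 1.0


# AUC values as reported in the abstract.
improvement = lift_improvement(baseline_auc=0.61, new_auc=0.79)
print(f"Relative improvement in lift: {improvement:.1%}")
```

Under this reading the ratio is 0.29 / 0.11, or roughly 2.64, which corresponds to an improvement of about 163.6% and is consistent with the 163% stated in the abstract.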

Suggested Citation

Wang, Weiguang and Chen, Min (Michelle) and Gao, Guodong (Gordon) and McCullough, Jeffrey, Every Bit Counts: Using Deep Learning and Vectorization to Analyze Healthcare Big Data (April 27, 2019). Available at SSRN: https://ssrn.com/abstract=3378896 or http://dx.doi.org/10.2139/ssrn.3378896

Weiguang Wang (Contact Author)

University of Maryland, Robert H. Smith School of Business

College Park, MD
United States

Min (Michelle) Chen

Florida International University (FIU)

University Park
11200 SW 8th Street
Miami, FL 33199
United States

Guodong (Gordon) Gao

University of Maryland - R.H. Smith School of Business

4325 Van Munching Hall
College Park, MD 20742
United States

HOME PAGE: http://www.rhsmith.umd.edu/faculty/ggao/

Jeffrey McCullough

University of Michigan

1415 Washington Heights
SPH II
Ann Arbor, MI 48109
United States
734-936-1189 (Phone)
734-764-4338 (Fax)

HOME PAGE: https://sph.umich.edu/faculty-profiles/mccullough-jeffrey.html
