Every Bit Counts: Using Deep Learning and Vectorization to Analyze Healthcare Big Data

24 Pages Posted: 30 Apr 2019

Weiguang Wang

Johns Hopkins University

Min (Michelle) Chen

Florida International University (FIU)

Guodong (Gordon) Gao

Johns Hopkins University - Carey Business School

Jeffrey McCullough

University of Michigan

Date Written: April 27, 2019

Abstract

The rapid digitization of healthcare has generated large volumes of rich and complex data from sources such as claims and electronic health records. Traditional analytic approaches, however, use only small subsets of these data and often require deep domain knowledge. New methods are needed to handle both the scale and complexity of big data in healthcare. We employ a recent breakthrough in computer science to develop a new Deep Learning-based Vectorization (DLV) approach for more comprehensive and efficient analysis of healthcare data. This approach automatically converts data elements into standardized numeric vectors, enabling new types of computation and improving performance in traditional data analysis. We demonstrate the potential of DLV by predicting 30-day readmission using discharge records that cover all emergency department visits and inpatient hospitalizations in Florida. We find that while traditional approaches struggle even to load large amounts of clinical information (including non-numeric variables), DLV handles this information easily. Furthermore, DLV significantly improves the accuracy of 30-day readmission prediction in the presence of high-dimensional data. While traditional logistic regression yields an AUC of 0.61, DLV achieves 0.79, a 163% improvement in lift over the chance baseline of 0.50. In addition, we demonstrate that the vector representations produced by DLV afford easy visualization for a better understanding of the clinical data. Overall, the DLV approach shows great potential for facilitating the analysis of big healthcare data and can complement traditional methods in high-dimensional environments.
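The lift comparison in the abstract can be checked with a few lines of arithmetic. The sketch below is illustrative only: the function name `lift_improvement` and its parameters are hypothetical, and it assumes that "lift" means the AUC minus the chance baseline of 0.50. With the rounded AUC values quoted in the abstract the result is about 163.6%, so the reported 163% presumably reflects truncation or unrounded AUCs.

```python
def lift_improvement(auc_baseline: float, auc_new: float, chance: float = 0.50) -> float:
    """Relative improvement in lift (AUC above chance) of auc_new over auc_baseline.

    Assumption: "lift" is defined as AUC minus the chance-level AUC of 0.50.
    """
    lift_baseline = auc_baseline - chance  # logistic regression: 0.61 - 0.50
    lift_new = auc_new - chance            # DLV: 0.79 - 0.50
    if lift_baseline <= 0:
        raise ValueError("baseline AUC must exceed the chance level")
    return lift_new / lift_baseline - 1.0


if __name__ == "__main__":
    # AUC values as quoted (rounded) in the abstract.
    improvement = lift_improvement(auc_baseline=0.61, auc_new=0.79)
    print(f"Improvement in lift: {improvement:.1%}")
```

Measuring improvement relative to the chance baseline, rather than on raw AUC, avoids understating the gain: the raw AUC rises by only about 30% (0.61 to 0.79), while the portion above chance nearly triples (0.11 to 0.29).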

Suggested Citation

Wang, Weiguang and Chen, Min (Michelle) and Gao, Guodong (Gordon) and McCullough, Jeffrey, Every Bit Counts: Using Deep Learning and Vectorization to Analyze Healthcare Big Data (April 27, 2019). Available at SSRN: https://ssrn.com/abstract=3378896 or http://dx.doi.org/10.2139/ssrn.3378896

Weiguang Wang (Contact Author)

Johns Hopkins University

Baltimore, MD 20036-1984
United States

Min (Michelle) Chen

Florida International University (FIU)

University Park
11200 SW 8th Street
Miami, FL 33199
United States

Guodong (Gordon) Gao

Johns Hopkins University - Carey Business School

100 International Drive
Baltimore, MD 21202
United States

Jeffrey McCullough

University of Michigan

1415 Washington Heights
SPH II
Ann Arbor, MI 48109
United States
7349361189 (Phone)
7347644338 (Fax)

HOME PAGE: https://sph.umich.edu/faculty-profiles/mccullough-jeffrey.html

Paper statistics

Downloads: 170
Abstract Views: 1,124
Rank: 372,915