Every Bit Counts: Using Deep Learning and Vectorization to Analyze Healthcare Big Data
24 Pages
Posted: 30 Apr 2019
Date Written: April 27, 2019
The rapid digitization of healthcare has generated large volumes of rich and complex data from sources such as claims and electronic health records. Traditional analytic approaches, however, use only small subsets of these data and often require deep domain knowledge. New methods are needed to handle both the scale and the complexity of big data in healthcare. We employ a recent breakthrough in computer science to develop a new Deep Learning-based Vectorization (DLV) approach for more comprehensive and efficient analysis of healthcare data. This approach automatically converts data elements into standardized numeric vectors, enabling new types of computing and improving performance in traditional data analysis. We demonstrate the potential of DLV by predicting 30-day readmission using discharge records that cover all emergency department visits and inpatient hospitalizations in Florida. We find that while traditional approaches struggle even to load large amounts of clinical information (including non-numeric variables), DLV handles this information easily. Furthermore, DLV significantly improves the accuracy of 30-day readmission prediction in the presence of high-dimensional data: traditional logistic regression yields an AUC of 0.61, whereas DLV achieves 0.79. Measured against the 0.50 AUC of random guessing, this raises the lift from 0.11 to 0.29, a 163% improvement. In addition, we demonstrate that the vector representations produced by DLV allow easy visualization, which aids understanding of the clinical data. Overall, the DLV approach shows great potential for facilitating the analysis of big healthcare data and can complement traditional methods in high-dimensional environments.
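The 163% figure can be checked with a few lines of arithmetic. This is a minimal sketch, assuming "lift" means the AUC minus the 0.50 AUC of a random classifier and "improvement" means the relative increase in that lift; the function names `lift` and `relative_improvement` are hypothetical and do not come from the paper.

```python
# Hypothetical helpers for checking the reported lift improvement.
# Assumption: lift = AUC - 0.50 (the AUC of random guessing).
BASELINE_AUC = 0.50


def lift(auc: float, baseline: float = BASELINE_AUC) -> float:
    """Return how far an AUC sits above the chance baseline."""
    return auc - baseline


def relative_improvement(new_auc: float, old_auc: float,
                         baseline: float = BASELINE_AUC) -> float:
    """Percentage increase in lift when moving from old_auc to new_auc."""
    old_lift = lift(old_auc, baseline)
    new_lift = lift(new_auc, baseline)
    return (new_lift - old_lift) / old_lift * 100.0


logit_auc = 0.61  # reported AUC of traditional logistic regression
dlv_auc = 0.79    # reported AUC of DLV

print(f"logistic lift: {lift(logit_auc):.2f}")
print(f"DLV lift:      {lift(dlv_auc):.2f}")
print(f"improvement:   {relative_improvement(dlv_auc, logit_auc):.1f}%")
```

Running it prints lifts of 0.11 and 0.29 and an improvement of 163.6%, which the abstract rounds down to 163%. Comparing lifts rather than raw AUCs (0.79 / 0.61 is only about a 30% increase) reflects the fact that an AUC of 0.50 carries no predictive information.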