Case Study - Feature Engineering Inspired by Domain Experts on Real World Medical Data
27 Pages Posted: 22 Jul 2022
Abstract
Research data mining projects based on big health data produced in daily health care and stored in electronic health records (EHRs) can be time-consuming and are limited by unstructured data and low registration quality. A key factor for successful knowledge discovery is close cooperation between domain experts and data scientists during data mining. We performed a case study on two real-world medical research projects, comparing feature engineering and knowledge discovery on the basis of classification performance. The two projects comprise 82,742 patients and 23,396 patients. Medical assessments and clinical values for project 2 are presented in a previous study, in which similar engineered features were used. The results show that it is valuable for medical researchers to involve a data scientist when performing medical research based on real-world medical data. The findings are supported by assessing classification performance with features engineered iteratively by domain experts and computer scientists in collaboration with the medical researcher. The engineered features are represented in a systematic way, which is the foundation of a theoretical model for automatic domain knowledge driven feature engineering (KDFE). The current study (i) explains and dissects the medical research process, from initial research question to published paper, when performing registry studies on real-world medical data; (ii) answers the question of why a data scientist's involvement leads to better classification; (iii) shows the benefits of KDFE; and (iv) discusses ethical aspects of hypotheses constructed after the results are known. To our knowledge, this is the first study to prove, by quantitative measures, that KDFE creates knowledge when applied to real-world medical data.
Note:
Funding Information: This research did not receive any specific grant from funding agencies in the public, commercial, or not-for-profit sectors.
Declaration of Interests: The authors declare that they have no known competing financial interests or personal relationships that could have appeared to influence the work reported in this paper.
Keywords: Feature engineering, Medical registry research, Knowledge discovery in databases (KDD), Quantitative measures, Electronic health record (EHR), Domain knowledge