The Use and Misuse of Biomedical Data: Is Bigger Really Better?
39 American Journal of Law & Medicine 497 (2013)
43 Pages. Posted: 19 Mar 2013. Last revised: 5 Feb 2014
Date Written: November 26, 2013
Very large biomedical research databases, containing electronic health records (EHR) and genomic data from millions of patients, have been heralded recently for their potential to accelerate scientific discovery and produce dramatic improvements in medical treatments. Research enabled by these databases may also lead to profound changes in law, regulation, social policy, and even litigation strategies. Yet, is “big data” necessarily better data?
This paper makes an original contribution to the legal literature by focusing on what can go wrong in the process of biomedical database research and what precautions are necessary to avoid critical mistakes. We address three main reasons for a cautious approach to such research and to relying on its outcomes for purposes of public policy or litigation. First, the data contained in databases is surprisingly likely to be incorrect or incomplete. Second, systematic biases, arising from both the nature of the data and the preconceptions of investigators, are serious threats to the validity of biomedical database research, especially in answering causal questions. Third, data mining of biomedical databases makes it easier for individuals with political, social, or economic agendas to generate ostensibly scientific but misleading research findings for the purpose of manipulating public opinion and swaying policy makers.
In short, this paper sheds much-needed light on the problems of credulous and uninformed uses of biomedical databases. An understanding of the pitfalls of big data analysis is of critical importance to anyone who will rely on or dispute its outcomes, including lawyers, policy makers, and the public at large. The paper also recommends technical, methodological, and educational interventions to combat the dangers of database errors and abuses.
Keywords: electronic health records, biomedical research databases, health informatics, data errors, selection bias, measurement bias, confounding bias, causal inference techniques, data quality assessment, limitations of technology, data mining
JEL Classification: K32