19 Pages. Posted: 4 Jun 2012. Last revised: 3 Sep 2015.
Date Written: July 2012
The 1997 re-identification of Massachusetts Governor William Weld’s medical data within an insurance data set that had been stripped of direct identifiers has had a profound impact on the development of the de-identification provisions of the 2003 Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Weld’s re-identification, purportedly achieved using a voter registration list from Cambridge, MA, is frequently cited as evidence that computer scientists can re-identify individuals within de-identified data with “astonishing ease.” However, a careful re-examination of the population demographics in Cambridge indicates that Weld was most likely re-identifiable only because he was a public figure who experienced a highly publicized hospitalization, rather than because the Cambridge voter data, which lacked data for a large proportion of the population, provided any certainty for his re-identification.
The complete story of Weld’s re-identification exposes an important systemic barrier to accurate re-identification known as “the myth of the perfect population register.” The logic underlying re-identification depends critically on demonstrating that a person within a health data set is the only person in the larger population who has a particular combination of characteristics (known as “quasi-identifiers”) that could potentially re-identify them. Most re-identification attempts therefore face a strong challenge in creating a complete and accurate population register. This limitation not only underlies the entire set of famous Cambridge re-identification results but also affects much of the existing re-identification research cited by those making claims of easy re-identification. This paper critically examines the historic Weld re-identification and the dramatic (thousands-fold) reductions in re-identification risk for de-identified health data protected by the HIPAA Privacy Rule’s de-identification provisions since 2003. The paper also provides recommendations for enhancing existing HIPAA de-identification policy, discusses critical advances routinely made in medical science and in the improvement of our healthcare system using de-identified data, and comments on the vital importance of properly balancing the competing goals of protecting patient privacy and preserving the accuracy of scientific research and statistical analyses conducted with de-identified data.
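The population-register argument can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the records, the choice of quasi-identifiers (birth date, sex, ZIP code), and the helper name `count_matches` are hypothetical and do not come from the paper or from any real data set. The sketch shows only that a record which appears unique in an incomplete register may still be shared by several people in the full population.

```python
from collections import Counter

# Hypothetical quasi-identifier tuples: (birth_date, sex, zip_code).
# All values are invented for illustration only.
population = [
    ("1950-01-01", "M", "00001"),
    ("1950-01-01", "M", "00001"),  # a second person with the same values
    ("1955-06-15", "F", "00002"),
    ("1962-03-09", "M", "00003"),
]

# An incomplete register (for example, a voter list) that happens to
# omit the second person who shares the target's quasi-identifiers.
register = [population[0], population[2], population[3]]


def count_matches(record, people):
    """Count how many entries in `people` share the record's quasi-identifiers."""
    return Counter(people)[record]


target = ("1950-01-01", "M", "00001")
matches_in_register = count_matches(target, register)
matches_in_population = count_matches(target, population)

print("matches in register:  ", matches_in_register)
print("matches in population:", matches_in_population)
```

In this toy setting the target looks unique in the register yet matches two people in the population, so a match against the register alone would not establish who the record belongs to.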
Keywords: HIPAA, privacy, de-identification, statistical disclosure, population register, quasi-identifiers, K-anonymity, public policy
Suggested Citation:
Barth-Jones, Daniel C., The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now (July 2012). Available at SSRN: https://ssrn.com/abstract=2076397 or http://dx.doi.org/10.2139/ssrn.2076397