The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now

19 Pages. Posted: 4 Jun 2012. Last revised: 3 Sep 2015.

Daniel Barth-Jones

Columbia University - Mailman School of Public Health, Department of Epidemiology

Date Written: July 2012

Abstract

The 1997 re-identification of Massachusetts Governor William Weld’s medical data within an insurance data set that had been stripped of direct identifiers has had a profound impact on the development of the de-identification provisions within the 2003 Health Insurance Portability and Accountability Act (HIPAA) Privacy Rule. Weld’s re-identification, purportedly achieved through the use of a voter registration list from Cambridge, MA, is frequently cited as evidence that computer scientists can re-identify individuals within de-identified data with “astonishing ease”. However, a careful re-examination of the population demographics in Cambridge indicates that Weld was most likely re-identifiable only because he was a public figure who had experienced a highly publicized hospitalization, not because the Cambridge voter data, which were missing a large proportion of the population, could establish his identity with any certainty.

The complete story of Weld’s re-identification exposes an important systemic barrier to accurate re-identification known as “the myth of the perfect population register”. The logic underlying re-identification depends critically on demonstrating that a person within a health data set is the only person in the larger population who has a given combination of characteristics (known as “quasi-identifiers”) that could potentially re-identify them. Most re-identification attempts therefore face a strong challenge in creating a complete and accurate population register. This limitation not only underlies the entire set of famous Cambridge re-identification results but also affects much of the existing re-identification research cited by those making claims of easy re-identification. This paper critically examines the historic Weld re-identification and the dramatic (thousands-fold) reductions in re-identification risk for de-identified health data protected by the HIPAA Privacy Rule’s de-identification provisions since 2003. The paper also provides recommendations for enhancements to existing HIPAA de-identification policy, discusses critical advances routinely made in medical science and in the improvement of our healthcare system using de-identified data, and provides commentary on the vital importance of properly balancing the competing goals of protecting patient privacy and preserving the accuracy of scientific research and statistical analyses conducted with de-identified data.
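The population-register argument can be illustrated with a minimal Python sketch. It is not taken from the paper: the records, field layout (ZIP code, birth date, sex), and function names are all hypothetical, and the data are invented. The sketch shows only that a quasi-identifier combination can appear unique in an incomplete register while being shared by more than one person in the full population.

```python
from collections import Counter

# Hypothetical toy records: (ZIP code, birth date, sex). All values are invented.
full_population = [
    ("02138", "1950-01-01", "M"),
    ("02138", "1950-01-01", "M"),  # a second person with the same quasi-identifiers
    ("02139", "1962-03-15", "F"),
]

# An incomplete register (for example, a voter list) that happens to omit
# the second person sharing the target's quasi-identifiers.
incomplete_register = [full_population[0], full_population[2]]


def equivalence_class_sizes(records):
    """Count how many records share each quasi-identifier combination."""
    return Counter(records)


def is_unique_match(records, target):
    """True if exactly one record carries the target's quasi-identifiers."""
    return equivalence_class_sizes(records)[target] == 1


target = ("02138", "1950-01-01", "M")

# Uniqueness in the incomplete register does not establish uniqueness
# in the population the register is meant to represent.
unique_in_register = is_unique_match(incomplete_register, target)
unique_in_population = is_unique_match(full_population, target)
```

In this toy case the target looks unique in the register but belongs to a group of two in the full population, so a match against the register alone could not confirm a re-identification with certainty.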

Keywords: HIPAA, privacy, de-identification, statistical disclosure, population register, quasi-identifiers, K-anonymity, public policy

Suggested Citation

Barth-Jones, Daniel, The 'Re-Identification' of Governor William Weld's Medical Information: A Critical Re-Examination of Health Data Identification Risks and Privacy Protections, Then and Now (July 2012). Available at SSRN: https://ssrn.com/abstract=2076397 or http://dx.doi.org/10.2139/ssrn.2076397

Daniel Barth-Jones (Contact Author)

Columbia University - Mailman School of Public Health, Department of Epidemiology

600 West 168th St., 6th Floor
New York, NY 10032
United States

HOME PAGE: http://www.mailman.columbia.edu/our-faculty/profile?uni=db2431
