Preservation of Individuals’ Privacy in Shared COVID-19 Related Data

13 Pages Posted: 17 Jul 2020

Stefan Sauermann

UAS Technikum Wien

Chifundo Kanjala

London School of Hygiene & Tropical Medicine - Faculty of Epidemiology and Population Health

Matthias Templ

Zurich University of Applied Sciences

Claire C. Austin

Environment and Climate Change Canada

RDA COVID-19 WG

Research Data Alliance

Date Written: July 10, 2020

Abstract

This paper provides insight into how restricted data can be incorporated into an open-by-default-by-design digital infrastructure for scientific data. We focus, in particular, on the ethical component of FAIRER (Findable, Accessible, Interoperable, Reusable, Ethical, and Reproducible) data, and on the pseudo-anonymization and anonymization of COVID-19 datasets to protect personally identifiable information (PII). First, we consider the need to customise existing privacy preservation techniques in the context of rapid production, integration, sharing and analysis of COVID-19 data. Second, we discuss methods for the pseudo-anonymization of direct identification variables, including the use of different pseudo-IDs for the same person across multiple domains and organizations. Essentially, pseudo-anonymization and its encrypted domain-specific IDs make it possible to match data later, if required and permitted, and to restore the true ID (and authenticity) in individual cases that require clarification for a patient. Third, we discuss the application of statistical disclosure control (SDC) techniques to COVID-19 disease data. COVID-19 datasets are often enriched with other covariates such as age, gender and nationality, so the risk that a combination of attribute values re-identifies an individual person must be assessed and limited to an acceptable level. This is done using statistical disclosure control to anonymize the data. Lastly, we discuss the limitations of the proposed techniques and provide general guidelines on using disclosure risks to decide on appropriate modes of data sharing that preserve the privacy of the individuals in the datasets.
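
The two technical ideas in the abstract, domain-specific pseudo-IDs and re-identification risk from combinations of attribute values, can be illustrated with a minimal Python sketch. It is not the authors' implementation. The function names, keys and toy records are hypothetical, and a keyed one-way hash (HMAC-SHA256) is swapped in for the encrypted IDs the abstract mentions, so restoring the true ID here would require a lookup table held by a trusted party. Risk is approximated by a simple k-anonymity style count of records that share the same quasi-identifier values.

```python
import hashlib
import hmac
from collections import Counter


def pseudo_id(true_id: str, domain_key: bytes) -> str:
    """Keyed one-way hash: the same person and domain key give the same pseudo-ID.

    Assumption: HMAC-SHA256 stands in for the encrypted IDs in the abstract.
    It is not reversible without a separately stored mapping.
    """
    digest = hmac.new(domain_key, true_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def sample_frequencies(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def records_at_risk(records, quasi_identifiers, k):
    """Return records whose quasi-identifier combination occurs fewer than k times."""
    counts = sample_frequencies(records, quasi_identifiers)
    return [
        r for r in records
        if counts[tuple(r[q] for q in quasi_identifiers)] < k
    ]


# Hypothetical toy data: "id" is the direct identifier, the rest are covariates.
records = [
    {"id": "P001", "age_group": "30-39", "gender": "F", "nationality": "AT"},
    {"id": "P002", "age_group": "30-39", "gender": "F", "nationality": "AT"},
    {"id": "P003", "age_group": "80-89", "gender": "M", "nationality": "CH"},
]

# Hypothetical per-domain secrets: each domain sees a different pseudo-ID
# for the same person, so views cannot be linked without the keys.
HOSPITAL_KEY = b"hypothetical-hospital-secret"
REGISTRY_KEY = b"hypothetical-registry-secret"

hospital_view = [{**r, "id": pseudo_id(r["id"], HOSPITAL_KEY)} for r in records]
registry_view = [{**r, "id": pseudo_id(r["id"], REGISTRY_KEY)} for r in records]

# The 80-89 / M / CH combination is unique, so that record is flagged for k=2.
risky = records_at_risk(hospital_view, ["age_group", "gender", "nationality"], k=2)
```

In this toy example the pseudo-IDs alone do not protect the third record: its combination of covariates is unique, which is the kind of risk that SDC methods such as recoding or suppression are meant to reduce before data are shared.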

Note: Funding: None.

Conflict of interest: None to declare.

Keywords: pseudo-anonymization, statistical disclosure control, data anonymization, data sharing, privacy, personally identifiable information, PII, COVID-19, open science

Suggested Citation

Sauermann, Stefan and Kanjala, Chifundo and Templ, Matthias and Austin, Claire C. and RDA COVID-19 WG, Preservation of Individuals’ Privacy in Shared COVID-19 Related Data (July 10, 2020). Available at SSRN: https://ssrn.com/abstract=3648430 or http://dx.doi.org/10.2139/ssrn.3648430

Stefan Sauermann

UAS Technikum Wien

Chifundo Kanjala (Contact Author)

London School of Hygiene & Tropical Medicine - Faculty of Epidemiology and Population Health

London, WC1E 7HT
United Kingdom

Matthias Templ

Zurich University of Applied Sciences

Institut fuer Angewandte Medienwissenschaft
Zur Kesselschmiede 35
Winterthur, CH 8401
Switzerland

Claire C. Austin

Environment and Climate Change Canada

Gatineau
Canada

RDA COVID-19 WG

Research Data Alliance

HOME PAGE: http://rd-alliance.org/groups/rda-covid19
