Application of Federated Analytics in Health Data Research for Reducing Risks Involved in Data Sharing
24 Pages Posted: 14 Dec 2022
Abstract
Background: The use of federated networks can reduce the risk of disclosure for sensitive datasets by removing the requirement to physically transfer data. Federated networks support federated analytics, a type of privacy-enhancing technology (PET) enabling trustworthy data access and analysis.
Objectives: We outline the methodology used by the International COVID-19 Data Alliance (ICODA) and its partners, the Secure Anonymised Information Linkage (SAIL) Databank and Aridhia Informatics, to implement a federated network infrastructure and subsequently test federated analytics using test data provided for an ICODA exemplar project, the International Perinatal Outcome in the Pandemic (iPOP) Study. The ICODA Workbench, a trusted research environment (TRE), was used to send federated requests to access this test data held within SAIL Databank.
Results: This project is the first successful implementation of a federated network for ICODA. The integration testing used aggregate-level data from the iPOP Study as a first step towards putting in place the technical and user-experience foundations needed for future studies using individual-level datasets from multiple data nodes. Although the federated network was established, federated analytics was not used in the analysis of the iPOP Study because of challenges relating to data standards, data governance, technology, skills, and project duration.
Conclusions: Creating federated networks requires extensive investment in funding, data governance, technology, training, and people. To support future data scalability and to provide researchers with a secure and robust data analysis platform for joint multi-site collaboration, establishing a federated network should be built into the medium- to long-term plans of study projects that are interested in using federated analytics. Federated networks have enormous potential to bring together national and international health care datasets and to aid collaborative research within the healthcare sector in addressing key public health questions.
Note:
Funding Declaration: This work was supported by the International COVID-19 Data Alliance (ICODA), an initiative funded by the COVID-19 Therapeutics Accelerator and convened by Health Data Research UK (HDR UK). We acknowledge funding from the Bill and Melinda Gates Foundation (INV-017293), Microsoft Artificial Intelligence (AI) for Health, and the Minderoo Foundation (INV-017293). Aridhia Informatics Ltd was funded by the Bill and Melinda Gates Foundation (INV-017293). SAIL Databank and the Secure eResearch Platform (SeRP) UK, based at Swansea University, were funded by an award from Health Data Research UK (2020.112), supported by funds from the ICODA initiative, to develop the underlying infrastructure and to provide expertise in establishing the federated analytics platform and governance models. This study makes use of anonymised data held in the Secure Anonymised Information Linkage (SAIL) Databank. We would like to acknowledge the iPOP data providers who made their anonymised data available for research (details provided in [20] (manuscript submitted)). This project was approved by the SAIL Information Governance Review Panel, under project numbers 1292 and 1299. Helga Zoega was supported by a UNSW Scientia Program Award during the conduct of this study. Sarah J Stock was funded by a Wellcome Trust Clinical Career Development Fellowship (209560/Z/17/Z). Meghan B. Azad is supported by a Canada Research Chair in the Developmental Origins of Chronic Disease. All authors approved the version of the manuscript to be published.
Conflict of Interests: The authors declare no conflicts of interest that could have appeared to influence the work reported in this paper.
Keywords: Federated Networks, Federated Analytics, Covid-19, Health Data Research, Privacy-Preserving, Secondary Data, Data Re-use