Private Genetic Genealogy Search
48 Pages Posted: 8 Jul 2021
Date Written: June 28, 2021
Genetic genealogy search has emerged as a powerful technique for identifying individuals by leveraging their genetic information and a genealogical network. The current practice relies on searching within a pre-constructed database containing genetic data from many individuals, and as such exposes those in the database to substantial privacy risks. Motivated by these privacy concerns, we propose a framework for genealogy search that takes into account the amount of privacy exposure. In contrast to the existing static approach of collecting a large amount of genetic data beforehand, we advocate for a new search paradigm whereby genetic samples are accessed in a sequential manner. Our results show that carefully designed sequential search procedures can significantly outperform existing static approaches in terms of the trade-off between cost and privacy exposure. We further characterize the optimal trade-off and propose a family of search strategies that provably achieve it over path- and grid-like networks. Finally, we validate our findings via numerical experiments on both real and synthetic genealogical networks and discuss the policy implications of our results.
Keywords: genetic privacy, optimal stopping, graph search, long-range familial search.