A Comprehensive Assessment of Census Record Linking Methods: Comparing Deterministic, Probabilistic, and Machine Learning Approaches

28 Pages. Posted: 13 Oct 2022

Fangqi Wen

Australian National University (ANU) - College of Asia and the Pacific

Jung In

University of Oxford - Nuffield College

Richard J. Breen

University of Oxford - Nuffield College

Date Written: October 8, 2022

Abstract

Nineteenth-century Britain should be of great interest to students of intergenerational social mobility, and census data provide the best way of gaining comprehensive and detailed information about mobility in this period. We investigate a range of methods for linking observations across the censuses of England and Wales: exact matching, deterministic non-exact matching, probabilistic matching, and machine learning (ML) approaches. We draw 50 samples, each of 2% of men aged 0–19 in the 1851 census, and try to link them to men aged 30–49 in the 1881 census. We derive upper bounds on the matching rate and evaluate the methods in terms of their matching rate and the representativeness of the matched data. We compare measures of absolute and relative mobility estimated from samples produced by these different matching methods, and the substantive results turn out to be largely similar. We then use our preferred methods to perform full-census linking. Probabilistic matching using the fastLink method (Enamorado et al., 2019) performs very well, slightly outperforming the deterministic ABE method (Abramitzky, Boustan, and Eriksson, 2012, 2014, 2019). Of the ML methods we consider, Random Forest performs well on the 2% samples but is still outperformed by fastLink in full-census matching.
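To make the deterministic non-exact approach concrete, the sketch below follows the ABE-style procedure as it is commonly described. Records must agree exactly on first name, surname, and birthplace. A link is accepted only if exactly one candidate falls in the narrowest birth-year band (exact, then ±1, then ±2 years) that contains any candidate at all. Ambiguous cases are dropped, and only links found in both directions are kept. This is a simplified illustration, not the authors' code. The `Person` type, its field names, estimating birth year as census year minus reported age, and the default `max_band=2` are all assumptions, and the name standardization used in practice (e.g. NYSIIS codes) is omitted.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Person:
    # Hypothetical record layout; real census extracts carry many more fields.
    pid: str
    first: str
    last: str
    birthplace: str
    age: int


def _one_direction(src, tgt, src_year, tgt_year, max_band):
    """Link each src record to a tgt record that is unique on name and
    birthplace within the narrowest birth-year band that has any candidate."""
    # Index target records by the exact-match blocking key.
    index = defaultdict(list)
    for q in tgt:
        index[(q.first, q.last, q.birthplace)].append((tgt_year - q.age, q))
    links = {}
    for p in src:
        birth_year = src_year - p.age  # assumed: birth year = census year - age
        candidates = index.get((p.first, p.last, p.birthplace), [])
        for band in range(max_band + 1):
            hits = [q for y, q in candidates if abs(y - birth_year) <= band]
            if len(hits) == 1:
                links[p.pid] = hits[0].pid  # unique candidate: accept the link
                break
            if len(hits) > 1:
                break  # ambiguous at this band: leave p unlinked
    return links


def abe_style_links(a, b, year_a, year_b, max_band=2):
    """Keep only pairs that are linked in both directions (one-to-one links)."""
    forward = _one_direction(a, b, year_a, year_b, max_band)
    backward = _one_direction(b, a, year_b, year_a, max_band)
    return {x: y for x, y in forward.items() if backward.get(y) == x}
```

For example, a boy aged 5 in 1851 links to a man aged 35 in 1881 with the same name and birthplace if that man is the only candidate with an implied birth year of 1846. If two equally close candidates appear at ±1 year, the boy is left unlinked rather than assigned arbitrarily. This conservative rule reduces false links at the cost of a lower matching rate, which is the trade-off the comparison of matching methods addresses.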

Keywords: Historical data, historical census, intergenerational mobility, record linking, Victorian Britain

Suggested Citation

Wen, Fangqi and In, Jung and Breen, Richard J., A Comprehensive Assessment of Census Record Linking Methods: Comparing Deterministic, Probabilistic, and Machine Learning Approaches (October 8, 2022). Available at SSRN: https://ssrn.com/abstract=4241435 or http://dx.doi.org/10.2139/ssrn.4241435

Fangqi Wen (Contact Author)

Australian National University (ANU) - College of Asia and the Pacific

Australia

Jung In

University of Oxford - Nuffield College

Oxford
United Kingdom

Richard J. Breen

University of Oxford - Nuffield College

Manor Road
Oxford, OX1 3UQ
United Kingdom
+44 1865 278538 (Phone)
+44 1865 278500 (Fax)
