How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data

67 Pages. Posted: 20 Nov 2017. Last revised: 17 Feb 2023

Martha J. Bailey

University of California, Los Angeles (UCLA) - Department of Economics

Connor Cole

University of Michigan at Ann Arbor - Department of Economics

Morgan Henderson

University of Michigan at Ann Arbor - Department of Economics

Catherine Massey

University of Michigan at Ann Arbor - Institute for Social Research

Date Written: November 2017

Abstract

This paper reviews the literature on historical record linkage in the U.S. and examines the performance of widely used automated record-linking algorithms on two high-quality historical datasets and one synthetic ground-truth dataset. Focusing on algorithms in current practice, we find that linking methods have important effects on data quality. Specifically, we find that (1) no method (including hand-linking) consistently produces representative samples; (2) 15 to 37 percent of links chosen by prominent machine-linking algorithms are identified as false links by human reviewers; and (3) these false links are systematically related to baseline sample characteristics, suggesting that machine algorithms may introduce complicated forms of bias into analyses. We also find that prominent linking algorithms attenuate estimates of the intergenerational income elasticity by up to 20 percent and that common variations in algorithm choices result in greater attenuation. These results suggest that current practice could be improved by placing more emphasis on reducing false links and less emphasis on increasing match rates. We conclude with constructive suggestions for reducing linking errors and directions for future research.
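The attenuation mechanism described in the abstract can be illustrated with a stylized simulation. This is a minimal sketch, not the authors' method: it assumes that each false link pairs a father with the son of an unrelated family chosen at random, so a false-link share p scales the covariance between father and son log incomes by (1 - p) while leaving both variances unchanged, and the OLS slope shrinks from beta to roughly (1 - p) * beta. The function names (ols_slope, simulate) and parameter values (beta = 0.4, false_share = 0.2) are hypothetical. Because the paper finds that real false links are systematically related to sample characteristics, actual bias can take more complicated forms than this purely random case.

```python
import random


def ols_slope(x, y):
    """Slope from an OLS regression of y on x (with an intercept)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    return sxy / sxx


def simulate(beta=0.4, false_share=0.2, n=100_000, seed=0):
    """Return (slope on true links, slope on a sample with random false links).

    Hypothetical setup: father log income x ~ N(0, 1) and
    son log income y = beta * x + N(0, 1) noise.
    """
    rng = random.Random(seed)
    x = [rng.gauss(0.0, 1.0) for _ in range(n)]
    y = [beta * xi + rng.gauss(0.0, 1.0) for xi in x]

    # Assumption: a random subset of fathers is falsely linked, and each
    # receives the son of another falsely linked family (a random shuffle).
    linked_y = list(y)
    false_idx = rng.sample(range(n), int(false_share * n))
    swapped = [y[i] for i in false_idx]
    rng.shuffle(swapped)
    for i, yi in zip(false_idx, swapped):
        linked_y[i] = yi

    return ols_slope(x, y), ols_slope(x, linked_y)


if __name__ == "__main__":
    true_slope, linked_slope = simulate()
    print(f"true links: {true_slope:.3f}  with false links: {linked_slope:.3f}")
```

Under these assumptions the first number should land near 0.4 and the second near (1 - 0.2) * 0.4 = 0.32; setting false_share=0.0 makes the two slopes identical.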

Suggested Citation

Bailey, Martha Jane and Cole, Connor and Henderson, Morgan and Massey, Catherine, How Well Do Automated Linking Methods Perform? Lessons from U.S. Historical Data (November 2017). NBER Working Paper No. w24019, Available at SSRN: https://ssrn.com/abstract=3074197

Martha Jane Bailey (Contact Author)

University of California, Los Angeles (UCLA) - Department of Economics

8283 Bunche Hall
Los Angeles, CA 90095-1477
United States

Connor Cole

University of Michigan at Ann Arbor - Department of Economics

Ann Arbor, MI
United States

Morgan Henderson

University of Michigan at Ann Arbor - Department of Economics

Ann Arbor, MI
United States

Catherine Massey

University of Michigan at Ann Arbor - Institute for Social Research

426 Thompson St.
Ann Arbor, MI 48104
United States
