TDGen: A Test Data Generator for Evaluating Record Linkage Methods

NO. WP-GRLC-2012-01

16 Pages. Posted: 31 Mar 2020

Tobias Bachteler

University of Duisburg-Essen

Jörg Reiher

University of Duisburg-Essen

Date Written: July 2, 2012

Abstract

The record linkage field is notoriously short of appropriate test data for methodological research on linkage algorithms and procedures. Since few real-world data sets are publicly available, research on record linkage methodology must often rely on artificially generated test decks. Our record linkage test data generator, TDGen for short, is designed to take a test data file and produce a garbled version of it by introducing simulated errors.
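
To illustrate the general idea of garbling a clean file with simulated errors, the following is a minimal Python sketch. It is not TDGen's implementation: the function names `garble_value` and `garble_records`, the four single-character error types, the `error_rate` parameter, and the sample records are all assumptions made for this example.

```python
import random
import string


def garble_value(value: str, rng: random.Random) -> str:
    """Apply one random single-character edit to value (hypothetical helper).

    A substitution or transposition can happen to reproduce the original
    string, so the result is not guaranteed to differ from the input.
    """
    if not value:
        return value
    op = rng.choice(["insert", "delete", "substitute", "transpose"])
    pos = rng.randrange(len(value))
    letter = rng.choice(string.ascii_lowercase)
    if op == "insert":
        return value[:pos] + letter + value[pos:]
    if op == "delete":
        return value[:pos] + value[pos + 1:]
    if op == "substitute":
        return value[:pos] + letter + value[pos + 1:]
    # Transpose two adjacent characters; needs at least two characters.
    if len(value) < 2:
        return value
    i = min(pos, len(value) - 2)
    return value[:i] + value[i + 1] + value[i] + value[i + 2:]


def garble_records(records, error_rate, seed=0):
    """Return a garbled copy of records (a list of dicts of strings).

    Each field is garbled independently with probability error_rate.
    The input records are left untouched, so clean and garbled versions
    can later be compared as known true matches.
    """
    rng = random.Random(seed)  # fixed seed makes the garbled file reproducible
    garbled = []
    for record in records:
        new_record = {}
        for field, value in record.items():
            if rng.random() < error_rate:
                new_record[field] = garble_value(value, rng)
            else:
                new_record[field] = value
        garbled.append(new_record)
    return garbled


if __name__ == "__main__":
    # Made-up sample records, for illustration only.
    clean = [
        {"first_name": "anna", "last_name": "meier"},
        {"first_name": "peter", "last_name": "schmidt"},
    ]
    for before, after in zip(clean, garble_records(clean, error_rate=0.5, seed=1)):
        print(before, "->", after)
```

Because the clean and garbled records keep the same order, the true match status of every pair is known, which is what makes such generated data usable for evaluating a linkage method.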

Keywords: test data, simulated data, record linkage

Suggested Citation

Bachteler, Tobias and Reiher, Jörg, TDGen: A Test Data Generator for Evaluating Record Linkage Methods (July 2, 2012). NO. WP-GRLC-2012-01, Available at SSRN: https://ssrn.com/abstract=3549240

Tobias Bachteler (Contact Author)

University of Duisburg-Essen

Lotharstrasse 1
Duisburg, 47048
Germany

Jörg Reiher

University of Duisburg-Essen

Lotharstrasse 1
Duisburg, 47048
Germany
