A New Name-Based Sampling Method for Migrants Using N-Grams
German Record Linkage Center, Working Paper Series, No. WP-GRLC-2013-04, July 25, 2013
29 pages. Posted: 4 Mar 2020
Date Written: July 25, 2013
Abstract
Name-based sampling is among the best methods for sampling migrant populations. So far, names have been classified using either ad hoc lists or onomastic dictionaries. This paper proposes a new name-based procedure that applies a Bayes classifier to the n-grams of a name. The new procedure tolerates alternative spellings and can also classify names that are not found in dictionaries. It was tested using the names of about 1,600 foreigners in the PASS panel. Finally, a CATI survey in Hesse (Germany) based on the new method is described.
Keywords: onomastics, rare populations, bigrams, trigrams
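The idea of a Bayes classifier over the n-grams of a name can be illustrated with a minimal sketch. This is not the authors' implementation: the class name NgramNaiveBayes, the boundary padding, the add-one smoothing, the trigram default, and the tiny training list with its "turkish" and "german" labels are all assumptions made for illustration only. It shows why a misspelled or unlisted name can still be classified: most of its n-grams overlap with names seen in training.

```python
import math
from collections import Counter, defaultdict


def ngrams(name, n=3):
    """Return the character n-grams of a name, padded with boundary markers."""
    padded = "^" * (n - 1) + name.lower() + "$"
    return [padded[i:i + n] for i in range(len(padded) - n + 1)]


class NgramNaiveBayes:
    """Hypothetical naive Bayes classifier over character n-grams."""

    def __init__(self, n=3, alpha=1.0):
        self.n = n                      # n-gram length (3 = trigrams)
        self.alpha = alpha              # additive smoothing constant (assumed)
        self.gram_counts = defaultdict(Counter)
        self.class_counts = Counter()
        self.vocab = set()

    def fit(self, names, labels):
        for name, label in zip(names, labels):
            self.class_counts[label] += 1
            for gram in ngrams(name, self.n):
                self.gram_counts[label][gram] += 1
                self.vocab.add(gram)
        return self

    def log_scores(self, name):
        """Log prior plus summed log likelihoods of the name's n-grams."""
        total_names = sum(self.class_counts.values())
        vocab_size = len(self.vocab) + 1  # +1 reserves mass for unseen n-grams
        scores = {}
        for label, count in self.class_counts.items():
            total = sum(self.gram_counts[label].values())
            score = math.log(count / total_names)
            for gram in ngrams(name, self.n):
                seen = self.gram_counts[label][gram]
                score += math.log(
                    (seen + self.alpha) / (total + self.alpha * vocab_size)
                )
            scores[label] = score
        return scores

    def predict(self, name):
        scores = self.log_scores(name)
        return max(scores, key=scores.get)


# Invented toy training data, for illustration only (not from the paper).
train_names = ["Yilmaz", "Yildiz", "Yildirim", "Öztürk",
               "Müller", "Schmidt", "Schneider", "Schulz"]
train_labels = ["turkish"] * 4 + ["german"] * 4

model = NgramNaiveBayes(n=3).fit(train_names, train_labels)

# Neither spelling below occurs in the training list.
print(model.predict("Yilmas"))
print(model.predict("Schmitt"))
```

In this sketch, "Yilmas" shares its leading trigrams with "Yilmaz", so the classifier still assigns it to the same class even though a dictionary lookup on the exact spelling would fail. A real application would need a much larger training list and a considered choice of n and smoothing.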