Data Fusion through Statistical Matching

13 Pages. Posted: 5 Feb 2002

Peter van der Putten

Leiden University - Department of Mathematics and Computer Science

Joost N. Kok

Leiden University - Department of Mathematics and Computer Science

Amar Gupta

Massachusetts Institute of Technology (MIT)

Date Written: 2002

Abstract

In data mining applications, the availability of data is often a serious problem. For instance, elementary customer information resides in customer databases, but market survey data are available only for a subset of the customers, or even for a different sample of customers. Data fusion provides a way out by combining information from different sources into a single data set for further data mining. Although a significant amount of work has been done on data fusion, most of that research has been performed outside the data mining community. In this paper, we provide an overview of data fusion, introduce basic terminology and the statistical matching approach, distinguish between internal and external evaluation, and conclude with a larger case study.

Keywords: Data Mining, Data Fusion, Leveraging of Sample Data

Suggested Citation

van der Putten, Peter and Kok, Joost N. and Gupta, Amar, Data Fusion through Statistical Matching (2002). MIT Sloan Working Paper No. 4342-02; Eller College Working Paper No. 1031-05. Available at SSRN: https://ssrn.com/abstract=297501 or http://dx.doi.org/10.2139/ssrn.297501

Peter van der Putten

Leiden University - Department of Mathematics and Computer Science

Niels Bohrweg 1
2333 CA Leiden
Netherlands

Joost N. Kok

Leiden University - Department of Mathematics and Computer Science

Niels Bohrweg 1
2333 CA Leiden
Netherlands

Amar Gupta (Contact Author)

Massachusetts Institute of Technology (MIT)

77 Massachusetts Avenue
Building 32-256
Cambridge, MA 02139
United States
617-253-0418 (Phone)

Paper statistics

Downloads: 993
Rank: 17,121
Abstract Views: 6,604