13 Pages
Posted: 5 Feb 2002
Date Written: 2002
In data mining applications, the availability of data is often a serious problem. For instance, elementary customer information resides in customer databases, but market survey data are only available for a subset of the customers or even for a different sample of customers. Data fusion addresses this by combining information from different sources into a single data set for further data mining. While a significant amount of work has been done on data fusion in the past, most of the research has been performed outside of the data mining community. In this paper, we provide an overview of data fusion, introduce basic terminology and the statistical matching approach, distinguish between internal and external evaluation, and conclude with a larger case study.
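The abstract does not spell out the matching procedure, so the following is only a minimal sketch of one common statistical matching variant, under stated assumptions. It assumes nearest-neighbour matching: each recipient record (for example, a customer from the database) is paired with the closest donor record (for example, a survey respondent) on the variables both sets share. The shared variables are z-scored using donor-set statistics and compared by Euclidean distance, and the donor's survey-only variables are copied over. The function name `statistical_match` and all variable names are hypothetical and do not come from the paper.

```python
import math
from typing import Dict, List, Sequence

Record = Dict[str, float]


def _donor_scales(donors: Sequence[Record], keys: Sequence[str]) -> Dict[str, float]:
    """Population standard deviation of each common variable in the donor set.

    Falls back to 1.0 for constant variables so we never divide by zero.
    """
    scales: Dict[str, float] = {}
    n = len(donors)
    for key in keys:
        values = [d[key] for d in donors]
        mean = sum(values) / n
        variance = sum((v - mean) ** 2 for v in values) / n
        scales[key] = math.sqrt(variance) or 1.0
    return scales


def statistical_match(
    recipients: Sequence[Record],
    donors: Sequence[Record],
    common_keys: Sequence[str],
    fusion_keys: Sequence[str],
) -> List[Record]:
    """Hypothetical sketch: fuse donor-only variables into each recipient
    by copying them from the single nearest donor on the common variables."""
    scales = _donor_scales(donors, common_keys)
    fused: List[Record] = []
    for recipient in recipients:

        def distance(donor: Record) -> float:
            # Euclidean distance on standardized common variables.
            return math.sqrt(
                sum(((recipient[k] - donor[k]) / scales[k]) ** 2 for k in common_keys)
            )

        nearest = min(donors, key=distance)  # ties go to the first donor listed
        record = dict(recipient)  # copy, so the input records stay unchanged
        for k in fusion_keys:
            record[k] = nearest[k]
        fused.append(record)
    return fused


if __name__ == "__main__":
    customers = [{"age": 25, "income": 30.0}, {"age": 60, "income": 80.0}]
    survey = [
        {"age": 24, "income": 32.0, "reads_magazine": 1},
        {"age": 58, "income": 75.0, "reads_magazine": 0},
    ]
    for row in statistical_match(customers, survey, ["age", "income"], ["reads_magazine"]):
        print(row)
```

Using a single nearest donor keeps the sketch short. Practical fusion systems may instead use several donors, weight the common variables differently, or limit how often one donor can be reused. Those choices are outside what this abstract describes.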
Keywords: Data Mining, Data Fusion, Leveraging of Sample Data
Suggested Citation:
van der Putten, Peter and Kok, Joost N. and Gupta, Amar, Data Fusion through Statistical Matching (2002). MIT Sloan Working Paper No. 4342-02; Eller College Working Paper No. 1031-05. Available at SSRN: https://ssrn.com/abstract=297501 or http://dx.doi.org/10.2139/ssrn.297501