A Framework for Reconciling Attribute Values from Multiple Data Sources

Jiang, Z., S. Sarkar, P. De and D. Dey. "A Framework for Reconciling Attribute Values from Multiple Data Sources." Management Science, Vol. 53, No. 12, December 2007, pp. 1946-1963

36 Pages Posted: 7 Feb 2018

Zhengrui Jiang

Iowa State University - College of Business

Sumit Sarkar

University of Texas at Dallas - Department of Information Systems & Operations Management

Prabuddha De

Purdue University - Krannert School of Management

Debabrata Dey

University of Kansas - School of Business

Date Written: July 25, 2017

Abstract

Because of the heterogeneous nature of different data sources, data integration is often one of the most challenging tasks in managing modern information systems. While the existing literature has focused on problems such as schema integration and entity identification, it has largely overlooked a basic question: When an attribute value for a real-world entity is recorded differently in different databases, how should the “best” value be chosen from the set of possible values? This paper provides an answer to this question. We first show how a probability distribution over a set of possible values can be derived. We then demonstrate how these probabilities can be used to solve a given decision problem, by minimizing the total cost of type I, type II, and misrepresentation errors. Finally, we propose a framework for integrating multiple data sources when a single “best” value has to be chosen and stored for every attribute of an entity.

Keywords: Data integration, heterogeneous databases, probabilistic databases, data quality, type I, type II, and misrepresentation errors
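
As a rough illustration of the decision step the abstract describes (using a probability distribution over candidate values to pick the value with the lowest total error cost), the sketch below chooses the stored value that minimizes expected cost. It is not the authors' formulation: the abstract does not define the three error types or their costs, so the function names `expected_cost` and `best_value`, the cost functions, and all numbers are hypothetical stand-ins.

```python
from typing import Callable, Dict

# Hypothetical cost signature: cost(stored_value, true_value) -> penalty.
# It stands in for the error costs named in the abstract, whose exact
# definitions are not given there.
CostFn = Callable[[str, str], float]


def expected_cost(stored: str, probs: Dict[str, float], cost: CostFn) -> float:
    """Expected penalty of storing `stored`, weighting each possible
    true value by its probability."""
    return sum(p * cost(stored, true) for true, p in probs.items())


def best_value(probs: Dict[str, float], cost: CostFn) -> str:
    """Candidate value with the smallest expected penalty."""
    return min(probs, key=lambda v: expected_cost(v, probs, cost))


# Illustrative distribution over values reported by different sources.
probs = {"Dallas": 0.5, "Plano": 0.3, "Austin": 0.2}


def zero_one(stored: str, true: str) -> float:
    """Uniform cost: every wrong value is equally bad."""
    return 0.0 if stored == true else 1.0


def asymmetric(stored: str, true: str) -> float:
    """Assumed scenario: storing Dallas when the truth is Austin is
    ten times as costly as any other mistake."""
    if stored == true:
        return 0.0
    if stored == "Dallas" and true == "Austin":
        return 10.0
    return 1.0


uniform_choice = best_value(probs, zero_one)
asymmetric_choice = best_value(probs, asymmetric)
```

Under the uniform cost the most probable value is selected; under the asymmetric cost a less probable value can have the lower expected penalty, which is why the choice depends on the decision problem and not on the probabilities alone.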

Suggested Citation

Jiang, Zhengrui and Sarkar, Sumit and De, Prabuddha and Dey, Debabrata, A Framework for Reconciling Attribute Values from Multiple Data Sources (July 25, 2017). Jiang, Z., S. Sarkar, P. De and D. Dey. "A Framework for Reconciling Attribute Values from Multiple Data Sources." Management Science, Vol. 53, No. 12, December 2007, pp. 1946-1963, Available at SSRN: https://ssrn.com/abstract=3008449

Zhengrui Jiang (Contact Author)

Iowa State University - College of Business

Ames, IA 50011-2063
United States

Sumit Sarkar

University of Texas at Dallas - Department of Information Systems & Operations Management

P.O. Box 830688
Richardson, TX 75083-0688
United States
972-883-6854 (Phone)
972-883-6811 (Fax)

Prabuddha De

Purdue University - Krannert School of Management

403 West State Street
West Lafayette, IN 47907-2056
United States
765-494-0699 (Phone)

HOME PAGE: http://www.krannert.purdue.edu/directory/bio.asp?username=pde

Debabrata Dey

University of Kansas - School of Business

Capitol Federal Hall
1654 Naismith Dr
Lawrence, KS 66045
United States
785-864-1895 (Phone)
