Duplicate Records Detection Techniques: Issues and Illustration

18 Pages. Posted: 16 Aug 2011

Hussein Issa

Rutgers, The State University of New Jersey - Accounting & Information Systems

Miklos A. Vasarhelyi

Rutgers Business School

Date Written: August 16, 2011

Abstract

Data quality has become a key issue in computer-based management systems. Inadequate data causes serious operational difficulties as well as direct financial losses. Furthermore, databases are growing at an exponential rate with the proliferation of business electronization (Vasarhelyi and Greenstein 2003), globalization, and the real-time economy. Records of once-manual processes are now stored in electronic databases that draw data from multiple sources. This heterogeneity gives rise to a new set of problems. Errors – such as incorrect data entry, incomplete information, and unstandardized formats across data sources – can leave a database with more than one representation of the same real-life object. This paper discusses a problem that has received little academic attention but remains of great importance to an organization’s data quality: duplicate records, and duplicate payments in particular. We use data from a telecommunication company to demonstrate record matching techniques.

Suggested Citation

Issa, Hussein and Vasarhelyi, Miklos A., Duplicate Records Detection Techniques: Issues and Illustration (August 16, 2011). Available at SSRN: https://ssrn.com/abstract=1910473 or http://dx.doi.org/10.2139/ssrn.1910473

Hussein Issa (Contact Author)

Rutgers, The State University of New Jersey - Accounting & Information Systems

96 New England Avenue, #18
Summit, NJ 07901-1825
United States

Miklos A. Vasarhelyi

Rutgers Business School

180 University Avenue
Ackerson Hall, Room 315
Newark, NJ 07102
United States
973-353-5002 (Phone)
973-353-1283 (Fax)
