Entity Matching with Similarity Encoding: A Supervised Learning Recommendation Framework for Linking (Big) Data

31 Pages Posted: 15 Aug 2023

Pantelis Karapanagiotis

University of Groningen, Department of Operations; Leibniz Institute for Financial Research SAFE

Marius Liebald

Goethe University Frankfurt

Date Written: August 15, 2023

Abstract

In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework is on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encoding is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property by applying the framework to dirty financial firm-level data extracted from historical archives.
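
To make the idea of an edit-distance-based encoding concrete, the following is a minimal illustrative sketch and not the authors' implementation. It assumes a plain Levenshtein distance, a length-normalized similarity in [0, 1], and one similarity feature per record field; the function names (`levenshtein`, `similarity`, `encode_pair`) and the example records are hypothetical. Because the features depend only on character-level edits, a misspelled or previously unseen token still yields a usable value, which is the property the abstract attributes to similarity encoding.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic dynamic-programming edit distance (insert, delete, substitute)."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        current = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Map the edit distance to [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0  # two empty strings are treated as identical (assumption)
    return 1.0 - levenshtein(a, b) / longest


def encode_pair(left: dict, right: dict, fields: list) -> list:
    """Hypothetical encoding: one similarity feature per compared field."""
    return [similarity(left.get(f, ""), right.get(f, "")) for f in fields]


# Hypothetical records, e.g. a clean entry and a noisy archival transcription.
record_a = {"name": "Deutsche Bank AG", "city": "Frankfurt"}
record_b = {"name": "Deutsche Bnak A.G.", "city": "Frankfurt"}
features = encode_pair(record_a, record_b, ["name", "city"])
```

A vector such as `features` could then serve as input to a supervised classifier (for example, a neural network) that predicts whether the two records refer to the same entity.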

Keywords: Entity matching, Entity resolution, Database linking, Machine learning, Record resolution, Similarity encoding

JEL Classification: C8

Suggested Citation

Karapanagiotis, Pantelis and Liebald, Marius, Entity Matching with Similarity Encoding: A Supervised Learning Recommendation Framework for Linking (Big) Data (August 15, 2023). SAFE Working Paper No. 398, Available at SSRN: https://ssrn.com/abstract=4541376 or http://dx.doi.org/10.2139/ssrn.4541376

Pantelis Karapanagiotis (Contact Author)

University of Groningen, Department of Operations

Groningen
Netherlands

Leibniz Institute for Financial Research SAFE

House of Finance
Theodor-W.-Adorno-Platz 3
Frankfurt, 60323
Germany

Marius Liebald

Goethe University Frankfurt

Grüneburgplatz 1
Frankfurt am Main, 60323
Germany
