Entity Matching with Similarity Encoding: A Supervised Learning Recommendation Framework for Linking (Big) Data
31 Pages. Posted: 15 Aug 2023
Date Written: August 15, 2023
Abstract
In this study, we introduce a novel entity matching (EM) framework. It combines state-of-the-art EM approaches based on Artificial Neural Networks (ANN) with a new similarity encoding derived from matching techniques that are prevalent in finance and economics. Our framework performs on par with or outperforms alternative end-to-end frameworks in standard benchmark cases. Because similarity encoding is constructed using (edit) distances instead of semantic similarities, it avoids out-of-vocabulary problems when matching dirty data. We highlight this property in an EM application to dirty firm-level financial data extracted from historical archives.
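To make the idea of an edit-distance-based encoding concrete, the following is a minimal sketch and not the authors' implementation. It assumes that a string is encoded as a vector of normalized Levenshtein similarities to a small set of reference strings; the function names (`levenshtein`, `similarity`, `similarity_encode`) and the example firm names are hypothetical and chosen only for illustration.

```python
from typing import List


def levenshtein(a: str, b: str) -> int:
    """Edit distance counting insertions, deletions and substitutions."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(
                min(
                    previous[j] + 1,         # deletion
                    current[j - 1] + 1,      # insertion
                    previous[j - 1] + cost,  # substitution (or match)
                )
            )
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Normalized similarity in [0, 1]; 1.0 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(a, b) / longest


def similarity_encode(value: str, prototypes: List[str]) -> List[float]:
    """Encode a string as its similarities to a list of reference strings.

    Assumption for illustration: the prototypes are known clean names.
    """
    return [similarity(value, p) for p in prototypes]


# A misspelled firm name still receives a meaningful encoding, because
# the encoding depends on characters rather than on a fixed vocabulary.
prototypes = ["Deutsche Bank", "Commerzbank"]
encoded = similarity_encode("Deutsche Bnk", prototypes)
```

Because the vector is computed from character-level distances, a string that never appeared in any training vocabulary can still be encoded, which is the property the abstract attributes to similarity encoding.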
Keywords: Entity matching, Entity resolution, Database linking, Machine learning, Record resolution, Similarity encoding
JEL Classification: C8