
Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation Over Linked Data Corpora

59 Pages, Posted: 7 Jul 2018, First Look: Accepted

Aidan Hogan

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

Antoine Zimmermann

National Institute of Applied Sciences of Lyon (INSA)

Jürgen Umbrich

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

Axel Polleres

Siemens AG - Österreich

Stefan Decker

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

Abstract

In this paper, we discuss scalable and distributed methods for entity consolidation (a.k.a. smushing, entity resolution, object consolidation, etc.) over large-scale, static Linked Data corpora, with the aim of locating and processing names that signify the same entity. We investigate (i) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; (ii) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive novel owl:sameAs relations through the semantics of inverse-functional properties, functional properties and (max-)cardinality restrictions with value one; (iii) deriving weighted concurrence measures between entities in the corpus based on shared inlinks/outlinks and attribute values using statistical analyses; and (iv) disambiguating (initially) consolidated entities based on inconsistency detection using OWL 2 RL/RDF rules. Our methods are based upon distributed sorts and scans of the corpus, where we deliberately avoid the requirement for indexing all data. Throughout, we offer evaluation over a diverse Linked Data corpus consisting of 1.118 billion quadruples derived from a domain-agnostic, open crawl of 3.985 million RDF/XML Web documents, demonstrating the feasibility of our methods at that scale, and giving insights into the quality of the results for real-world data.
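
As a rough illustration of the baseline approach in (i), the following minimal sketch groups identifiers connected by explicit owl:sameAs relations and rewrites each one to a single canonical name. It is not the authors' implementation: it uses an in-memory union-find structure rather than the distributed sorts and scans described in the abstract, and the function names, the choice of the lexicographically smallest identifier as canonical, and the decision to rewrite only subject and object positions are all assumptions made for the example.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def find(parent, x):
    """Return the representative of x's equivalence class (with path compression)."""
    root = x
    while parent.setdefault(root, root) != root:
        root = parent[root]
    # Point every node on the path directly at the root.
    while parent[x] != root:
        parent[x], x = root, parent[x]
    return root


def union(parent, a, b):
    """Merge the classes of a and b; the smaller identifier stays the root."""
    ra, rb = find(parent, a), find(parent, b)
    if ra != rb:
        low, high = sorted((ra, rb))
        parent[high] = low


def consolidate(triples):
    """Rewrite subjects and objects to canonical identifiers (hypothetical helper).

    triples: iterable of (subject, predicate, object) strings.
    owl:sameAs triples are consumed to build the equivalence classes
    and are dropped from the output.
    """
    triples = list(triples)
    parent = {}
    for s, p, o in triples:
        if p == OWL_SAME_AS:
            union(parent, s, o)

    def canon(term):
        # Terms never mentioned in an owl:sameAs triple are left untouched.
        return find(parent, term) if term in parent else term

    return [(canon(s), p, canon(o)) for s, p, o in triples if p != OWL_SAME_AS]


if __name__ == "__main__":
    data = [
        ("ex:a", OWL_SAME_AS, "ex:b"),
        ("ex:b", OWL_SAME_AS, "ex:c"),
        ("ex:c", "ex:name", "X"),
        ("ex:d", "ex:knows", "ex:b"),
    ]
    for triple in consolidate(data):
        print(triple)
```

Because owl:sameAs is symmetric and transitive, the two explicit relations in the sample data place ex:a, ex:b and ex:c in one class, so statements about any of them end up attached to the same canonical identifier.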

Keywords: Linked Data, Entity Consolidation, Distributed Consolidation, Entity Matching, OWL, Instance Matching

Suggested Citation

Hogan, Aidan and Zimmermann, Antoine and Umbrich, Jürgen and Polleres, Axel and Decker, Stefan, Scalable and Distributed Methods for Entity Matching, Consolidation and Disambiguation Over Linked Data Corpora (2012). Journal of Web Semantics First Look 10_0_5. Available at SSRN: https://ssrn.com/abstract=3198933 or http://dx.doi.org/10.2139/ssrn.3198933

Aidan Hogan (Contact Author)

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Antoine Zimmermann

National Institute of Applied Sciences of Lyon (INSA)

20 Avenue Albert Einstein
Villeurbanne
France

Jürgen Umbrich

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

Axel Polleres

Siemens AG - Österreich

Siemensstrasse 90
Vienna
Austria

Stefan Decker

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Galway
Ireland

