An Unsupervised Instance Matcher for Schema-Free RDF Data

26 Pages. Posted: 23 Jun 2018. First Look: Accepted.

Mayank Kejriwal

University of Texas at Austin - Department of Computer Science

Daniel P. Miranker

University of Texas at Austin - Department of Computer Science

Abstract

This article presents an unsupervised system that performs instance matching between entities in schema-free Resource Description Framework (RDF) files. Rather than relying on domain expertise or manually labeled samples, the system automatically generates its own heuristic training set. The training sets are first used by the system to align the properties in the input graphs. The property alignment and training sets are used together to simultaneously learn two functions, one for the blocking step of instance matching and the other for the classification step. Finally, the learned functions are used to perform instance matching. The full system is implemented as a sequence of components that can be iteratively executed to boost performance. Evaluations on a suite of ten test cases show individual components to be competitive with state-of-the-art baselines. The system as a whole is shown to compete effectively with adaptive supervised approaches.
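
As a rough illustration of the first two steps named in the abstract (automatic training set generation, then property alignment), here is a minimal Python sketch. It is not the authors' implementation: the token-overlap (Jaccard) heuristic, the 0.5 threshold, the function names, and the toy entities are all assumptions chosen for illustration, and the blocking and classification steps are left out.

```python
def value_tokens(value):
    """Lowercased whitespace tokens of a single property value."""
    return set(str(value).lower().split())


def entity_tokens(entity):
    """Union of tokens over all property values, ignoring property names."""
    return {tok for value in entity.values() for tok in value_tokens(value)}


def jaccard(a, b):
    """Jaccard similarity of two sets; 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def heuristic_training_pairs(source, target, threshold=0.5):
    """Assumed heuristic: treat cross-graph pairs with high token overlap
    as likely matches, without consulting any schema or labels."""
    pairs = []
    for s_id, s_entity in source.items():
        for t_id, t_entity in target.items():
            score = jaccard(entity_tokens(s_entity), entity_tokens(t_entity))
            if score >= threshold:
                pairs.append((s_id, t_id))
    return pairs


def align_properties(source, target, pairs):
    """Assumed alignment rule: map each source property to the target
    property whose values overlap most across the heuristic pairs."""
    scores = {}
    for s_id, t_id in pairs:
        for s_prop, s_val in source[s_id].items():
            for t_prop, t_val in target[t_id].items():
                sim = jaccard(value_tokens(s_val), value_tokens(t_val))
                key = (s_prop, t_prop)
                scores[key] = scores.get(key, 0.0) + sim
    best = {}
    for (s_prop, t_prop), score in scores.items():
        if score > 0 and score > best.get(s_prop, (None, 0.0))[1]:
            best[s_prop] = (t_prop, score)
    return {s_prop: t_prop for s_prop, (t_prop, _) in best.items()}


# Hypothetical toy graphs with different property names (schema-free).
source = {
    "a1": {"name": "Ada Lovelace", "born": "1815"},
    "a2": {"name": "Alan Turing", "born": "1912"},
}
target = {
    "b1": {"label": "Ada Lovelace", "year": "1815"},
    "b2": {"label": "Grace Hopper", "year": "1906"},
}

pairs = heuristic_training_pairs(source, target)
alignment = align_properties(source, target, pairs)
print(pairs)
print(alignment)
```

In this sketch the heuristic pairs stand in for the automatically generated training set, and the resulting alignment is what a later stage could use to build features for blocking and classification.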

Keywords: Instance Matching, Unsupervised System, Schema-free data, Linked Data, Automatic Training Set Generation, Feature Selection, Property Alignment, Modularity

Suggested Citation

Kejriwal, Mayank and Miranker, Daniel P., An Unsupervised Instance Matcher for Schema-Free RDF Data (2015). Journal of Web Semantics First Look. Available at SSRN: https://ssrn.com/abstract=3198896 or http://dx.doi.org/10.2139/ssrn.3198896

Mayank Kejriwal (Contact Author)

University of Texas at Austin - Department of Computer Science

2317 Speedway, Stop D9500
Austin, TX
United States

Daniel P. Miranker

University of Texas at Austin - Department of Computer Science

2317 Speedway, Stop D9500
Austin, TX
United States

Paper statistics

Abstract Views: 181
Downloads: 14