An Unsupervised Instance Matcher for Schema-Free RDF Data

26 Pages Posted: 17 Jan 2020 Publication Status: Accepted

Mayank Kejriwal

University of Texas at Austin - Department of Computer Science

Daniel P. Miranker

University of Texas at Austin - Department of Computer Science

Abstract

This article presents an unsupervised system that performs instance matching between entities in schema-free Resource Description Framework (RDF) files. Rather than relying on domain expertise or manually labeled samples, the system automatically generates its own heuristic training set. The training sets are first used by the system to align the properties in the input graphs. The property alignment and training sets are used together to simultaneously learn two functions, one for the blocking step of instance matching and the other for the classification step. Finally, the learned functions are used to perform instance matching. The full system is implemented as a sequence of components that can be iteratively executed to boost performance. Evaluations on a suite of ten test cases show individual components to be competitive with state-of-the-art baselines. The system as a whole is shown to compete effectively with adaptive supervised approaches.
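To make the idea of an automatically generated heuristic training set concrete, the following is a minimal sketch and not the authors' implementation. It assumes that each entity can be flattened into a bag of tokens with property names ignored (so no schema is needed), and that token-set Jaccard similarity with a fixed threshold is an acceptable labeling heuristic. The function names, the threshold of 0.5, and the sample entities are all hypothetical.

```python
from itertools import product


def tokens(entity):
    """Flatten an entity's property values into a lowercase token set.

    Property names are ignored, so no schema alignment is required (assumption).
    """
    bag = set()
    for value in entity.values():
        bag.update(str(value).lower().split())
    return bag


def jaccard(a, b):
    """Jaccard similarity of two sets; defined as 0.0 when both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def heuristic_training_set(source, target, threshold=0.5):
    """Label every cross-graph pair as a likely match (1) or non-match (0).

    The threshold is an illustrative assumption, not a value from the article.
    """
    labeled = []
    for (sid, s), (tid, t) in product(source.items(), target.items()):
        score = jaccard(tokens(s), tokens(t))
        labeled.append((sid, tid, 1 if score >= threshold else 0))
    return labeled


# Hypothetical entities from two graphs that use different property names.
source = {
    "s1": {"name": "Alan Turing", "born": "1912"},
    "s2": {"name": "Grace Hopper", "born": "1906"},
}
target = {
    "t1": {"label": "Turing Alan", "birthYear": "1912"},
    "t2": {"label": "Ada Lovelace", "birthYear": "1815"},
}

pairs = heuristic_training_set(source, target)
```

In this sketch only the pair (s1, t1) is labeled as a match, even though the two graphs share no property names. Comparing all pairs is quadratic in the number of entities, which is the cost that a learned blocking function is meant to avoid.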

Keywords: Instance Matching, Unsupervised System, Schema-Free Data, Linked Data, Automatic Training Set Generation, Feature Selection, Property Alignment, Modularity

Suggested Citation

Kejriwal, Mayank and Miranker, Daniel P., An Unsupervised Instance Matcher for Schema-Free RDF Data (2015). Available at SSRN: https://ssrn.com/abstract=3198896 or http://dx.doi.org/10.2139/ssrn.3198896

Mayank Kejriwal (Contact Author)

University of Texas at Austin - Department of Computer Science

2317 Speedway, Stop D9500
Austin, TX
United States

Daniel P. Miranker

University of Texas at Austin - Department of Computer Science

2317 Speedway, Stop D9500
Austin, TX
United States
