An Unsupervised Instance Matcher for Schema-Free RDF Data
26 Pages. Posted: 17 Jan 2020. First Look: Accepted.
This article presents an unsupervised system for instance matching between entities in schema-free Resource Description Framework (RDF) files. Rather than relying on domain expertise or manually labeled samples, the system automatically generates its own heuristic training set. The system first uses this training set to align the properties in the input graphs. It then uses the property alignment and the training set together to learn two functions simultaneously: one for the blocking step of instance matching and one for the classification step. Finally, it applies the learned functions to perform instance matching. The full system is implemented as a sequence of components that can be executed iteratively to boost performance. Evaluations on a suite of ten test cases show that the individual components are competitive with state-of-the-art baselines, and that the system as a whole competes effectively with adaptive supervised approaches.
Keywords: Instance Matching, Unsupervised System, Schema-Free Data, Linked Data, Automatic Training Set Generation, Feature Selection, Property Alignment, Modularity
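
The abstract does not say which heuristic the system uses to generate its training set, so the following is only an illustrative sketch of the general idea, not the authors' method. It assumes each entity is a dictionary of property-value pairs, ignores property names (since the data are schema-free), and treats cross-graph pairs with high token overlap (Jaccard similarity) as heuristic positive examples. The function names, the threshold value, and the sample entities are all hypothetical.

```python
import re
from itertools import product


def tokens(entity):
    """Collect lowercase word tokens from all property values, ignoring property names."""
    bag = set()
    for value in entity.values():
        bag.update(re.findall(r"[a-z0-9]+", str(value).lower()))
    return bag


def jaccard(a, b):
    """Jaccard similarity of two token sets; defined as 0.0 when both are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def heuristic_training_set(graph_a, graph_b, threshold=0.5):
    """Return (id_a, id_b, score) triples whose token overlap meets the threshold.

    The threshold is an assumed, illustrative value, not one taken from the article.
    """
    bags_a = {key: tokens(entity) for key, entity in graph_a.items()}
    bags_b = {key: tokens(entity) for key, entity in graph_b.items()}
    scored = []
    for (id_a, bag_a), (id_b, bag_b) in product(bags_a.items(), bags_b.items()):
        score = jaccard(bag_a, bag_b)
        if score >= threshold:
            scored.append((id_a, id_b, score))
    # Highest-scoring pairs first.
    return sorted(scored, key=lambda triple: triple[2], reverse=True)


# Hypothetical toy graphs with different property names for the same facts.
graph_a = {
    "a1": {"name": "Alan Turing", "born": "1912"},
    "a2": {"name": "Grace Hopper", "born": "1906"},
}
graph_b = {
    "b1": {"label": "Turing, Alan", "birthYear": "1912"},
    "b2": {"label": "Ada Lovelace", "birthYear": "1815"},
}

pairs = heuristic_training_set(graph_a, graph_b)
print(pairs)
```

In a pipeline like the one described, pairs produced this way could then serve as noisy positive examples for aligning properties and for learning the blocking and classification functions. The all-pairs comparison here is quadratic and is only practical for small inputs.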