LHD 2.0: A Text Mining Approach to Typing Entities in Knowledge Graphs

22 Pages. Posted: 3 Jul 2018. Publication Status: Accepted

Tomas Kliegr

University of Economics, Prague - Department of Information and Knowledge Engineering; Queen Mary University of London - Multimedia and Vision Research Group (MMV Group)

Ondřej Zamazal

University of Economics, Prague - Department of Information and Knowledge Engineering

Abstract

The type of the entity being described is one of the key pieces of information in linked data knowledge graphs. In this article, we introduce a novel technique for type inference that extracts types from the free-text description of the entity, combining lexico-syntactic pattern analysis with supervised classification. For lexico-syntactic (Hearst) pattern-based extraction, we use our previously published Linked Hypernyms Dataset Framework. Its output is mapped to the DBpedia Ontology by exact string matching, complemented with STI, a novel co-occurrence-based algorithm. This algorithm maps classes appearing in one knowledge graph to a different set of classes appearing in another knowledge graph, provided that the two graphs contain a common set of typed instances. The supervised results are obtained from a hierarchy of Support Vector Machine classifiers (hSVM) trained on a bag-of-words representation of the short abstracts and categories of Wikipedia articles. The results of both approaches are probabilistically fused. For evaluation, we created a gold-standard dataset covering over 2000 DBpedia entities using a commercial crowdsourcing service. The hierarchical precision of our hSVM and STI approaches is comparable to that of SDType, the current state-of-the-art type inference algorithm, while the set of applicable instances is largely complementary to SDType, as our algorithms do not require semantic properties in the knowledge graph to type an instance. The paper also provides a comprehensive evaluation of type assignment in DBpedia in terms of hierarchical precision, recall, and exact match with the gold standard. A dataset generated by a version of the presented approach is included in DBpedia 2015.
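
The co-occurrence idea behind the class mapping can be illustrated with a minimal sketch. This is not the STI algorithm itself, whose details the abstract does not give; it only shows the general principle of mapping a class in one graph to the class it most often co-occurs with on shared instances in another graph. The function name `map_classes_by_cooccurrence`, the `min_support` threshold, and the toy data are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def map_classes_by_cooccurrence(source_types, target_types, min_support=2):
    """Map each source class to its most frequent co-occurring target class.

    source_types / target_types: dict of instance -> class, one per graph.
    min_support is a hypothetical cut-off, not a parameter from the paper.
    Returns: dict of source class -> (target class, share of co-occurrences).
    """
    cooccurrence = defaultdict(Counter)
    for instance, source_class in source_types.items():
        target_class = target_types.get(instance)
        if target_class is not None:  # only instances typed in both graphs count
            cooccurrence[source_class][target_class] += 1

    mapping = {}
    for source_class, counts in cooccurrence.items():
        target_class, support = counts.most_common(1)[0]
        if support >= min_support:
            mapping[source_class] = (target_class, support / sum(counts.values()))
    return mapping


# Toy data (invented): text-derived types vs. ontology classes.
extracted = {"Prague": "city", "Brno": "city", "Ostrava": "city",
             "Vltava": "river", "Elbe": "river"}
ontology = {"Prague": "City", "Brno": "City", "Ostrava": "Settlement",
            "Vltava": "River"}

mapping = map_classes_by_cooccurrence(extracted, ontology)
```

Here `city` is mapped to `City` because two of its three shared instances carry that class, while `river` is left unmapped because only one shared instance supports it.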

Keywords: Type inference, Support Vector Machines, Entity classification, DBpedia

Suggested Citation

Kliegr, Tomas and Zamazal, Ondřej, LHD 2.0: A Text Mining Approach to Typing Entities in Knowledge Graphs (2016). Available at SSRN: https://ssrn.com/abstract=3199238 or http://dx.doi.org/10.2139/ssrn.3199238

Tomas Kliegr (Contact Author)

University of Economics, Prague - Department of Information and Knowledge Engineering

Nam. W. Churchilla 4
Praha 3
Czech Republic

Queen Mary University of London - Multimedia and Vision Research Group (MMV Group)

Mile End Road, Mile End
London, England E1 4NS
United Kingdom

Ondřej Zamazal

University of Economics, Prague - Department of Information and Knowledge Engineering

Nam. W. Churchilla 4
Praha 3
Czech Republic
