The Role of Knowledge in Determining Identity of Long-Tail Entities

23 Pages. Posted: 29 Sep 2020. Publication Status: Accepted.

Filip Ilievski

University of Southern California - Information Sciences Institute

Eduard Hovy

Language Technologies Institute, Carnegie Mellon University

Piek Vossen

VU University Amsterdam

Stefan Schlobach

VU University Amsterdam

Qizhe Xie

Language Technologies Institute, Carnegie Mellon University

Abstract

Identifying entities in text is an important step in semantic analysis. Some entity mentions comprise a name or description, but many include no information that identifies them in the system’s knowledge resources, which means that their identity cannot be established through traditional disambiguation. Consequently, such NIL (not in lexicon) entities have received little attention in entity linking systems and tasks so far. However, given the non-redundancy of knowledge on NIL entities, their lack of frequency priors, their potentially extreme ambiguity, and their sheer number, they constitute an important class of long-tail entities and pose a great challenge for state-of-the-art systems. In this paper, we describe a method for imputing identifying knowledge to NILs from generalized characteristics. We enrich the locally extracted information with profile models that rely on background knowledge in Wikidata. We describe and implement two profiling machines using state-of-the-art neural models. We evaluate their intrinsic behavior and their impact on the task of determining the identity of NIL entities.

Keywords: Long-tail entities, NIL clustering, knowledge-based completion

Suggested Citation

Ilievski, Filip and Hovy, Eduard and Vossen, Piek and Schlobach, Stefan and Xie, Qizhe, The Role of Knowledge in Determining Identity of Long-Tail Entities (September 22, 2020). Available at SSRN: https://ssrn.com/abstract=3697491 or http://dx.doi.org/10.2139/ssrn.3697491

Filip Ilievski (Contact Author)

University of Southern California - Information Sciences Institute

4676 Admiralty Way
Suite 1001
Marina del Rey, CA 90292
United States

Eduard Hovy

Language Technologies Institute, Carnegie Mellon University

Piek Vossen

VU University Amsterdam

De Boelelaan 1105
Amsterdam, ND North Holland 1081 HV
Netherlands

Stefan Schlobach

VU University Amsterdam

De Boelelaan 1105
Amsterdam, ND North Holland 1081 HV
Netherlands

Qizhe Xie

Language Technologies Institute, Carnegie Mellon University
