Large-Scale Relation Extraction from Web Documents and Knowledge Graphs with Human-in-the-Loop

10 Pages Posted: 13 Dec 2019 First Look: Under Review

Petar Ristoski

IBM Research - Almaden Research Center

Anna Lisa Gentile

IBM Research - Almaden Research Center

Alfredo Alba

IBM Research - Almaden Research Center

Daniel Gruhl

IBM Research - Almaden Research Center

Steven Welch

IBM Research - Almaden Research Center

Abstract

The Semantic Web movement has produced a wealth of curated collections of entities and facts, often referred to as Knowledge Graphs. Creating and maintaining such Knowledge Graphs is far from a solved problem: it is crucial to continually extract new information from the vast number of heterogeneous data sources on the Web. In this work we address the task of Knowledge Graph population. Specifically, given any target relation between two entities, we propose an approach to extract positive instances of that relation from various Web sources. Our relation extraction approach introduces a human-in-the-loop component into the extraction pipeline, which delivers a significant advantage over purely automatic approaches. We test our solution on the ISWC 2018 Semantic Web Challenge, whose objective is to identify supply-chain relations among organizations in the Thomson Reuters Knowledge Graph. Our human-in-the-loop extraction pipeline achieves the top performance among all competing systems.

Keywords: Relation Extraction, Web Mining, Knowledge Graph Mining, Human-in-the-Loop

Suggested Citation

Ristoski, Petar and Gentile, Anna Lisa and Alba, Alfredo and Gruhl, Daniel and Welch, Steven, Large-Scale Relation Extraction from Web Documents and Knowledge Graphs with Human-in-the-Loop (December 11, 2019). Available at SSRN: https://ssrn.com/abstract=3502435 or http://dx.doi.org/10.2139/ssrn.3502435

Petar Ristoski (Contact Author)

IBM Research - Almaden Research Center

San Jose, CA 95120
United States

Anna Lisa Gentile

IBM Research - Almaden Research Center

San Jose, CA 95120
United States

Alfredo Alba

IBM Research - Almaden Research Center

San Jose, CA 95120
United States

Daniel Gruhl

IBM Research - Almaden Research Center

San Jose, CA 95120
United States

Steven Welch

IBM Research - Almaden Research Center

San Jose, CA 95120
United States
