Robust and Scalable Linked Data Reasoning Incorporating Provenance and Trust Annotations
64 Pages Posted: 9 Jul 2018 First Look: Accepted
In this paper, we leverage annotated logic programs for tracking indicators of provenance and trust during reasoning, specifically focussing on the use-case of applying a scalable subset of OWL 2 RL/RDF rules over static corpora of arbitrary Linked Data (Web data). Our annotations encode three facets of information: (i) blacklist: a (possibly manually generated) boolean annotation which indicates that the referent data are known to be harmful and should be ignored during reasoning; (ii) ranking: a numeric value derived by a PageRank-inspired technique — adapted for Linked Data — which determines the centrality of certain data artefacts (such as RDF documents and statements); (iii) authority: a boolean value which uses Linked Data principles to conservatively determine whether or not some terminological information can be trusted. We formalise a logical framework which annotates inferences with the strength of derivation along these dimensions of trust and provenance; we formally demonstrate some desirable properties of the deployment of annotated logic programming in our setting, which guarantees (i) a unique minimal model (least fixpoint); (ii) monotonicity; (iii) finitariness; and (iv) finally decidability. In so doing, we also give some formal results which reveal strategies for scalable and efficient implementation of various reasoning tasks one might consider. Thereafter, we discuss scalable and distributed implementation strategies for applying our ranking and reasoning methods over a cluster of commodity hardware; throughout, we provide evaluation of our methods over 1 billion Linked Data quadruples crawled from approximately 4 million individual Web documents, empirically demonstrating the scalability of our approach, and how our annotation values help ensure a more robust form of reasoning. We finally sketch, discuss and evaluate a use-case for a simple repair of inconsistencies detectable within OWL 2 RL/RDF constraint rules using ranking annotations to detect and defeat the “marginal view”, and in so doing, infer an empirical “consistency threshold” for the Web of Data in our setting.
Keywords: annotated programs, linked data, web reasoning, scalable reasoning, distributed reasoning, authoritative reasoning, OWL 2RL, provenance, pagerank, inconsistency, repair
Suggested Citation: Suggested Citation