Relaxing Relationship Queries on Graph Data
16 Pages Posted: 29 Sep 2020 Publication Status: Accepted
In many domains we have witnessed the need to search a large entity-relation graph for direct and indirect relationships between a set of entities specified in a query. A search result, called a semantic association (SA), is typically a compact (e.g., diameter-constrained) connected subgraph containing all the query entities. For this problem of SA search, efficient algorithms exist, but they return empty results if some query entities are distant from each other in the graph. To reduce the occurrence of failing queries and provide alternative results, we study the problem of query relaxation in the context of SA search. Simply relaxing the compactness constraint sacrifices the compactness of an SA and, more importantly, may lead to performance issues and be impracticable. Instead, we focus on removing the smallest number of entities from the original failing query to form a maximum successful sub-query, which minimizes the loss of result quality caused by relaxation. We prove that verifying the success of a sub-query reduces to finding an entity (called a certificate) that satisfies a distance-based condition with respect to the query entities. To efficiently find a certificate of the success of a maximum sub-query, we propose a best-first search algorithm that leverages distance-based estimation to effectively prune the search space. We further improve its performance by adding two fine-grained heuristics: one based on degree and the other based on distance. Extensive experiments over popular RDF datasets demonstrate the efficiency of our algorithm, which is more scalable than baselines.
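To make the certificate idea concrete, here is a minimal Python sketch. It is not the best-first algorithm described in the abstract: it is a brute-force baseline under a simplifying assumption, namely that a sub-query succeeds when some vertex lies within `radius` hops of every entity in it (roughly half of an even diameter bound). The graph representation and the names `bfs_within` and `max_subquery` are hypothetical and chosen only for illustration.

```python
from collections import deque


def bfs_within(graph, source, radius):
    """Return the set of vertices at most `radius` hops from `source`."""
    seen = {source}
    frontier = deque([(source, 0)])
    while frontier:
        node, dist = frontier.popleft()
        if dist == radius:
            continue  # do not expand beyond the radius
        for nbr in graph.get(node, ()):
            if nbr not in seen:
                seen.add(nbr)
                frontier.append((nbr, dist + 1))
    return seen


def max_subquery(graph, query, radius):
    """Brute force: find a vertex covering the most query entities.

    Assumed success condition (a simplification): a sub-query succeeds
    if one vertex, the certificate, is within `radius` hops of all of
    its entities.
    """
    covered = {}  # vertex -> query entities within `radius` hops of it
    for q in query:
        for v in bfs_within(graph, q, radius):
            covered.setdefault(v, set()).add(q)
    if not covered:
        return None, set()
    # Deterministic tie-break on the vertex label.
    cert = max(covered, key=lambda v: (len(covered[v]), str(v)))
    return cert, covered[cert]


# Undirected path a-b-c-d-e-f-g as an adjacency dict.
edges = [("a", "b"), ("b", "c"), ("c", "d"), ("d", "e"), ("e", "f"), ("f", "g")]
graph = {}
for u, v in edges:
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)

cert, sub = max_subquery(graph, {"a", "c", "g"}, radius=1)
print(cert, sorted(sub))  # b ['a', 'c']
```

In this toy graph the full query {a, c, g} fails for radius 1 because g is too far from the other two entities, so the sketch drops g and reports b as a certificate for the sub-query {a, c}. A pruned search such as the one the abstract describes would avoid running a full bounded traversal from every query entity.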
Keywords: semantic association search, complex relationship, query relaxation, graph data