On the Size of Intermediate Results in the Federated Processing of SPARQL BGPs
23 Pages; Posted: 12 Sep 2018; First Look: Accepted
This paper is a foundational study of the semantics of federated query answering for SPARQL basic graph patterns (BGPs). Its specific concern is how the size of intermediate results can be reduced without, from a logical point of view, altering the content of the final answer. The intended application is reducing communication costs and local memory consumption when querying dynamic network topologies and highly distributed, shared-nothing or sharded architectures. We define row-reducing and column-reducing operations that, if a SPARQL result set is viewed as a table, reduce the number of rows and columns, respectively. These operations are deliberately designed not to anticipate the unfolding of the evaluation process: they presuppose no knowledge of the structure or content of the data sources and, equivalently, require no data to be exchanged in order to make intermediate results smaller. In other words, the operations studied depend solely on the shape of evaluation trees and on the distribution of variables within them. The paper culminates in a study of different compositions of these reduction operators. We establish mathematically that our row- and column-reducing operators can be combined into a single reduction operator that can be applied repeatedly without altering the semantics of the final result of the query answering process.
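The abstract does not spell out the operators' definitions, so the following is only a minimal, hypothetical sketch of the general idea of reducing intermediate results using nothing but the shape of the evaluation tree and the distribution of variables in it. It assumes an evaluation tree whose leaves are triple patterns joined by inner nodes, and it assumes set semantics (as with `SELECT DISTINCT`): dropping a column preserves row multiplicities, but removing duplicate rows is only sound when duplicates do not matter. The names `Node`, `needed_columns`, `reduce_columns`, `reduce_rows` and `reduce_result` are illustrative inventions, not the paper's notation.

```python
from dataclasses import dataclass, field
from typing import Dict, FrozenSet, Iterable, List, Set

Solution = Dict[str, str]  # one row of a result set: variable -> RDF term


@dataclass(eq=False)  # nodes are compared by identity, not by value
class Node:
    """Hypothetical evaluation-tree node: a triple-pattern leaf or an inner join."""
    variables: FrozenSet[str] = frozenset()  # variables of a leaf's triple pattern
    children: List["Node"] = field(default_factory=list)

    def all_vars(self) -> Set[str]:
        """Every variable mentioned anywhere in this subtree."""
        out = set(self.variables)
        for child in self.children:
            out |= child.all_vars()
        return out


def vars_outside(root: Node, sub: Node) -> Set[str]:
    """Variables occurring in the tree rooted at `root` but outside subtree `sub`."""
    if root is sub:
        return set()
    out = set(root.variables)
    for child in root.children:
        out |= vars_outside(child, sub)
    return out


def needed_columns(root: Node, sub: Node, projected: Iterable[str]) -> Set[str]:
    """Columns of sub's results still observable by a later join or the final projection."""
    return sub.all_vars() & (set(projected) | vars_outside(root, sub))


def reduce_columns(rows: List[Solution], keep: Set[str]) -> List[Solution]:
    """Column reduction: drop variables that no later step can observe."""
    return [{v: t for v, t in row.items() if v in keep} for row in rows]


def reduce_rows(rows: List[Solution]) -> List[Solution]:
    """Row reduction: drop duplicate rows (sound only under set semantics)."""
    unique = dict.fromkeys(frozenset(row.items()) for row in rows)  # keeps first-seen order
    return [dict(items) for items in unique]


def reduce_result(rows: List[Solution], keep: Set[str]) -> List[Solution]:
    """Composite reduction: columns first, which can turn distinct rows into duplicates."""
    return reduce_rows(reduce_columns(rows, keep))


# Example: SELECT DISTINCT ?name WHERE { ?p :name ?name . ?p :email ?e }
name_tp = Node(frozenset({"?p", "?name"}))
email_tp = Node(frozenset({"?p", "?e"}))
tree = Node(children=[name_tp, email_tp])

keep = needed_columns(tree, email_tp, projected={"?name"})
email_rows = [
    {"?p": ":a", "?e": "x@ex.org"},
    {"?p": ":a", "?e": "y@ex.org"},
    {"?p": ":b", "?e": "z@ex.org"},
]
once = reduce_result(email_rows, keep)
print(sorted(keep))  # ['?p']
print(once)          # [{'?p': ':a'}, {'?p': ':b'}]
print(reduce_result(once, keep) == once)  # True: applying the reduction again changes nothing
```

In this sketch `?e` can be dropped from the email subtree's results because it occurs nowhere else in the tree and is not projected; the rows that then coincide can be merged without changing the set of final answers. The last line mirrors, in a very small way, the abstract's claim that the combined operator can be applied repeatedly; it is not a reproduction of the paper's proof.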
Keywords: Federated query processing, intermediate results, minimization, blank nodes, SPARQL