Querying Web-Sources within a Data Federation
CISL Working Paper No. 2006-09
19 Pages
Posted: 28 Aug 2006
Date Written: August 2006
The web is undoubtedly the largest and most diverse repository of data, but it was not designed to offer the capabilities of traditional database management systems. In a true data federation, all types of data sources, such as relational databases and semi-structured websites, could be used together. IBM WebSphere uses the "request-reply-compensate" protocol to communicate with wrappers in a data federation. Under this protocol, a wrapper replies to each query request by indicating the portion of the query it can answer, and the federation system compensates for the rest. While this provides a very generic approach to data federation, it also requires the wrapper developer to handle some of the complexities of capability considerations through custom coding. Alternative approaches based on declarative capability restrictions have been proposed in the literature, but they have not found their way into commercial systems, perhaps due to their complexity. We offer a practical middle-ground solution to querying web-sources, using IBM's data federation system as an example. In lieu of a two-layered architecture consisting of wrapper and source layers, we propose to move the capability declaration out of the wrapper layer and into a single component that sits between the wrapper and the native data source. The advantage of this three-layered architecture is that each new web-source only needs to register its capability with the capability-declaration component once, which saves the work of writing a new wrapper for each source. Web-sources can thus be included more quickly, without requiring any change to existing data federation technology.
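The three-layered idea can be illustrated with a minimal sketch. It is not the paper's implementation and does not use IBM's wrapper API; the names `Predicate`, `Capability`, `CapabilityRegistry`, `split`, and the example source `quotes_site` are all hypothetical. The sketch assumes a capability is simply the set of comparison operators a web-source can evaluate on each column. A source registers that declaration once, and a generic wrapper consults the registry to reply with the portion of a query the source can answer, leaving the remainder for the federation engine to compensate.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Predicate:
    """One selection condition of a query, e.g. ticker = 'IBM'."""
    column: str
    op: str
    value: object


@dataclass
class Capability:
    """Hypothetical declaration: operators a source can evaluate, per column."""
    supported_ops: dict[str, set[str]]


class CapabilityRegistry:
    """Hypothetical middle layer between a generic wrapper and web-sources."""

    def __init__(self) -> None:
        self._capabilities: dict[str, Capability] = {}

    def register(self, source: str, capability: Capability) -> None:
        # Each web-source declares its capability once, here,
        # instead of encoding it in a custom-coded wrapper.
        self._capabilities[source] = capability

    def split(self, source: str, predicates: list[Predicate]):
        """Return (accepted, compensated) predicates for a source.

        'accepted' is the portion the source can answer (the reply);
        'compensated' is left for the federation engine to evaluate.
        """
        capability = self._capabilities[source]
        accepted, compensated = [], []
        for p in predicates:
            if p.op in capability.supported_ops.get(p.column, set()):
                accepted.append(p)
            else:
                compensated.append(p)
        return accepted, compensated


# Usage: an assumed quote site that can only look up rows by exact ticker.
registry = CapabilityRegistry()
registry.register("quotes_site", Capability({"ticker": {"="}}))

query = [Predicate("ticker", "=", "IBM"), Predicate("price", ">", 80)]
accepted, compensated = registry.split("quotes_site", query)
```

In this sketch the equality test on `ticker` is passed to the source, while the `price` filter stays with the federation engine. Adding another web-source would mean one more `register` call rather than a new wrapper.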
Keywords: federated data, web data sources, capabilities, query handling