Querying Web-Sources within a Data Federation

MIT Sloan Research Paper No. 4624-06

CISL Working Paper No. 2006-09

19 Pages. Posted: 28 Aug 2006

Lynn Wu

University of Pennsylvania - The Wharton School

Aykut Firat

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Tarik Alatovic

Massachusetts Institute of Technology (MIT)

Stuart Madnick

Massachusetts Institute of Technology (MIT) - Sloan School of Management

Date Written: August 2006

Abstract

The web is undoubtedly the largest and most diverse repository of data, but unfortunately it was not designed to offer the capabilities of traditional database management systems. In a true data federation, all types of data sources, such as relational databases and semi-structured websites, could be used together. IBM WebSphere uses the "request-reply-compensate" protocol to communicate with wrappers in a data federation. This protocol expects wrappers to reply to query requests by indicating the portion of the queries they can answer. While this provides a very generic approach to data federation, it also requires the wrapper developer to deal with some of the complexities of capability considerations through custom coding. Alternative approaches based on declarative capability restrictions have been proposed in the literature, but they have not found their way into commercial systems, perhaps due to their complexity. We offer a practical middle-ground solution to querying web-sources, using IBM's data federation system as an example. Instead of a two-layered architecture consisting of wrapper and source layers, we propose moving the capability declaration out of the wrapper layer into a single component between the wrapper and the native data source. The advantage of this three-layered architecture is that each new web-source only needs to register its capability with the capability-declaration component once, which saves the work of writing a new wrapper each time. Thus the inclusion of web-sources through this mechanism can be accelerated without requiring a change in existing data federation technology.
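
The three-layered idea in the abstract can be illustrated with a minimal sketch. It is not taken from the paper and does not use any IBM WebSphere API: the names Predicate, Capability, CapabilityRegistry, and the "quotes_site" source are hypothetical, and capabilities are assumed to be simple sets of attributes that a source requires or can filter on. A single registry holds the declarations, a generic wrapper would consult it to decide which part of a request to accept, and the remainder would be left for the federation engine to compensate.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Predicate:
    """A simple equality predicate, e.g. ticker = 'IBM' (hypothetical type)."""
    attribute: str
    value: str


@dataclass(frozen=True)
class Capability:
    """Hypothetical capability record declared once per web source."""
    required: frozenset    # attributes that must be bound before the source can answer
    filterable: frozenset  # optional attributes the source can filter on natively


@dataclass
class CapabilityRegistry:
    """Hypothetical component sitting between a generic wrapper and the sources."""
    _capabilities: dict = field(default_factory=dict)

    def register(self, source: str, capability: Capability) -> None:
        # A new source is added by declaring its capability, not by writing a wrapper.
        self._capabilities[source] = capability

    def split(self, source: str, predicates: list) -> tuple:
        """Split a request into (accepted, rejected) predicates for one source.

        'accepted' is the portion the source can answer (the reply);
        'rejected' is what the federation engine would have to compensate for.
        Raises ValueError if a required binding is missing from the request.
        """
        cap = self._capabilities[source]
        bound = {p.attribute for p in predicates}
        missing = cap.required - bound
        if missing:
            raise ValueError(f"{source} needs bindings for: {sorted(missing)}")
        pushable = cap.required | cap.filterable
        accepted = [p for p in predicates if p.attribute in pushable]
        rejected = [p for p in predicates if p.attribute not in pushable]
        return accepted, rejected


# Usage: register one assumed web source, then split a three-predicate request.
registry = CapabilityRegistry()
registry.register(
    "quotes_site",
    Capability(required=frozenset({"ticker"}), filterable=frozenset({"date"})),
)
request = [
    Predicate("ticker", "IBM"),
    Predicate("date", "2006-08-01"),
    Predicate("exchange", "NYSE"),
]
accepted, rejected = registry.split("quotes_site", request)
```

In this sketch the source would answer the ticker and date predicates, and the exchange predicate would be applied afterwards by the federation engine. Real capability restrictions for web sources are likely richer than attribute sets (for example, operators or binding orders), so the record format here is only an assumption.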

Keywords: federated data, web data sources, capabilities, query handling

Suggested Citation

Wu, Lynn and Firat, Aykut and Alatovic, Tarik and Madnick, Stuart E., Querying Web-Sources within a Data Federation (August 2006). MIT Sloan Research Paper No. 4624-06; CISL Working Paper No. 2006-09. Available at SSRN: https://ssrn.com/abstract=926628 or http://dx.doi.org/10.2139/ssrn.926628

Lynn Wu

University of Pennsylvania - The Wharton School

3733 Spruce Street
Philadelphia, PA 19104-6374
United States

Aykut Firat

Massachusetts Institute of Technology (MIT) - Sloan School of Management

100 Main Street
E62-416
Cambridge, MA 02142
United States

Tarik Alatovic

Massachusetts Institute of Technology (MIT)

77 Massachusetts Avenue
50 Memorial Drive
Cambridge, MA 02139-4307
United States

Stuart E. Madnick (Contact Author)

Massachusetts Institute of Technology (MIT) - Sloan School of Management

E53-321
Cambridge, MA 02142
United States
617-253-6671 (Phone)
617-253-3321 (Fax)
