Using Structural Analysis to Mediate XML Semantic Interoperability
Indian Institute of Management (IIMB), Bangalore
Massachusetts Institute of Technology (MIT)
MIT Sloan Working Paper No. 4345-02; Eller College Working Paper No. 1024-05
At the forefront of interoperability using XML in an Internet environment is the issue of semantic translation; that is, the ability to properly interpret the elements, attributes, and values contained in an XML file. In many cases, specific domains have standardized the way data are represented in XML. When this does not occur, some type of mediation is required to interpret XML-formatted data that do not adhere to pre-defined semantics. The prototype X-Map was developed to investigate what is required to mediate semantic interoperability between heterogeneous domains. An essential component of this system is structural analysis of the data representations in the respective domains. When mediating XML data between similar but non-identical domains, we cannot rely solely on semantic similarities of tags or on the data content of elements to establish associations between related elements, especially over the Internet. To complement such associations, one can exploit the respective domain structures and the position and relationships of the evaluated elements within those structures. For this purpose, the domains are represented as hierarchical trees in XML syntax; a more general solution handles arbitrary graphs. A structural analysis algorithm builds on associations discovered by other analyses, using them to help discover further links that could not have been found by purely static examination of the elements and their aggregate content. Three methodologies are presented by which the algorithm maximizes the number of relevant mappings or associations derived from the XML structures. The paper concludes with comparative results obtained using these three methodologies.
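The idea of using structure to extend a set of known associations can be illustrated with a small sketch. The following is not X-Map's algorithm, which the abstract does not specify; it is a minimal illustration under stated assumptions: both domains are trees, tag names are unique within each document, and the function names (`children_by_tag`, `propagate`), the two rules, and the sample documents are all hypothetical.

```python
import xml.etree.ElementTree as ET


def children_by_tag(xml_text):
    """Return {tag: [child tags]} for every element.

    Assumption: each tag name occurs once per document.
    """
    root = ET.fromstring(xml_text)
    return {el.tag: [child.tag for child in el] for el in root.iter()}


def propagate(children_a, children_b, seeds):
    """Grow a set of (tag_a, tag_b) associations using tree structure only."""
    found = set(seeds)
    matched_a = {a for a, _ in found}
    matched_b = {b for _, b in found}
    changed = True
    while changed:
        changed = False
        # Rule 1 (bottom-up): two unmatched parents that share an
        # associated pair of children become associated themselves.
        for pa, kids_a in children_a.items():
            for pb, kids_b in children_b.items():
                if pa in matched_a or pb in matched_b:
                    continue
                if any((ka, kb) in found for ka in kids_a for kb in kids_b):
                    found.add((pa, pb))
                    matched_a.add(pa)
                    matched_b.add(pb)
                    changed = True
        # Rule 2 (top-down): if associated parents each have exactly one
        # unmatched child left, associate those two children.
        for pa, pb in list(found):
            free_a = [k for k in children_a[pa] if k not in matched_a]
            free_b = [k for k in children_b[pb] if k not in matched_b]
            if len(free_a) == 1 and len(free_b) == 1:
                found.add((free_a[0], free_b[0]))
                matched_a.add(free_a[0])
                matched_b.add(free_b[0])
                changed = True
    return found


# Hypothetical documents from two similar but non-identical domains.
DOC_A = "<order><customer><name/><phone/></customer><total/></order>"
DOC_B = "<purchase><buyer><fullName/><tel/></buyer><amount/></purchase>"

# Seed associations, standing in for links found by other analyses.
seeds = {("name", "fullName"), ("total", "amount")}

links = propagate(children_by_tag(DOC_A), children_by_tag(DOC_B), seeds)
for pair in sorted(links):
    print(pair)
```

In this toy case the pair `phone`/`tel` shares no lexical similarity, yet it is recovered because the two elements occupy corresponding positions under parents that are already associated. A realistic mediator would score candidate links rather than accept them outright, and would need to handle repeated tags and graph-shaped schemas.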
Number of Pages in PDF File: 24
Date posted: April 11, 2002