Hongwei (Harry) Zhu's
Scholarly Papers
Aggregate Statistics
Total Downloads: 2,693
Total Citations: 33

1.
Authors: Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Nazli Choucri, Massachusetts Institute of Technology (MIT); Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Farnaz Haghseta, Massachusetts Institute of Technology (MIT); Allen Moulton, Massachusetts Institute of Technology (MIT); Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 28 Apr 02
Last Revised: 07 May 02
Downloads: 416 (18,273)
Abstract:
The convergence of three distinct but interconnected trends - unrelenting globalization, growing worldwide electronic connectivity, and increasing knowledge intensity of economic activity - is creating powerful new opportunities and challenges for global politics. This rapidly changing environment has information demands that surpass existing capabilities for information access, interpretation, and overall use, thus hindering our abilities to address emergent and complex global challenges, such as terrorism and other security threats. This reality has serious implications for two diverse domains of scholarship: international relations (IR) in political science and information technology (IT). Unless IT advances remain "one step ahead" of emergent realities and complexities, strategies for better understanding and responding to critical global challenges will be severely impeded. For example, more so now than ever, the U.S. Office of Counter-Terrorism and the newly-created Office of Homeland Security rely on intelligence information from all over the world to develop strategic responses to security threats. However, relevant information is stored in various regions throughout the world and by diverse agencies in different media, formats, and contexts. Intelligent integration of information is fundamental to developing policies to anticipate and strengthen protection against terrorist threats or attacks in the United States. This Project's activities, and relationships with its collaborators, will be coordinated through a newly formed joint Laboratory for Information Globalization and Harmonization Technologies (LIGHT). LIGHT will address information needs in the IR domain, focusing on the conflict realm, which deals with emergent risks, threats, and uncertainties of potentially global scale and scope related to: (a) crises, (b) conflicts and war; and (c) anticipation, monitoring and early warning. 
The goals of this initiative are to: (1) improve understanding of the types of IR information needs for decision making and institutional performance under varying degrees of risk and uncertainty; (2) design and implement the System for Harmonized Information Processing, to facilitate access to and correct interpretation of essential information that is critical to policy and research in the IR realm, as well as to other similarly complex domains, and (3) advance developments in the use of information technologies to facilitate such interdisciplinary research and to contribute to new education approaches, tools, and methods. Increasingly, addressing problems central to national and global interests in complex domains such as IR requires the use of technologies that easily combine observations from disparate sources, using different interpretations, for different purposes, and by a wide range of users. Critical advances in IT capabilities must span multiple domains (e.g., economic, political, geographic, commercial, and demographic), diverse contexts (i.e., meanings, languages, assumptions), and a multiplicity of contending agents (i.e., states, governments, corporations, international institutions). The technology-related research will focus on acquiring and enhancing information to serve user requirements both over individual domains (i.e., a single shared ontology) and across multiple domains, which are necessary for addressing complex challenges. The core innovation is reflected in the notion of a Collaborative Domain Space (CDS), within which applications in a common domain can share, analyze, modify, and develop information. For applications that span multiple domains we provide for a Collection of CDSs to link shared concepts in distinct domains. 
Moreover, we will develop the System for Harmonized Information Processing that incorporates CDSs as a basis for knowledge representation and includes all the necessary reasoning algorithms required to support information processing over a range of heterogeneous sources and applications. The development of the system described above builds upon prior work. The political science IR work will draw on an earlier Internet-based experimental "platform" for exploring forms of information generation, provision, and integration across multiple domains, regions, languages, and epistemologies which are relevant to complex but domain-specific applications, the Global System for Sustainable Development (GSSD). The IT component builds on work on the Context Interchange project (COIN) focused on the integration of a range of distributed heterogeneous information sources (e.g., financial, supply chain, disaster relief) using ontologies, databases, context mediation algorithms, and wrapper technologies. Both groups have considerable experience with the organization and management of large scale, international, distributed, and diverse research projects, including cross-national (e.g., China, Middle East, Europe) and institutional (private, public, national and international) agencies. The anticipated results will apply to any complex domain with multiple entities that rely on heterogeneous distributed data to address and resolve compelling problems. This initiative is supported by a network of international collaborators from (a) scientific and research institutions, (b) business and industry, and (c) national and international agencies. Expected research products include: a software platform, IR-based knowledge repository, and diverse applications in policy, research, and education which are anticipated to significantly impact the way complex organizations, and society in general, understand and manage critical global challenges.

2.
Authors: Vincent Maugis, Massachusetts Institute of Technology (MIT) - Department of Political Science; Nazli Choucri, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Sharon E. Gillett, Massachusetts Institute of Technology (MIT); Farnaz Haghseta, Massachusetts Institute of Technology (MIT); Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Mike NMI Best, Massachusetts Institute of Technology (MIT) - Center for Technology, Policy, and Industrial Development (CTPID)
Posted: 26 Apr 04
Last Revised: 10 Apr 05
Downloads: 340 (23,584)
Abstract:
With the rapid diffusion of the Internet worldwide, there has been considerable interest in the e-potentials of developing countries, giving rise to a first generation of e-Readiness studies. Moreover, e-Readiness means different things to different people, in different contexts, and for different purposes. Despite strong merits, this first generation of e-Readiness studies assumed a fixed, one-size-fits-all set of requirements, regardless of the characteristics of individual countries, the investment context, or the demands of specific applications. This feature obscures critical information for investors or policy analysts seeking to reduce uncertainties and/or make more educated decisions. But there is very little known about e-Readiness for e-Banking. In particular, based on lessons learnt to date and their implications for emerging realities of the 21st century, we designed and executed a research project with theoretical as well as practical dimensions to answer the question of e-Readiness for What, focusing specifically on e-Banking, based on the very assumption that one size can seldom, if ever, fit all. We propose and develop a conceptual framework for the "next generation" e-Readiness - focusing on different e-Business applications in different economic contexts with potentially different pathways - as well as a data model - to explore e-Readiness for e-Banking in ten countries.
e-readiness assessment, value-creation opportunities, e-Banking, banking, pathways, profiles, leapfrogging

3.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 19 Dec 07
Last Revised: 23 Apr 08
Downloads: 227 (37,394)
Abstract:
As an open standard for electronic communication of business and financial data, XBRL has the potential of improving the efficiency of the business data supply chain. A number of jurisdictions have developed different XBRL taxonomies as their data standards. Semantic heterogeneity exists in these taxonomies, the corresponding instances, and the internal systems that store the original data. Consequently, there are still substantial difficulties in creating and using XBRL instances that involve multiple taxonomies. To fully realize the potential benefits of XBRL, we have to develop technologies to reconcile semantic heterogeneity and enable interoperability of various parts of the supply chain. In this paper, we analyze the XBRL standard and use examples of different taxonomies to illustrate the interoperability challenge. We also propose a technical solution that incorporates schema matching and context mediation techniques to improve the efficiency of the production and consumption of XBRL data.
XBRL, semantic data integration, context mediation, ontology, schema matching

4.
Authors: Nazli Choucri, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Allen Moulton, Massachusetts Institute of Technology (MIT); Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 05 Jan 05
Last Revised: 09 Feb 05
Downloads: 216 (39,375)
Abstract:
In its Preface, The 9/11 Commission Report states: "We learned that the institutions charged with protecting ... national security did not understand how grave this threat can be, and did not adjust their policies, plans, and practices to deter or defeat it" (2004: xvi). Given current realities and uncertainties, better preparedness can be achieved by identifying, controlling, and managing the elusive linkages and situational factors that fuel hostilities. This paper focuses on new opportunities and capabilities provided by anticipatory technologies that help understand, measure, and model the complex dynamics shaping and precipitating conflict in specific settings worldwide. We introduce a research initiative focusing on linking pre- and post-conflict by drawing upon the power of system dynamics, augmented by new technologies for integrated information analysis, in conjunction with the development of conceptual and computational ontologies capturing the diversity, intensity, and dynamics of the conflict domain.
national security, system dynamics, integrated information analysis, conceptual and computational ontologies

5.
Authors: Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 20 Oct 05
Last Revised: 01 Feb 06
Downloads: 155 (54,708)
Citations: 3
Abstract:
Data quality issues have taken on increasing importance in recent years. In our research, we have discovered that many data quality problems are actually data misinterpretation problems - that is, problems caused by heterogeneous data semantics. In this paper, we first identify semantic heterogeneities that, when not resolved, often cause data quality problems. We discuss the especially challenging problem of aggregational ontological heterogeneity, which concerns how complex entities and their relationships are aggregated. Then we illustrate how COntext INterchange (COIN) technology can be used to capture data semantics and reconcile semantic heterogeneities, thereby improving data quality.
Data Quality, Data Semantics, Semantic Heterogeneity, Ontology, Context

6.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 06 Feb 03
Last Revised: 06 Jan 06
Downloads: 145 (58,265)
Citations: 7
Abstract:
Web aggregation has been available regionally for several years, but this service has not been offered globally. As an example, using multiple regional comparison aggregators, we analyze the global prices for a Sony camcorder, which differ by more than three times. We further explain that the lack of global comparison aggregation services partially contributes to such huge price dispersion. We also discuss difficulties encountered in the manual integration of global web sources. Motivated by this example, we propose a context mediation architecture for global aggregation to address semantic disparities of global information sources. Global aggregation services can bring efficiency to the global market and can be useful for market research and other business uses.
Web Aggregation, Context, Semantic Integration

7.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 06 Jan 03
Last Revised: 07 Jan 06
Downloads: 135 (62,014)
Citations: 1
Abstract:
The development of web technology has led to the emergence of web aggregation, a service that collects existing web data and turns them into more useful information. We review the development of both comparison and relationship aggregation and discuss their impacts on various stakeholders. The aggregator's capability of transparently extracting web data has raised challenging issues in database and privacy protection. Consequently, new regulations are introduced or being proposed. We analyze the interactions between aggregation and related policies and provide our insights about the implications of new policies on the development of web aggregation.
International IP Law, Privacy Law, Web Aggregation

8.
Authors: Thomas Gannon, MITRE Corporation; Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Allen Moulton, Massachusetts Institute of Technology (MIT); Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Marwan Sabbouh, MITRE Corporation; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 12 May 05
Last Revised: 02 Sep 05
Downloads: 126 (65,739)
Abstract:
There is a pressing need for effectively integrating information from an ever-increasing number of available sources both on the web and in other existing systems. A key difficulty of achieving this goal comes from the pervasive heterogeneities at all levels of information systems. Existing and emerging technologies, such as the Web, ODBC, XML, and Web Services, provide essential capabilities in resolving heterogeneities in the hardware and software platforms, but they do not address the semantic heterogeneity of the data itself. A robust solution to this problem needs to be adaptable, extensible, and scalable. In this paper, we identify the deficiencies of traditional approaches that address this problem using hand-coded programs or that require complete data standardization. The COntext INterchange (COIN) approach overcomes these deficiencies by declaratively representing data semantics and using a mediator to create the necessary conversion programs using a small number of conversion rules. The capabilities of COIN are demonstrated using an intelligence information integration example consisting of 150 data sources, where COIN can automatically generate the more than 22,000 conversion programs needed to enable semantic integration using only six parameterizable conversion rules. This paper makes a unique contribution by providing a systematic evaluation of COIN and other commonly practiced approaches.
semantic integration, adaptability, extensibility, scalability, context
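The abstract above does not say how the figure of "over 22,000" conversion programs is obtained. One plausible reading, offered here only as an assumption, is that a hand-coded approach needs one directed conversion program for every ordered pair of distinct sources. The short sketch below checks that arithmetic for 150 sources; the function name `pairwise_conversions` is hypothetical and is not part of COIN.

```python
def pairwise_conversions(num_sources: int) -> int:
    """Count directed conversion programs if every ordered pair of
    distinct sources needs its own hand-coded program (an assumption,
    not a statement of how the paper derives its figure)."""
    return num_sources * (num_sources - 1)


if __name__ == "__main__":
    # 150 sources, as in the abstract's example scenario.
    print(pairwise_conversions(150))
```

Under this assumption the count grows quadratically with the number of sources, which is in line with the abstract's contrast between tens of thousands of hand-coded programs and six parameterizable conversion rules.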

9.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 28 Aug 06
Last Revised: 29 Aug 06
Downloads: 116 (70,335)
Citations: 1
Abstract:
The availability of data on the web and the improvement of technologies have made it increasingly easy to reuse existing data to create new databases and provide value-added services. Meanwhile, initial database creators have been seeking legal protection for their data. After presenting a brief history of legislation related to legal protection for non-copyrightable database contents, we discuss challenging issues to be considered in formulating a database protection regulation. These issues can be addressed from the perspective of economics. Results from a preliminary economic analysis are presented. The findings indicate that, depending on the investment required to create the initial database and the level of differentiation between the initial database and the reuser database, the choice of a social welfare-enhancing regulation can allow for no reuse, free reuse, or fee-paying reuse.
database protection, data reuse, economic analysis

10.
Authors: Nazli Choucri, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Allen Moulton, Massachusetts Institute of Technology (MIT); Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 13 Apr 04
Last Revised: 29 Dec 04
Downloads: 102 (77,721)
Citations: 1
Abstract:
The National Research Council has noted that although there are many private and public databases that contain information potentially relevant to counterterrorism programs, they lack the necessary context definitions (i.e., metadata) and access tools to enable interoperation with other databases and the extraction of meaningful and timely information. In this paper we present examples of these problems and a technology developed at MIT, called context mediation, which provides a novel approach for addressing these problems.
context mediation, heterogeneous contexts

11.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 18 Jan 06
Last Revised: 04 May 06
Downloads: 96 (81,128)
Citations: 1
Abstract:
With the increasing use of the Internet, many of us feel strongly about the free and unfettered exchange and use of information. But the actual situation is not that simple. After the European Union adopted the Database Directive to provide legal protection for non-copyrightable database contents, the U.S. has introduced six legislative proposals, all of which failed to become a law. One of the major difficulties of formulating a socially beneficial database law is in finding the right balance between protecting the incentives of creating publicly accessible databases (including semi-structured web sites) and preserving adequate access to factual data for value creating activities. We address the problem by developing an extended spatial competition model that explicitly considers the inefficiencies in policy administration. With the model, we can determine various conditions and the corresponding socially beneficial policy choices. The results show that, depending on the cost level of database creation, the degree of differentiation of the reuser database, and the efficiency of policy administration, the socially beneficial policy choice can be protecting a legal monopoly, encouraging competition via compulsory licensing, discouraging voluntary licensing, or even allowing free riding. The results provide useful insights to the formulation of a socially beneficial database protection policy.
database protection, data reuse, policy, intellectual property

12.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 13 Apr 04
Last Revised: 07 Jan 06
Downloads: 95 (81,765)
Citations: 2
Abstract:
The change in meaning of data over time poses significant challenges for the use of that data. These challenges exist in the use of an individual data source and are further compounded with the integration of multiple sources. In this paper, we identify three types of temporal semantic heterogeneities, which have not been addressed by existing research. We propose a solution that is based on extensions to the Context Interchange framework. This approach provides mechanisms for capturing semantics using ontology and temporal context. It also provides a mediation service that automatically resolves semantic conflicts. We show the feasibility of this approach by demonstrating a prototype that implements a subset of the proposed extensions.
Context Interchange framework, ontology and temporal context

13.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 19 Dec 07
Last Revised: 01 Jun 08
Downloads: 94 (82,390)
Abstract:
Sell Globally and Shop Globally have been seen as a potential benefit of web-enabled electronic business. One important step toward realizing this benefit is to know how things are selling in various parts of the world. A global price comparison service would address this need. But there have not been many such services. In this paper, we use a case study of global price dispersion to illustrate the need for and the value of a global price comparison service. Then we identify and discuss several technology challenges, including semantic heterogeneity, in providing a global price comparison service. We propose a mediation architecture to address the semantic heterogeneity problem, and demonstrate the feasibility of the proposed architecture by implementing a prototype that enables global price comparison using data from web sources in several countries.
Global Price Comparison, Shopbots, Context, Semantic Data

14.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 29 Oct 04
Last Revised: 29 Oct 04
Downloads: 77 (94,089)
Citations: 9
Abstract:
Many online services access a large number of autonomous data sources and at the same time need to meet different user requirements. It is essential for these services to achieve semantic interoperability among these information exchange entities. In the presence of an increasing number of proprietary business processes, heterogeneous data standards, and diverse user requirements, it is critical that the services are implemented using adaptable, extensible, and scalable technology. The Context Interchange (COIN) approach, inspired by similar goals of the Semantic Web, provides a robust solution. In this paper, we describe how COIN can be used to implement dynamic online services where semantic differences are reconciled on the fly. We show that COIN is flexible and scalable by comparing it with several conventional approaches. With a given ontology, the number of conversions in COIN is quadratic in the semantic aspect that has the largest number of distinctions. These semantic aspects are modeled as modifiers in a conceptual ontology; in most cases the number of conversions is linear in the number of modifiers, which is significantly smaller than in the traditional hard-wiring middleware approach, where the number of conversion programs is quadratic in the number of sources and data receivers. In the example scenario in the paper, the COIN approach needs only 5 conversions to be defined while traditional approaches require 20,000 to 100 million. COIN achieves this scalability by automatically composing all the comprehensive conversions from a small number of declaratively defined sub-conversions.
ontology, semantics, scalability, data integration, heterogeneous sources

15.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 28 Aug 06
Last Revised: 18 Oct 06
Downloads: 68 (101,554)
Citations: 2
Abstract:
There are many different kinds of ontologies used for different purposes in modern computing. Lightweight ontologies are easy to create, but difficult to deploy; formal ontologies are relatively easy to deploy, but difficult to create. This paper presents an approach that combines the strengths and avoids the weaknesses of lightweight and formal ontologies. In this approach, the ontology includes only high-level concepts; subtle differences in the interpretation of the concepts are captured as context descriptions outside the ontology. The resulting ontology is simple and thus easy to create. The context descriptions facilitate data conversion composition, which leads to a scalable solution to semantic interoperability among disparate data sources and contexts.
lightweight ontology, context, mediation, scalability

16.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 17 Aug 04
Last Revised: 16 Aug 05
Downloads: 65 (104,212)
Citations: 2
Abstract:
Changes of semantics in data sources further complicate the semantic heterogeneity problem. We identify four types of semantic heterogeneities related to changing semantics and present a solution based on an extension to the Context Interchange (COIN) framework. Changing semantics is represented as multi-valued contextual attributes in a shared ontology; however, only a single value is valid over a certain time interval. A mediator, implemented in abductive constraint logic programming, processes the semantics by solving temporal constraints for single-valued time intervals and automatically applying conversions to resolve semantic differences over these intervals. We also discuss the scalability of the approach and its applicability to the Semantic Web.
semantic heterogeneity problem, Context Interchange (COIN) framework

17.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 28 Aug 06
Last Revised: 29 Aug 06
Downloads: 60 (108,790)
Abstract:
One of the challenges of dealing with multiple contexts is the significant effort required to provide all necessary lifting rules so that statements in one context can be viewed and understood in other contexts. In this paper, we introduce the notion of structured contexts, where a lightweight ontology is used to provide a structure for representing contexts. With structured contexts, specialized inference algorithms can be used to significantly reduce the number of lifting rules required. We use a semantic data integration example to illustrate the concept of structured contexts and the benefits of this novel use of lightweight ontology.
structured contexts, lightweight ontology

18.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 17 Aug 04
Last Revised: 16 Aug 05
Downloads: 53 (115,599)
Citations: 4
Abstract:
The underlying assumptions for interpreting the meaning of data often change over time, which further complicates the problem of semantic heterogeneities among autonomous data sources. As an extension to the Context Interchange (COIN) framework, this paper introduces the notion of temporal context as a formalization of the problem. We represent temporal context as a multi-valued method in F-Logic; however, only one value is valid at any point in time, the determination of which is constrained by temporal relations. This representation is then mapped to an abductive constraint logic programming framework with temporal relations being treated as constraints. A mediation engine that implements the framework automatically detects and reconciles semantic differences at different times. We articulate that this extended COIN framework is suitable for reasoning on the Semantic Web.
Context Interchange (COIN) framework, temporal context, Semantic Web

19.
Authors: Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 18 Jan 06
Last Revised: 23 Mar 06
Downloads: 49 (119,760)
Abstract:
In this paper, we first identify semantic heterogeneities that, when not resolved, often cause serious data quality problems. We discuss the especially challenging problems of temporal and aggregational ontological heterogeneity, which concerns how complex entities and their relationships are aggregated and reinterpreted over time. Then we illustrate how the COntext INterchange (COIN) technology can be used to capture data semantics and reconcile semantic heterogeneities in a scalable manner, thereby improving data quality.
Data Semantics, Semantic Heterogeneity, Aggregation, Temporal, Ontology, Context

20.
Authors: Thomas Gannon, MITRE Corporation; Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Allen Moulton, Massachusetts Institute of Technology (MIT); Michael Siegel, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Marwan Sabbouh, MITRE Corporation; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT)
Posted: 11 Mar 09
Last Revised: 11 Mar 09
Downloads: 31 (142,192)
Abstract:
Technological advances such as Service Oriented Architecture (SOA) have increased the feasibility and importance of effectively integrating information from an ever-widening number of systems within and across enterprises. A key difficulty of achieving this goal comes from the pervasive heterogeneity at all levels of information systems. A robust solution to this problem needs to be adaptable, extensible, and scalable. In this paper, we identify the deficiencies of traditional semantic integration approaches. The COntext INterchange (COIN) approach overcomes these deficiencies by declaratively representing data semantics and using a mediator to create the necessary conversion programs from a small number of conversion rules. The capabilities of COIN are demonstrated using an example with 150 data sources, where COIN can automatically generate the more than 22,000 conversion programs needed to enable semantic interoperability using only six parameterizable conversion rules. This paper presents a framework for evaluating adaptability, extensibility, and scalability of semantic integration approaches. The application of the framework is demonstrated with a systematic evaluation of COIN and other commonly practiced approaches.
service oriented architecture, Context Interchange

21.
Authors: Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management
Posted: 18 Sep 08
Last Revised: 18 Sep 08
Downloads: 20 (166,948)
Abstract:
The change in meaning of data over time poses significant challenges for the use of that data. These challenges exist in the use of an individual data source and are further compounded with the integration of multiple sources. In this paper, we identify three types of temporal semantic heterogeneity. We propose a solution based on extensions to the Context Interchange framework, which has mechanisms for capturing semantics using ontology and temporal context. It also provides a mediation service that automatically reconciles semantic conflicts. We show the feasibility of this approach with a prototype that implements a subset of the proposed extensions.
temporal context, semantic heterogeneity, ontology, logic programming

22.
Authors: Xitong Li, Tsinghua University; Stuart E. Madnick, Massachusetts Institute of Technology (MIT) - Sloan School of Management; Hongwei (Harry) Zhu, Massachusetts Institute of Technology (MIT); Yushun Fan, Tsinghua University
Posted: 24 Sep 09
Last Revised: 24 Sep 09
Downloads: 7 (203,218)
Abstract:
Service Oriented Computing (SOC) is a popular computing paradigm for the development of distributed Web applications. Service composition, a key element of SOC, is severely hampered by various types of semantic heterogeneity among the services. In this paper, we address the various semantic differences from the context perspective and use a lightweight ontology to describe the concepts and their specializations. Atomic conversions between the contexts are implemented using XPath functions and external services. The correspondences between the syntactic service descriptions and the semantic concepts are established using a flexible, standard-compliant mechanism. Given the naive BPEL composition ignoring semantic differences, our reconciliation approach can automatically determine and reconcile the semantic differences. The mediated BPEL composition incorporates necessary conversions to convert the data exchanged between different services. Our solution has the desirable properties (e.g., adaptability, extensibility and scalability) and can significantly alleviate the reconciliation efforts for Web services composition.
Web service, service composition, semantic heterogeneity, ontology, context