
Big Linked Cancer Data: Integrating Linked TCGA and PubMed

11 Pages Posted: 24 Jun 2018 First Look: Accepted

Muhammad Saleem

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Maulik R. Kamdar

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

Aftab Iqbal

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

Shanmukha Sampath

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

Helena Deus

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI); Foundation Medicine, Inc.

Axel-Cyrille Ngonga Ngomo

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Abstract

The amount of bio-medical data available on the Web grows exponentially with time. The resulting large volume of data makes manual exploration very tedious. Moreover, the velocity at which this data changes and the variety of formats in which bio-medical data is published make it difficult to access the data in an integrated form. Finally, the lack of an integrated vocabulary makes querying this data more difficult. In this paper, we advocate the use of Linked Data to integrate, query and visualize bio-medical data. The resulting Big Linked Data allows discovering knowledge distributed across manifold sources, making it viable for the serendipitous discovery of novel knowledge. We present the concept of Big Linked Data by showing how the constant stream of new bio-medical publications can be integrated with the Linked Cancer Genome Atlas dataset (TCGA) within a virtual integration scenario. We ensure the scalability of our approach through the novel TopFed federated query engine, which we evaluate by comparing the query execution time of our system with that of FedX on Linked TCGA. Then, we show how we can harness the value hidden in the underlying integrated data by making it easier to explore through a user-friendly interface. We evaluate the usability of the interface by using the standard system usability questionnaire as well as a customized questionnaire designed for the users of our system. Our overall result of 77 suggests that our interface is easy to use and can thus lead to novel insights.

Keywords: TCGA, PubMed, RDF, Linked Data, Visualization

Suggested Citation

Saleem, Muhammad and Kamdar, Maulik R. and Iqbal, Aftab and Sampath, Shanmukha and Deus, Helena and Ngonga Ngomo, Axel-Cyrille, Big Linked Cancer Data: Integrating Linked TCGA and PubMed (2014). Journal of Web Semantics First Look 27_1_6. Available at SSRN: https://ssrn.com/abstract=3199108 or http://dx.doi.org/10.2139/ssrn.3199108

Muhammad Saleem

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Augustusplatz 10/11
Leipzig, 04109
Germany

Maulik R. Kamdar

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

The DERI Building IDA Business Park
Galway
Ireland

Aftab Iqbal

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

The DERI Building IDA Business Park
Galway
Ireland

Shanmukha Sampath

National University of Ireland, Galway (NUIG) - Insight Centre for Data Analytics

The DERI Building IDA Business Park
Galway
Ireland

Helena Deus

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Kildare
Ireland

Foundation Medicine, Inc.

One Kendall Square
Cambridge, MA
United States

Axel-Cyrille Ngonga Ngomo (Contact Author)

University of Leipzig - Agile Knowledge Engineering and Semantic Web (AKSW)

Augustusplatz 10/11
Leipzig, 04109
Germany

