Emerging Practices for Mapping and Linking Life Sciences Data Using RDF - A Case Series

25 Pages Posted: 27 Jun 2018 Publication Status: Accepted

M. Scott Marshall

University of Amsterdam - Informatics Institute

Richard Boyce

University of Pittsburgh - Department of Biomedical Informatics

Helena Deus

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI); Foundation Medicine, Inc.

Jun Zhao

University of Oxford - Department of Zoology

Egon Willighagen

University of Nijmegen

Matthias Samwald

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

Elgar Pichler

W3C World Wide Web Consortium

Janos Hajagos

State University of New York (SUNY) - School of Medicine

Eric Prud’hommeaux

W3C World Wide Web Consortium

Susie Stephens

Johnson & Johnson

Abstract

Members of the W3C Health Care and Life Sciences Interest Group (HCLS IG) have published a variety of genomic and drug-related datasets as Resource Description Framework (RDF) triples. This experience has helped the interest group define a general data workflow for mapping health care and life science (HCLS) data to RDF and linking it with other Linked Data sources. This paper presents the workflow along with four case studies that demonstrate both the workflow and many of the challenges that may be faced when using the workflow to create new Linked Data resources. The first case study describes the creation of linked RDF data from microarray datasets while the second discusses a linked RDF dataset created from a knowledge base of drug therapies and drug targets. The third case study describes the creation of an RDF index of biomedical concepts present in unstructured clinical reports and how this index was linked to a drug side-effect knowledge base. The final case study describes the initial development of a linked dataset from a knowledge base of small molecules. This paper also provides a detailed set of recommended practices for creating and publishing Linked Data sources in the HCLS domain in such a way that they are discoverable and useable by users, Semantic Web agents, and applications. These practices are based on the cumulative experience of the Linked Open Drug Data (LODD) task force of the HCLS IG. While no single set of recommendations can address all of the heterogeneous information needs that exist within the HCLS domains, practitioners wishing to create Linked Data should find the recommendations useful for identifying the tools, techniques, and practices employed by earlier developers. In addition to clarifying available methods for producing Linked Data, the recommendations for metadata should also make the discovery and consumption of Linked Data easier.
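The two core steps named in the abstract, mapping a source record to RDF triples and linking the result to another Linked Data source, can be illustrated with a minimal sketch. It is not the workflow from the paper: the `example.org` and `example.com` URIs, the `Drug` class, the record fields, and the function names are all hypothetical, and the sketch emits plain N-Triples using only the Python standard library.

```python
from urllib.parse import quote

# Standard W3C vocabulary terms.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"
RDFS_LABEL = "http://www.w3.org/2000/01/rdf-schema#label"
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

# Hypothetical namespace and class, chosen only for this illustration.
BASE = "http://example.org/drug/"
DRUG_CLASS = "http://example.org/vocab/Drug"


def escape_literal(text):
    """Escape backslashes, double quotes, and newlines for an N-Triples literal."""
    return text.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def record_to_ntriples(record, external_links):
    """Map one source record to N-Triples lines and add a link when one is known.

    record: dict with 'id' and 'name' keys (assumed shape of the source data).
    external_links: dict from local id to the URI of the same entity elsewhere.
    """
    # Mint a URI for the record, percent-encoding the local identifier.
    subject = "<" + BASE + quote(record["id"], safe="") + ">"
    label = escape_literal(record["name"])
    lines = [
        f"{subject} <{RDF_TYPE}> <{DRUG_CLASS}> .",
        f'{subject} <{RDFS_LABEL}> "{label}" .',
    ]
    # Linking step: assert identity with an entity in another dataset.
    target = external_links.get(record["id"])
    if target is not None:
        lines.append(f"{subject} <{OWL_SAME_AS}> <{target}> .")
    return lines


# Usage with made-up data.
example_record = {"id": "D001", "name": "Aspirin"}
example_links = {"D001": "http://example.com/other/aspirin"}
for line in record_to_ntriples(example_record, example_links):
    print(line)
```

Whether `owl:sameAs` is the right linking predicate depends on how strictly the two entities coincide; a weaker relation such as `rdfs:seeAlso` may be more appropriate when the match is only approximate.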

Keywords: Semantic Web, Linked Data, provenance, ontology, health care, life sciences

Suggested Citation

Marshall, M. Scott and Boyce, Richard and Deus, Helena and Zhao, Jun and Willighagen, Egon and Samwald, Matthias and Pichler, Elgar and Hajagos, Janos and Prud’hommeaux, Eric and Stephens, Susie, Emerging Practices for Mapping and Linking Life Sciences Data Using RDF - A Case Series (2012). Journal of Web Semantics First Look, Available at SSRN: https://ssrn.com/abstract=3198960 or http://dx.doi.org/10.2139/ssrn.3198960

M. Scott Marshall (Contact Author)

University of Amsterdam - Informatics Institute

Spui 21
Amsterdam, 1018 WB
Netherlands

Richard Boyce

University of Pittsburgh - Department of Biomedical Informatics

5607 Baum Boulevard, Suite 500
Pittsburgh, PA 15206-3701
United States

Helena Deus

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Kildare
Ireland

Foundation Medicine, Inc.

One Kendall Square
Cambridge, MA
United States

Jun Zhao

University of Oxford - Department of Zoology

New Radcliffe House
Radcliffe Observatory Quarter
Oxford, OX13 5QL
United Kingdom

Egon Willighagen

University of Nijmegen

Netherlands

Matthias Samwald

National University of Ireland, Galway (NUIG) - Digital Enterprise Research Institute (DERI)

University Road
Galway, Co. Kildare
Ireland

Elgar Pichler

W3C World Wide Web Consortium

United States

Janos Hajagos

State University of New York (SUNY) - School of Medicine

100 Nicolls Rd
Stony Brook, NY 11794
United States

Eric Prud’hommeaux

W3C World Wide Web Consortium

United States

Susie Stephens

Johnson & Johnson

New Brunswick, NJ 08933
United States
