Querying NeXtProt Nanopublications and Their Value for Insights on Sequence Variants and Tissue Expression
24 Pages Posted: 10 Jul 2018 First Look: Accepted
Understanding how genetic differences between individuals impact the regulation, expression, and ultimately function of proteins is an important step toward realizing the promise of personalized medicine. Several technical barriers hinder the transition of biological knowledge into applications relevant to precision medicine. One important challenge for data integration is that new biological sequences (proteins, DNA) have multiple interoperability issues, potentially creating a quagmire in the published data, especially when different data sources do not appear to agree. Thus, there is an urgent need for systems and methodologies that integrate information in a uniform manner and allow seamless querying of multiple data types, which can illuminate, for example, the relationships between protein modifications and causative genomic variants. Our work demonstrates for the first time how semantic technologies can be used to address these challenges using the nanopublication model applied to the neXtProt data set, a curated knowledgebase of information about human proteins. We have applied the nanopublication model to demonstrate querying over several named graphs, including the provenance information associated with the curated scientific assertions from neXtProt. We show, by way of use cases involving sequence variations, post-translational modifications, and tissue expression, that querying the neXtProt nanopublication implementation is a credible approach for expanding biological insight.
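To make the idea of querying across named graphs concrete, the following is a minimal sketch in plain Python, not the authors' implementation and not the neXtProt schema. It assumes a nanopublication can be approximated as a set of quads (graph, subject, predicate, object) split into assertion, provenance, and publication-info graphs. Every identifier (`np1:assertion`, `ex:hasVariant`, `ex:evidenceQuality`, the quality labels, and so on) is a hypothetical placeholder chosen for illustration.

```python
from collections import namedtuple

# A quad is a triple plus the name of the graph that contains it.
Quad = namedtuple("Quad", ["graph", "subject", "predicate", "obj"])

# All identifiers below are hypothetical placeholders, not real neXtProt IRIs.
QUADS = [
    # Nanopublication 1: a sequence-variant assertion with its provenance.
    Quad("np1:assertion", "ex:proteinA", "ex:hasVariant", "ex:variant1"),
    Quad("np1:provenance", "np1:assertion", "ex:evidenceQuality", "GOLD"),
    Quad("np1:provenance", "np1:assertion", "ex:derivedFrom", "ex:publication1"),
    Quad("np1:pubinfo", "np1", "ex:createdBy", "ex:curator1"),
    # Nanopublication 2: a tissue-expression assertion with its provenance.
    Quad("np2:assertion", "ex:proteinA", "ex:expressedIn", "ex:liver"),
    Quad("np2:provenance", "np2:assertion", "ex:evidenceQuality", "SILVER"),
    Quad("np2:pubinfo", "np2", "ex:createdBy", "ex:curator2"),
]


def match(quads, graph=None, subject=None, predicate=None, obj=None):
    """Return the quads that fit a pattern; None acts as a wildcard."""
    pattern = (graph, subject, predicate, obj)
    return [
        q for q in quads
        if all(want is None or want == got for want, got in zip(pattern, q))
    ]


def assertions_with_quality(quads, quality):
    """Join provenance to assertions: find assertion graphs whose provenance
    records the given evidence quality, then return their contents."""
    results = []
    for prov in match(quads, predicate="ex:evidenceQuality", obj=quality):
        # The provenance statement's subject is the assertion graph's name.
        results.extend(match(quads, graph=prov.subject))
    return results


if __name__ == "__main__":
    for quad in assertions_with_quality(QUADS, "GOLD"):
        print(quad.subject, quad.predicate, quad.obj)
```

In a real deployment the same join would more likely be expressed as a SPARQL query using `GRAPH` patterns against an RDF store; the in-memory list here only illustrates how provenance graphs can filter which assertions are returned.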
Keywords: biological databases, linked data, semantic web, nanopublication, post-translational modification, single nucleotide polymorphisms, tissue expression