Exploring Genetic Basis for Diseases Through a Heterogeneous Bibliometric Network: Methodology and a Case Study

27 pages. Posted: 7 Aug 2020

Mengjia Wu

University of Technology Sydney (UTS)

Yi Zhang

University of Technology Sydney

Date Written: July 10, 2020

Abstract

Discovering the genetic basis for diseases is a crucial and challenging issue in modern medicine. Among the approaches targeting this issue, literature-based knowledge discovery extends the boundary of exploration and reveals implicit associations in unstructured textual data. However, most current literature-based methods focus on a specific case and require prior knowledge. In this paper, we propose an adaptable and transferable methodology to 1) identify crucial genetic factors for a given disease and 2) predict emerging genetic associations for the disease. Specifically, diseases, chemicals, genes, and genetic variations are extracted from literature data; a heterogeneous co-occurrence network is then constructed, and a semantic matrix is generated with the word2vec model. Key genes and genetic variations are then identified through network measurements, and emerging genetic factors associated with a given disease are captured via a weighted link prediction approach that incorporates the semantic matrix. We applied the proposed methodology to a literature dataset of 54,219 scientific articles related to atrial fibrillation (AF) to demonstrate its reliability. The results yielded AF-related key entities, including diseases, chemicals, genes, and genetic variants, as well as a series of candidate genetic factors for AF. These results could provide decision support for medical researchers (e.g., identifying emerging directions for in-depth exploration) and policymakers (e.g., public health administration).
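The pipeline summarized in the abstract (co-occurrence network, semantic matrix, network measurement, weighted link prediction) can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify which network measurements or link prediction formula are used, so this sketch swaps in weighted degree (node strength) for ranking key entities and a weighted common-neighbours score blended with cosine similarity for prediction. The toy articles, placeholder entity names (GENE_A, VAR_1, and so on), the three-dimensional vectors standing in for trained word2vec embeddings, and the mixing weight alpha are all hypothetical assumptions.

```python
from collections import defaultdict
from itertools import combinations
import math

# Hypothetical toy corpus: each article is the set of (entity, type) pairs that an
# entity extractor might return. Names are placeholders, not real findings.
AF = ("AF", "disease")
ARTICLES = [
    {AF, ("GENE_A", "gene"), ("CHEM_X", "chemical")},
    {AF, ("GENE_A", "gene"), ("VAR_1", "variant")},
    {AF, ("GENE_B", "gene")},
    {("GENE_A", "gene"), ("GENE_C", "gene"), ("VAR_2", "variant")},
    {("GENE_B", "gene"), ("GENE_C", "gene")},
]

# Hypothetical 3-d vectors standing in for trained word2vec embeddings.
VECTORS = {
    AF: [1.0, 0.0, 0.0],
    ("GENE_C", "gene"): [0.0, 1.0, 0.0],
    ("VAR_2", "variant"): [0.6, 0.8, 0.0],
}


def build_cooccurrence(articles):
    """Heterogeneous co-occurrence network: edge weight = number of shared articles."""
    graph = defaultdict(lambda: defaultdict(int))
    for entities in articles:
        for a, b in combinations(sorted(entities), 2):
            graph[a][b] += 1
            graph[b][a] += 1
    # Convert to plain dicts so later lookups never create empty entries.
    return {node: dict(nbrs) for node, nbrs in graph.items()}


def strength(graph):
    """Weighted degree of each node, one simple measurement for ranking key entities."""
    return {node: sum(nbrs.values()) for node, nbrs in graph.items()}


def cosine(u, v):
    """Cosine similarity of two vectors (0.0 if either has zero length)."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def semantic_matrix(vectors):
    """Pairwise cosine similarities between entity embeddings."""
    return {(a, b): cosine(vectors[a], vectors[b])
            for a in vectors for b in vectors if a != b}


def predict_links(graph, sem, target, candidate_types=("gene", "variant"), alpha=0.5):
    """Rank genes/variants not yet linked to `target`.

    Assumed score: alpha * weighted common neighbours + (1 - alpha) * cosine similarity.
    """
    target_nbrs = graph.get(target, {})
    scores = {}
    for node, nbrs in graph.items():
        if node == target or node in target_nbrs or node[1] not in candidate_types:
            continue
        shared = target_nbrs.keys() & nbrs.keys()
        # Each shared neighbour contributes the mean weight of its two edges.
        wcn = sum((target_nbrs[z] + nbrs[z]) / 2 for z in shared)
        scores[node] = alpha * wcn + (1 - alpha) * sem.get((target, node), 0.0)
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)


if __name__ == "__main__":
    graph = build_cooccurrence(ARTICLES)
    key = sorted(strength(graph).items(), key=lambda kv: kv[1], reverse=True)[:3]
    print("key entities:", key)
    print("candidates for AF:", predict_links(graph, semantic_matrix(VECTORS), AF))
```

With these toy inputs, GENE_A has the highest strength (6), and among entities not yet linked to AF, GENE_C (score 1.25) ranks above VAR_2 (about 1.05): GENE_C shares two weighted neighbours with AF, which outweighs VAR_2's higher semantic similarity. Raising alpha favours network structure; lowering it favours the semantic matrix.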

Keywords: Bibliometrics; Network analytics; Disease Genetic Basis; Word embedding

Suggested Citation

Wu, Mengjia and Zhang, Yi, Exploring Genetic Basis for Diseases Through a Heterogeneous Bibliometric Network: Methodology and a Case Study (July 10, 2020). Available at SSRN: https://ssrn.com/abstract=3647972 or http://dx.doi.org/10.2139/ssrn.3647972

Mengjia Wu (Contact Author)

University of Technology Sydney (UTS)

15 Broadway, Ultimo
PO Box 123
Sydney, NSW 2007
Australia

Yi Zhang

University of Technology Sydney

15 Broadway, Ultimo
PO Box 123
Sydney, NSW 2007
Australia