Ancestry Analysis Using an Innovative 56 AIM-InDel Panel and Machine Learning Methods
24 Pages. Posted: 6 Dec 2022
Abstract
Insertion/deletion polymorphisms (InDels) can serve as ancestry informative markers (AIMs) in ancestry analysis. In this study, an innovative panel of 56 AIM-InDel loci was used to investigate the genetic structure of the Inner Mongolia Manchu (IMM) group and its genetic relationships with 26 reference populations. The IMM group was genetically close to East Asian populations, especially the Han Chinese in Beijing. Moreover, populations from northern and southern East Asia showed clear differences in ancestral components, suggesting that this panel may help distinguish northern from southern East Asian populations. Four machine learning models were then built on the 56 InDel loci to evaluate the panel's performance in ancestry prediction. The random forest model performed best, achieving 91.87% and 99.73% accuracy in classifying five and three continental populations, respectively. Using the random forest model, all IMM individuals were assigned to East Asian populations and were more closely related to northern East Asian populations. Furthermore, the random forest model correctly distinguished 87.18% of IMM individuals from the six East Asian groups, suggesting that a random forest model based on the 56 AIM-InDels could be a useful tool for ancestry analysis.
Keywords: Ancestry analysis / Insertion/deletion polymorphisms / Inner Mongolia Manchus / Genetic relationship / Machine learning
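The abstract reports that a random forest trained on the 56 InDel loci gave the best ancestry predictions. As a rough illustration of how such a classifier works, here is a minimal random forest written from scratch with Python's standard library (not a library such as scikit-learn, and not the authors' implementation). It assumes each locus is coded as the count of insertion alleles (0, 1 or 2), considers about √56 ≈ 7 randomly chosen loci at each split, and is trained on synthetic genotypes drawn from made-up allele frequencies. The function names, parameters (`n_trees`, `max_depth`), population labels and data are all hypothetical; accuracy here is simply the fraction of held-out individuals assigned to their true population.

```python
import random
from collections import Counter

# Hypothetical sketch (not the study's model): each individual is a list of
# InDel dosages, i.e. the number of insertion alleles (0, 1 or 2) per locus.

def gini(labels):
    """Gini impurity of a list of population labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(X, y, features):
    """Return the (locus, threshold) split with the lowest weighted Gini, or None."""
    best, best_score = None, gini(y)
    for f in features:
        for t in (0, 1):  # split into "dosage <= t" vs "dosage > t"
            left = [lab for x, lab in zip(X, y) if x[f] <= t]
            right = [lab for x, lab in zip(X, y) if x[f] > t]
            if not left or not right:
                continue
            score = (len(left) * gini(left) + len(right) * gini(right)) / len(y)
            if score < best_score:
                best, best_score = (f, t), score
    return best

def build_tree(X, y, depth, max_depth, n_features, rng):
    """Grow a tree; leaves are labels, internal nodes are (locus, t, left, right)."""
    majority = Counter(y).most_common(1)[0][0]
    if depth >= max_depth or len(set(y)) == 1:
        return majority
    # Random forest step: only a random subset of loci is considered at this node.
    features = rng.sample(range(len(X[0])), n_features)
    split = best_split(X, y, features)
    if split is None:
        return majority
    f, t = split
    low = [(x, lab) for x, lab in zip(X, y) if x[f] <= t]
    high = [(x, lab) for x, lab in zip(X, y) if x[f] > t]
    return (f, t,
            build_tree([x for x, _ in low], [lab for _, lab in low],
                       depth + 1, max_depth, n_features, rng),
            build_tree([x for x, _ in high], [lab for _, lab in high],
                       depth + 1, max_depth, n_features, rng))

def predict_tree(node, x):
    """Follow splits until a leaf label is reached."""
    while isinstance(node, tuple):
        f, t, left, right = node
        node = left if x[f] <= t else right
    return node

def fit_forest(X, y, n_trees=25, max_depth=6, seed=0):
    """Train n_trees trees, each on a bootstrap resample of the individuals."""
    rng = random.Random(seed)
    n_features = max(1, int(len(X[0]) ** 0.5))  # ~sqrt(number of loci)
    forest = []
    for _ in range(n_trees):
        idx = [rng.randrange(len(X)) for _ in range(len(X))]  # sample with replacement
        forest.append(build_tree([X[i] for i in idx], [y[i] for i in idx],
                                 0, max_depth, n_features, rng))
    return forest

def predict_forest(forest, x):
    """Assign the population that receives the most tree votes."""
    return Counter(predict_tree(tree, x) for tree in forest).most_common(1)[0][0]

def accuracy(forest, X, y):
    """Fraction of individuals whose predicted population equals their label."""
    return sum(predict_forest(forest, x) == lab for x, lab in zip(X, y)) / len(y)

def simulate(freqs, n, rng):
    """Draw n synthetic genotypes: two independent alleles per locus (HWE assumption)."""
    return [[sum(rng.random() < p for _ in range(2)) for p in freqs] for _ in range(n)]

if __name__ == "__main__":
    rng = random.Random(42)
    # Made-up insertion-allele frequencies for two hypothetical populations.
    freqs_a, freqs_b = [0.8] * 56, [0.2] * 56
    X_train = simulate(freqs_a, 100, rng) + simulate(freqs_b, 100, rng)
    y_train = ["POP_A"] * 100 + ["POP_B"] * 100
    X_test = simulate(freqs_a, 50, rng) + simulate(freqs_b, 50, rng)
    y_test = ["POP_A"] * 50 + ["POP_B"] * 50
    forest = fit_forest(X_train, y_train)
    print(f"held-out accuracy: {accuracy(forest, X_test, y_test):.2%}")
```

Because the genotypes are simulated, the printed accuracy says nothing about the real 56 AIM-InDel panel; the sketch only shows the mechanics of bootstrap resampling, per-split locus sampling and majority voting that a random forest classifier relies on.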