Application of machine learning for ancestry inference using multi-InDel markers.

Author information

Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China; Department of Fetal Medicine and Prenatal Diagnosis Center, Shanghai First Maternity and Infant Hospital, Tongji University School of Medicine, 2699 West Gaoke Rd, Shanghai 201204, China; Shanghai Key Laboratory of Maternal Fetal Medicine, Shanghai First Maternity and Infant Hospital, School of Medicine, Tongji University, Shanghai 200092, China.

Department of Forensic Medicine, School of Basic Medical Sciences, Fudan University, Shanghai, China.

Publication information

Forensic Sci Int Genet. 2022 Jul;59:102702. doi: 10.1016/j.fsigen.2022.102702. Epub 2022 Mar 30.

Abstract

Ancestry inference through population stratification plays an important role in forensic applications. Specifically, ancestry information inferred from forensic DNA evidence can provide vital clues for criminal investigations. Current advances in ancestry inference mostly focus on ancestry informative markers. Among these, multi-InDel has been proposed as a compound marker that performs well in complex ancestral classification of Asian subpopulations. However, research on the analytical methods needed to make reliable predictions is lacking, and the newly proposed compound markers could be assessed with alternative methods. In this study, promising discriminant methods were explored using multi-InDel markers for forensic ancestry inference. As a prerequisite, the adopted multi-InDel markers were assessed by classical population genetics methods, such as F_ST analysis, multidimensional scaling (MDS) and STRUCTURE. In addition, dimensionality reduction methods and serial reduction strategies were applied for data visualization. Subsequently, machine learning methods, including logistic regression (LR), support vector machine (SVM), k-nearest neighbors (KNN) and extreme gradient boosting (XGBoost), were evaluated by diverse approaches. Across these comparisons and estimations, XGBoost with one-hot encoding was shown to be more effective in population stratification and ancestry inference for challenging cases with admixed populations.
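As a rough illustration of what one-hot encoding of genotype data looks like before it is passed to a classifier, the sketch below encodes a few toy multi-InDel genotypes and classifies a query sample with a simple k-nearest-neighbors vote. Everything in it is an assumption made for illustration: the genotype labels (`H1/H2` and so on), the population names `PopA` and `PopB`, the helper functions, and the use of Hamming distance are hypothetical and are not taken from the study. KNN stands in here only because it fits in a few lines of standard-library Python; the abstract reports XGBoost as the better-performing method.

```python
from collections import Counter

# Hypothetical toy data: one genotype label per multi-InDel marker.
# Labels and population names are invented for illustration only.
train_samples = [
    ("H1/H1", "H1/H2", "H2/H2"),
    ("H1/H1", "H1/H1", "H2/H2"),
    ("H1/H2", "H1/H2", "H2/H2"),
    ("H2/H2", "H2/H3", "H1/H1"),
    ("H2/H3", "H2/H2", "H1/H1"),
    ("H2/H2", "H2/H2", "H1/H2"),
]
train_labels = ["PopA", "PopA", "PopA", "PopB", "PopB", "PopB"]


def build_vocab(samples):
    """Collect the sorted set of genotype labels observed at each marker."""
    n_markers = len(samples[0])
    return [sorted({s[m] for s in samples}) for m in range(n_markers)]


def one_hot(sample, vocab):
    """Concatenate one indicator vector per marker.

    A genotype not seen when building the vocabulary encodes as all zeros
    for that marker.
    """
    vec = []
    for genotype, labels in zip(sample, vocab):
        vec.extend(1 if genotype == label else 0 for label in labels)
    return vec


def knn_predict(query_vec, train_vecs, labels, k=3):
    """Majority vote among the k nearest training vectors (Hamming distance)."""
    ranked = sorted(
        (sum(a != b for a, b in zip(query_vec, vec)), label)
        for vec, label in zip(train_vecs, labels)
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


vocab = build_vocab(train_samples)
train_vecs = [one_hot(s, vocab) for s in train_samples]

query = ("H2/H2", "H2/H2", "H1/H1")
prediction = knn_predict(one_hot(query, vocab), train_vecs, train_labels)
print(prediction)
```

One-hot encoding treats each genotype at a marker as an unordered category, so the classifier does not assume any numeric ordering between haplotype combinations. The same encoded vectors could be given to a gradient boosting library instead of the toy KNN vote.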
