Systematic analyses of AISNPs screening and classification algorithms based on genome-wide data for forensic biogeographic ancestry inference.

Author information

Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, China.

Guangzhou Key Laboratory of Forensic Multi-Omics for Precision Identification, School of Forensic Medicine, Southern Medical University, Guangzhou, Guangdong, China; Microbiome Medicine Center, Department of Laboratory Medicine, Zhujiang Hospital, Southern Medical University, Guangzhou, Guangdong, China.

Publication information

Forensic Sci Int. 2024 Apr;357:111975. doi: 10.1016/j.forsciint.2024.111975. Epub 2024 Mar 2.

Abstract

Identifying the biogeographic ancestral origin of a biological sample left at a crime scene can provide important evidence for judicial cases, as well as clues for narrowing down suspects. Ancestry-informative single nucleotide polymorphisms (AISNPs) have become one of the most important classes of genetic markers in recent years for screening ancestry-informative loci and analyzing population genetic background and structure, owing to their high number and wide distribution in the human genome. In this study, based on data from 26 populations in the 1000 Genomes Project Phase 3, a Random Forest classification model was constructed with a one-vs-rest classification strategy for embedded feature selection in order to obtain a panel with a small number of efficient AISNPs. The research aim was to clarify the differentiation of population genetic structures among continents and among subregions of East Asia. ADMIXTURE results showed that, based on the 58 AISNPs selected by the machine learning algorithm, the 26 populations involved in the study could be categorized into six intercontinental ancestry components: North East Asia, South East Asia, Africa, Europe, South Asia, and America. In total, 24 continental-specific AISNPs and 34 East Asian-specific AISNPs were obtained and used to construct ancestry prediction models with the XGBoost algorithm, yielding Matthews correlation coefficients of 0.94 and 0.89 and accuracies of 0.94 and 0.92, respectively. The machine learning models that we constructed using population-specific AISNPs were able to accurately predict the ancestral origins of continental and intra-East Asian populations. To summarize, screening a set of high-performing AISNPs with embedded feature selection to infer biogeographical ancestry has potential application in creating a layered inference system that accurately differentiates populations from the intercontinental level down to local subpopulations.
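The abstract reports both accuracy and the Matthews correlation coefficient (MCC) for its multi-class ancestry predictions. As a rough illustration of how the two metrics differ, the sketch below computes each from a list of true and predicted labels, using the standard multi-class generalization of MCC. It is not the authors' evaluation code: the function names and the toy labels (borrowed from 1000 Genomes super-population codes) are assumptions made for this example only.

```python
from collections import Counter
from math import sqrt


def accuracy(y_true, y_pred):
    """Fraction of samples whose predicted label equals the true label."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def multiclass_mcc(y_true, y_pred):
    """Multi-class Matthews correlation coefficient.

    Uses the counts-based form:
        (c*s - sum_k t_k*p_k) / sqrt((s^2 - sum_k p_k^2) * (s^2 - sum_k t_k^2))
    where s is the number of samples, c the number of correct predictions,
    t_k the number of samples truly in class k, and p_k the number of
    samples predicted as class k.
    """
    s = len(y_true)
    c = sum(t == p for t, p in zip(y_true, y_pred))
    t_counts = Counter(y_true)
    p_counts = Counter(y_pred)
    labels = set(t_counts) | set(p_counts)

    numerator = c * s - sum(t_counts[k] * p_counts[k] for k in labels)
    pred_term = s * s - sum(p_counts[k] ** 2 for k in labels)
    true_term = s * s - sum(t_counts[k] ** 2 for k in labels)
    denominator = sqrt(pred_term * true_term)

    # Convention: return 0.0 when the coefficient is undefined
    # (for example, when every sample is predicted as the same class).
    return numerator / denominator if denominator else 0.0


# Hypothetical toy labels, not data from the study.
toy_true = ["EAS", "EAS", "EUR", "EUR", "AFR", "AFR"]
toy_pred = ["EAS", "EAS", "EUR", "AFR", "AFR", "AFR"]

toy_accuracy = accuracy(toy_true, toy_pred)
toy_mcc = multiclass_mcc(toy_true, toy_pred)
```

On this toy input one of six samples is misclassified, and the MCC comes out lower than the accuracy because it also penalizes the imbalance that the error introduces between predicted and true class counts.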

