Multivariate statistical approach and machine learning for the evaluation of biogeographical ancestry inference in the forensic field.

Affiliations

Department of Chemistry, University of Turin, Turin, Italy.

Centro Regionale Antidoping e di Tossicologia "A. Bertinaria", Orbassano, Torino, Italy.

Publication information

Sci Rep. 2022 May 28;12(1):8974. doi: 10.1038/s41598-022-12903-0.

Abstract

The biogeographical ancestry (BGA) of a trace or a person/skeleton refers to the biologically determined component of ethnicity, which itself comprises both biological and cultural elements. Nowadays, many individuals are interested in exploring their genealogy, and the ability to distinguish biogeographic information about population groups and subgroups via DNA analysis plays an essential role in several fields, including forensics. For investigative and intelligence purposes, it is beneficial to infer the biogeographical origins of perpetrators of crimes, or of victims in unsolved cold cases, when no reference profile or database hit is available for comparison. Current approaches to biogeographical ancestry estimation from SNP data are usually based on PCA and the Structure software. The present study provides an alternative method that uses multivariate data analysis and machine learning strategies to evaluate the BGA discriminating power for unknown samples using different commercial panels. Starting from the 1000 Genomes Project, Simons Genome Diversity Project and Human Genome Diversity Project datasets, which include African, American, Asian, European and Oceanian individuals, and moving towards further, more geographically restricted populations, the study applied multivariate techniques such as Partial Least Squares-Discriminant Analysis (PLS-DA) and machine learning techniques such as XGBoost, and compared their discriminating power. PLS-DA provided more robust classifications than XGBoost, showing that the adopted approach might be an interesting tool for forensic experts to infer BGA information from the DNA profile of unknown individuals, but also highlighting that commercial forensic panels could be inadequate for discriminating populations at the intra-continental level.
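
To make the PLS-DA idea mentioned in the abstract more concrete, the following is a minimal sketch and not the authors' pipeline. It assumes SNP genotypes coded as 0/1/2 minor-allele counts, uses simulated populations rather than the reference datasets or commercial panels named above, and implements a plain PLS2 regression on one-hot population labels with NumPy only. All function names (`simulate_genotypes`, `fit_plsda`, `predict_plsda`) and all parameter values are hypothetical, and the XGBoost comparison is left out.

```python
import numpy as np


def simulate_genotypes(rng, n_pops=3, n_per_pop=100, n_snps=60):
    """Simulate 0/1/2 genotypes from population-specific allele frequencies."""
    freqs = rng.uniform(0.05, 0.95, size=(n_pops, n_snps))
    X = np.vstack([
        rng.binomial(2, freqs[k], size=(n_per_pop, n_snps))
        for k in range(n_pops)
    ])
    y = np.repeat(np.arange(n_pops), n_per_pop)
    return X.astype(float), y


def fit_plsda(X, y, n_classes, n_components):
    """Fit PLS2 regression of one-hot class labels on genotypes."""
    Y = np.eye(n_classes)[y]                      # one-hot label matrix
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    Xc, Yc = X - x_mean, Y - y_mean
    W, P, Q = [], [], []
    for _ in range(n_components):
        # Weight vector: dominant left singular vector of the X-Y cross-product.
        u, _s, _vt = np.linalg.svd(Xc.T @ Yc, full_matrices=False)
        w = u[:, 0]
        t = Xc @ w                                # latent scores
        tt = t @ t
        p = Xc.T @ t / tt                         # X loadings
        q = Yc.T @ t / tt                         # Y loadings
        Xc = Xc - np.outer(t, p)                  # deflate X
        Yc = Yc - np.outer(t, q)                  # deflate Y
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = (np.column_stack(m) for m in (W, P, Q))
    # Regression coefficients mapping centered genotypes to centered labels.
    B = W @ np.linalg.solve(P.T @ W, Q.T)
    return B, x_mean, y_mean


def predict_plsda(X, B, x_mean, y_mean):
    """Assign each sample to the class with the largest predicted response."""
    return np.argmax((X - x_mean) @ B + y_mean, axis=1)


rng = np.random.default_rng(0)
X, y = simulate_genotypes(rng)
order = rng.permutation(len(y))
train, test = order[:210], order[210:]            # simple 70/30 hold-out split

B, x_mean, y_mean = fit_plsda(X[train], y[train], n_classes=3, n_components=3)
accuracy = float(np.mean(predict_plsda(X[test], B, x_mean, y_mean) == y[test]))
print(f"held-out accuracy: {accuracy:.2f}")
```

The simulated populations here differ strongly in allele frequency, so they are far easier to separate than the intra-continental groups the abstract describes; the sketch only illustrates the mechanics of classifying by the largest predicted class response.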

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f81d/9148302/91dc3c9cb42b/41598_2022_12903_Fig1_HTML.jpg
