

How to choose sets of ancestry informative markers: A supervised feature selection approach.

Author information

University of Freiburg, Department of Mathematical Stochastics, Ernst-Zermelo-Straße 1, D-79104 Freiburg, Germany.

University of Freiburg, Faculty of Medicine and Medical Center, Institute of Genetic Epidemiology, Germany.

Publication information

Forensic Sci Int Genet. 2020 May;46:102259. doi: 10.1016/j.fsigen.2020.102259. Epub 2020 Feb 15.

Abstract

Inference of the Biogeographical Ancestry (BGA) of a person or trace relies on three ingredients: (1) a reference database of DNA samples including BGA information; (2) a statistical clustering method; (3) a set of loci which segregate depending on geographical location, i.e. a set of so-called Ancestry Informative Markers (AIMs). We used the theory of feature selection from statistical learning to obtain AIMsets for BGA inference. Using simulations, we show that this learning procedure works in various cases and outperforms ad hoc methods that choose AIMs based on statistics like F_ST or informativeness. Applying our method to data from the 1000 Genomes Project (excluding Admixed Americans), we identified an AIMset of 12 SNPs, which gives a vanishing misclassification error on a continental scale, as do other published AIMsets. In fact, cross-validation shows that there exists a multitude of sets with performance comparable to the optimal AIMset. On a sub-continental scale, we find a set of 55 SNPs for distinguishing the five European populations. The misclassification error is reduced by a factor of two relative to published AIMsets, but is still 30% and therefore too large to be useful in forensic applications.
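
For orientation, the following is a minimal sketch of an ad hoc ranking of the kind the abstract uses as a baseline: score each SNP separately by the fixation index F_ST and keep the top-scoring ones. It is not the supervised feature selection procedure of the paper, which the abstract does not spell out. The sketch assumes biallelic SNPs, known per-population allele frequencies, and equal population weights, and uses Nei's definition F_ST = (H_T - H_S) / H_T. The function names and the toy frequency table are hypothetical.

```python
def fst_per_snp(freqs):
    """Nei's F_ST for one biallelic SNP (assumption: equal population weights).

    freqs: frequency of one allele in each population, e.g. [0.1, 0.9].
    """
    k = len(freqs)
    p_bar = sum(freqs) / k                          # mean allele frequency
    h_t = 2 * p_bar * (1 - p_bar)                   # total expected heterozygosity
    h_s = sum(2 * p * (1 - p) for p in freqs) / k   # mean within-population heterozygosity
    if h_t == 0:
        # Monomorphic SNP: carries no ancestry information.
        return 0.0
    return (h_t - h_s) / h_t


def rank_snps_by_fst(freq_table, top_n):
    """Return the top_n SNP names with the highest per-SNP F_ST."""
    scores = {snp: fst_per_snp(freqs) for snp, freqs in freq_table.items()}
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


# Toy data (hypothetical): allele frequencies in two populations.
freq_table = {
    "snp_a": [0.1, 0.9],   # strongly differentiated
    "snp_b": [0.5, 0.5],   # identical frequencies, uninformative
    "snp_c": [0.4, 0.6],   # weakly differentiated
}
selected = rank_snps_by_fst(freq_table, top_n=2)
```

Because each SNP is scored in isolation, a ranking like this can pick markers that carry redundant information. A supervised approach instead judges a marker set by the classification error it produces.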

