Human ancestry identification under resource constraints -- what can one chromosome tell us about human biogeographical ancestry?

Author information

Toma Tanjin T, Dawson Jeremy M, Adjeroh Donald A

Affiliation

Lane Department of Computer Science and Electrical Engineering, West Virginia University, Morgantown, WV, USA.

Publication information

BMC Med Genomics. 2018 Nov 20;11(Suppl 5):0. doi: 10.1186/s12920-018-0412-4.

Abstract

BACKGROUND

While inferring continental-level ancestry from genomic information is relatively simple, distinguishing between individuals from closely related sub-populations (e.g., from the same continent) remains a difficult challenge.

METHODS

We study the problem of predicting human biogeographical ancestry from genomic data under resource constraints. In particular, we focus on the case where the analysis is constrained to single nucleotide polymorphisms (SNPs) from just one chromosome. We propose correlation-based and outlier-based methods for constructing such ancestry-informative SNP panels.
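To make the idea of a correlation-based panel concrete, the following is a minimal sketch, not the authors' method: the abstract does not say how correlation is computed or thresholded. It assumes genotypes are coded as minor-allele counts (0, 1, 2), that the two populations are coded as 0/1 labels, and that SNPs are ranked by the absolute Pearson correlation between genotype and label. The names `pearson`, `select_panel`, and the toy data are hypothetical.

```python
import math


def pearson(xs, ys):
    """Pearson correlation of two equal-length sequences (0.0 if either is constant)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return 0.0  # a monomorphic SNP carries no information about the label
    return cov / math.sqrt(var_x * var_y)


def select_panel(genotypes, labels, k):
    """Return indices of the k SNPs most correlated (in absolute value) with the labels.

    genotypes: one row per sample, each row a list of allele counts (assumed 0/1/2).
    labels:    one 0/1 population label per sample (assumed binary coding).
    """
    n_snps = len(genotypes[0])
    scored = []
    for j in range(n_snps):
        column = [row[j] for row in genotypes]
        scored.append((abs(pearson(column, labels)), j))
    # Highest score first; ties broken by SNP index for a deterministic result.
    scored.sort(key=lambda item: (-item[0], item[1]))
    return [j for _, j in scored[:k]]


# Hypothetical toy data: 6 samples x 4 SNPs, first three samples from one population.
toy_genotypes = [
    [0, 1, 2, 1],
    [0, 2, 2, 1],
    [1, 0, 2, 1],
    [2, 1, 2, 1],
    [2, 2, 2, 1],
    [1, 0, 2, 1],
]
toy_labels = [0, 0, 0, 1, 1, 1]
panel = select_panel(toy_genotypes, toy_labels, k=2)
```

In this toy data only the first SNP differs in frequency between the two groups, so it ranks first. A real panel would be built from the chromosome's SNPs in the 1000 Genomes data and would need a principled choice of `k`.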

RESULTS

We assessed the performance of the proposed SNP panels derived from just one chromosome, using data from the 1000 Genomes Project, Phase 3. For continental-level ancestry classification, we achieved an overall classification rate of 96.75% using 206 SNPs. For sub-population level ancestry prediction, we achieved the following average pairwise binary classification rates: subpopulations in Europe: 76.6% (58 SNPs); Africa: 87.02% (87 SNPs); East Asia: 73.30% (68 SNPs); South Asia: 81.14% (75 SNPs); America: 85.85% (68 SNPs).
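An "average pairwise binary classification rate" can be read as follows: within one continent, a binary classifier is evaluated for every unordered pair of subpopulations, and the resulting rates are averaged. The sketch below illustrates that reading under the assumption of an unweighted mean; the abstract does not state the weighting. The function `average_pairwise_rate`, the population names, and the rates are hypothetical and are not taken from the paper.

```python
from itertools import combinations


def average_pairwise_rate(pair_rates, populations):
    """Unweighted mean of binary classification rates over all unordered pairs.

    pair_rates:  dict mapping a sorted (pop_a, pop_b) tuple to a rate in [0, 1].
    populations: the subpopulation names within one continent.
    """
    pairs = list(combinations(sorted(populations), 2))
    missing = [pair for pair in pairs if pair not in pair_rates]
    if missing:
        raise ValueError(f"no rate supplied for pairs: {missing}")
    return sum(pair_rates[pair] for pair in pairs) / len(pairs)


# Hypothetical rates for three made-up subpopulations.
toy_rates = {
    ("POP_A", "POP_B"): 0.80,
    ("POP_A", "POP_C"): 0.90,
    ("POP_B", "POP_C"): 0.70,
}
avg_rate = average_pairwise_rate(toy_rates, ["POP_A", "POP_B", "POP_C"])
```

With n subpopulations there are n(n-1)/2 pairs, so a continent with five subpopulations would contribute ten binary tasks to the average.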

CONCLUSION

Our results demonstrate that a single chromosome (in particular, Chromosome 1), if carefully analyzed, could hold enough information for accurate prediction of human biogeographical ancestry. This has significant implications for the computational resources required for ancestry analysis, and for applications of such analysis in areas such as studies of genetic diseases, forensics, and soft biometrics.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf68/6245491/83ab16a8dd1b/12920_2018_412_Fig1_HTML.jpg
