Joint Estimation of Contamination, Error and Demography for Nuclear DNA from Ancient Humans.

Authors

Racimo Fernando, Renaud Gabriel, Slatkin Montgomery

Affiliations

Department of Integrative Biology, University of California, Berkeley, Berkeley, California, United States of America.

Department of Evolutionary Genetics, Max Planck Institute for Evolutionary Anthropology, Leipzig, Germany.

Publication information

PLoS Genet. 2016 Apr 6;12(4):e1005972. doi: 10.1371/journal.pgen.1005972. eCollection 2016 Apr.

Abstract

When sequencing an ancient DNA sample from a hominin fossil, DNA from present-day humans involved in excavation and extraction will be sequenced along with the endogenous material. This type of contamination is problematic for downstream analyses as it will introduce a bias towards the population of the contaminating individual(s). Quantifying the extent of contamination is a crucial step as it allows researchers to account for possible biases that may arise in downstream genetic analyses. Here, we present an MCMC algorithm to co-estimate the contamination rate, sequencing error rate and demographic parameters (including drift times and admixture rates) for an ancient nuclear genome obtained from human remains, when the putative contaminating DNA comes from present-day humans. We assume we have a large panel representing the putative contaminant population (e.g. European, East Asian or African). The method is implemented in a C++ program called 'Demographic Inference with Contamination and Error' (DICE). We applied it to simulations and genome data from ancient Neanderthals and modern humans. With reasonable levels of genome sequence coverage (>3X), we find we can recover accurate estimates of all these parameters, even when the contamination rate is as high as 50%.
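
To make the idea of co-estimating contamination and error by MCMC concrete, here is a minimal toy sketch in Python. It is not DICE and does not reproduce the paper's likelihood. It leaves out drift times and admixture rates entirely, and it replaces the demographic model with an assumed Hardy-Weinberg genotype prior built from a hypothetical anchor-population frequency `y`. All function names, the simulated data, the uniform priors and the proposal step sizes are assumptions made for illustration. The sketch models each read as coming from the contaminant panel (derived-allele frequency `w`) with probability `c`, otherwise from the endogenous genotype, and then flips the allele with probability `e`.

```python
import math
import random


def read_derived_prob(c, e, w, g):
    """P(a read shows the derived allele | endogenous genotype g in {0, 1, 2})."""
    q = c * w + (1.0 - c) * g / 2.0        # allele carried by the sequenced fragment
    return q * (1.0 - e) + (1.0 - q) * e   # allele is flipped with error probability e


def log_likelihood(c, e, sites):
    """Log-likelihood of derived-read counts, with the genotype summed out."""
    total = 0.0
    for w, y, n, d in sites:
        # Toy assumption: Hardy-Weinberg prior from an anchor frequency y.
        prior = ((1 - y) ** 2, 2 * y * (1 - y), y ** 2)
        site = 0.0
        for g in range(3):
            p = read_derived_prob(c, e, w, g)
            site += prior[g] * p ** d * (1.0 - p) ** (n - d)
        total += math.log(site)
    return total


def simulate_sites(rng, n_sites, coverage, c, e):
    """Simulate (w, y, n, d) tuples under the same toy model."""
    sites = []
    for _ in range(n_sites):
        w, y = rng.random(), rng.random()
        g = (rng.random() < y) + (rng.random() < y)  # two Bernoulli(y) alleles
        p = read_derived_prob(c, e, w, g)
        d = sum(rng.random() < p for _ in range(coverage))
        sites.append((w, y, coverage, d))
    return sites


def metropolis(sites, n_iter, rng, start=(0.5, 0.05), step=(0.02, 0.003)):
    """Random-walk Metropolis over (c, e) with flat priors on a bounded box."""
    c, e = start
    ll = log_likelihood(c, e, sites)
    chain = []
    for _ in range(n_iter):
        c_new = c + rng.gauss(0.0, step[0])
        e_new = e + rng.gauss(0.0, step[1])
        # Proposals outside the prior support are rejected outright.
        if 0.0 < c_new < 1.0 and 1e-6 < e_new < 0.5:
            ll_new = log_likelihood(c_new, e_new, sites)
            if rng.random() < math.exp(min(0.0, ll_new - ll)):
                c, e, ll = c_new, e_new, ll_new
        chain.append((c, e))
    return chain


def run_demo(seed=1, true_c=0.2, true_e=0.01):
    """Simulate data, run the sampler, and return posterior means of (c, e)."""
    rng = random.Random(seed)
    sites = simulate_sites(rng, n_sites=300, coverage=10, c=true_c, e=true_e)
    chain = metropolis(sites, n_iter=1500, rng=rng)
    kept = chain[500:]  # discard burn-in
    c_hat = sum(c for c, _ in kept) / len(kept)
    e_hat = sum(e for _, e in kept) / len(kept)
    return c_hat, e_hat


if __name__ == "__main__":
    c_hat, e_hat = run_demo()
    print(f"posterior mean contamination: {c_hat:.3f}, error: {e_hat:.4f}")
```

In this toy setting the two rates are separable because sites where the panel frequency and the endogenous genotype disagree are informative about `c`, while sites where they agree are informative mostly about `e`. A fuller treatment along the lines described in the abstract would also sample the demographic parameters.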

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8e3f/4822957/4e5f9104b0ac/pgen.1005972.g005.jpg
