Inferring whole-genome histories in large population datasets.

Affiliation

Big Data Institute, Li Ka Shing Centre for Health Information and Discovery, University of Oxford, Oxford, UK.

Publication information

Nat Genet. 2019 Sep;51(9):1330-1338. doi: 10.1038/s41588-019-0483-y. Epub 2019 Sep 2.

Abstract

Inferring the full genealogical history of a set of DNA sequences is a core problem in evolutionary biology, because this history encodes information about the events and forces that have influenced a species. However, current methods are limited, and the most accurate techniques are able to process no more than a hundred samples. As datasets that consist of millions of genomes are now being collected, there is a need for scalable and efficient inference methods to fully utilize these resources. Here we introduce an algorithm that is able to not only infer whole-genome histories with comparable accuracy to the state-of-the-art but also process four orders of magnitude more sequences. The approach also provides an 'evolutionary encoding' of the data, enabling efficient calculation of relevant statistics. We apply the method to human data from the 1000 Genomes Project, Simons Genome Diversity Project and UK Biobank, showing that the inferred genealogies are rich in biological signal and efficient to process.
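
As a rough illustration of why an "evolutionary encoding" can make statistics cheap to compute, the toy sketch below stores each variant as a single mutation on a tree instead of one genotype per sample, then reads derived-allele frequencies off the tree. It is not the algorithm or data structure from the paper: the single fixed tree, the node numbering, the site names, and the helper functions `samples_below` and `derived_allele_frequencies` are all assumptions made for this example.

```python
# Toy illustration only: one fixed tree, not the paper's data structure or algorithm.
# parent[u] is the parent of node u; -1 marks the root. Nodes 0..3 are samples.
parent = [4, 4, 5, 5, 6, 6, -1]
num_samples = 4

# Hypothetical variant sites: each is recorded as the single node on whose
# lineage the derived allele arose, instead of one genotype per sample.
mutation_node = {"site_a": 4, "site_b": 5, "site_c": 2}


def samples_below(parent, num_samples):
    """Count the sample nodes descending from every node (samples count themselves)."""
    counts = [1 if u < num_samples else 0 for u in range(len(parent))]
    # In this toy tree children have smaller ids than their parents, so one
    # pass in increasing id order pushes each subtree's count up to its parent.
    for u in range(len(parent)):
        if parent[u] != -1:
            counts[parent[u]] += counts[u]
    return counts


def derived_allele_frequencies(parent, num_samples, mutation_node):
    """Read each site's derived-allele frequency off the tree."""
    counts = samples_below(parent, num_samples)
    return {site: counts[node] / num_samples for site, node in mutation_node.items()}


freqs = derived_allele_frequencies(parent, num_samples, mutation_node)
print(freqs)
```

The point of the sketch is that the per-node sample counts are computed once and shared by every site on the tree, so the cost per site no longer grows with the number of samples as it would when scanning a full genotype matrix.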

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c8aa/6726478/293cf949b27c/EMS83740-f001.jpg
