Ultrafast genome-wide inference of pairwise coalescence times.

Affiliations

Department of Genetics, University of Cambridge, Cambridge CB2 1TN, United Kingdom

Publication information

Genome Res. 2023 Jul;33(7):1023-1031. doi: 10.1101/gr.277665.123. Epub 2023 Aug 10.

Abstract

The pairwise sequentially Markovian coalescent (PSMC) algorithm and its extensions infer the coalescence time of two homologous chromosomes at each genomic position. This inference is used in reconstructing demographic histories, detecting selection signatures, studying genome-wide associations, constructing ancestral recombination graphs, and more. Inferring coalescence times between each pair of haplotypes in a large data set is of great interest, as these times may provide rich information about the population structure and history of the sample. Here, we introduce a new method, Gamma-SMC, which is more than 10 times faster than current methods. To obtain this speed-up, we represent the posterior coalescence time distributions succinctly as a gamma distribution with just two parameters; in contrast, PSMC and its extensions hold these in a vector over discrete intervals of time. Thus, Gamma-SMC has constant time-complexity per site, without dependence on the number of discrete time states. Additionally, because of this continuous representation, our method is able to infer times spanning many orders of magnitude and, as such, is robust to parameter misspecification. We describe how this approach works, show its performance on simulated and real data, and illustrate its use in studying recent positive selection in the 1000 Genomes Project data set.
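
To make the two-parameter representation concrete, the following Python sketch shows why a gamma posterior can be updated in constant time per site. It is an illustration under stated assumptions, not the authors' implementation: it assumes the number of mutations at a site is Poisson with mean theta * t (t being the coalescence time and theta a hypothetical scaled mutation rate), treats a heterozygous site as exactly one mutation, and omits the recombination step between sites entirely. The names GammaPosterior, emission_update, and scan are invented for this example.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class GammaPosterior:
    """Belief about a coalescence time, stored as just two numbers."""
    shape: float  # alpha
    rate: float   # beta

    @property
    def mean(self) -> float:
        return self.shape / self.rate

    @property
    def variance(self) -> float:
        return self.shape / self.rate ** 2


def emission_update(post: GammaPosterior, is_het: bool, theta: float) -> GammaPosterior:
    """Conjugate gamma-Poisson update for one observed site (assumed model).

    Homozygous site:   likelihood exp(-theta * t)             -> rate  += theta
    Heterozygous site: likelihood theta * t * exp(-theta * t) -> shape += 1, rate += theta
    The cost is O(1) and does not depend on any time discretisation.
    """
    new_shape = post.shape + (1.0 if is_het else 0.0)
    return GammaPosterior(new_shape, post.rate + theta)


def scan(observations: Iterable[bool], theta: float,
         prior: GammaPosterior = GammaPosterior(1.0, 1.0)) -> List[GammaPosterior]:
    """Apply the emission update site by site.

    Assumption: no recombination between sites, so evidence simply
    accumulates; a real SMC method also needs a transition step.
    """
    post = prior
    history = []
    for is_het in observations:
        post = emission_update(post, is_het, theta)
        history.append(post)
    return history
```

A discretised method would instead update a vector with one entry per time interval at every site, so its per-site cost grows with the number of intervals; here the state is always the pair (shape, rate).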

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/d34e/10538485/7d91b089f326/1023f01.jpg
