Efficient Coalescent Simulation and Genealogical Analysis for Large Sample Sizes.

Authors

Kelleher Jerome, Etheridge Alison M, McVean Gilean

Affiliations

Wellcome Trust Centre for Human Genetics, University of Oxford, Oxford, United Kingdom.

Department of Statistics, University of Oxford, Oxford, United Kingdom.

Publication details

PLoS Comput Biol. 2016 May 4;12(5):e1004842. doi: 10.1371/journal.pcbi.1004842. eCollection 2016 May.

DOI:10.1371/journal.pcbi.1004842
PMID:27145223
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4856371/

Abstract

A central challenge in the analysis of genetic variation is to provide realistic genome simulation across millions of samples. Present day coalescent simulations do not scale well, or use approximations that fail to capture important long-range linkage properties. Analysing the results of simulations also presents a substantial challenge, as current methods to store genealogies consume a great deal of space, are slow to parse and do not take advantage of shared structure in correlated trees. We solve these problems by introducing sparse trees and coalescence records as the key units of genealogical analysis. Using these tools, exact simulation of the coalescent with recombination for chromosome-sized regions over hundreds of thousands of samples is possible, and substantially faster than present-day approximate methods. We can also analyse the results orders of magnitude more quickly than with existing methods.
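
To make the idea of coalescence records concrete, here is a minimal sketch, not the authors' implementation. It simulates the basic coalescent for a small sample without recombination and stores each merger as one record. The record fields (`left`, `right`, `node`, `children`, `time`), the function names, and the parent-array encoding of a tree are assumptions made for illustration; the abstract does not specify them.

```python
import random
from dataclasses import dataclass
from typing import List, Tuple


@dataclass(frozen=True)
class CoalescenceRecord:
    """One merger event (hypothetical layout, assumed for illustration)."""
    left: float                # start of the genomic interval covered
    right: float               # end of the genomic interval covered
    node: int                  # id of the new ancestral node
    children: Tuple[int, int]  # ids of the two lineages that merged
    time: float                # time of the merger, in coalescent units


def simulate_coalescent(n: int, length: float = 1.0, seed: int = 1) -> List[CoalescenceRecord]:
    """Simulate the basic coalescent for n samples with no recombination."""
    rng = random.Random(seed)
    lineages = list(range(n))  # samples are nodes 0..n-1
    next_node = n
    time = 0.0
    records = []
    while len(lineages) > 1:
        k = len(lineages)
        # With k lineages, the waiting time to the next merger is
        # exponential with rate k(k-1)/2.
        time += rng.expovariate(k * (k - 1) / 2)
        a, b = rng.sample(lineages, 2)
        lineages.remove(a)
        lineages.remove(b)
        records.append(
            CoalescenceRecord(0.0, length, next_node, (min(a, b), max(a, b)), time)
        )
        lineages.append(next_node)
        next_node += 1
    return records


def parent_array(records: List[CoalescenceRecord], n: int) -> List[int]:
    """Encode the tree as a list where parent[u] is the parent of node u (-1 for the root)."""
    parent = [-1] * (2 * n - 1)
    for rec in records:
        for child in rec.children:
            parent[child] = rec.node
    return parent


records = simulate_coalescent(10)
tree = parent_array(records, 10)
```

Without recombination every record spans the whole interval and the records describe a single tree. With recombination, one would expect adjacent trees to share most of their records, which is the shared structure the abstract refers to.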

Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e4/4856371/aa3a2489ba40/pcbi.1004842.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e4/4856371/19d27bd8b628/pcbi.1004842.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e4/4856371/a1b8d29862f2/pcbi.1004842.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e4/4856371/cfbe88af6fa4/pcbi.1004842.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/91e4/4856371/a04bd7b9fe15/pcbi.1004842.g005.jpg
