Effects of missing data on species tree estimation under the coalescent.

Author information

Department of Statistics, The Ohio State University, 404 Cockins Hall, 1958 Neil Avenue, Columbus, OH 43210, United States.

Publication information

Mol Phylogenet Evol. 2013 Dec;69(3):1057-62. doi: 10.1016/j.ympev.2013.06.004. Epub 2013 Jun 13.

DOI:10.1016/j.ympev.2013.06.004
PMID:23769751
Abstract

With recent advances in genomic sequencing, the importance of taking the effects of the processes that can cause discord between the speciation history and the individual gene histories into account has become evident. For multilocus datasets, it is difficult to achieve complete coverage of all sampled loci across all sample specimens, a problem that also arises when combining incompletely overlapping datasets. Here we examine how missing data affects the accuracy of species tree reconstruction. In our study, 10- and 100-locus sequence datasets were simulated under the coalescent model from shallow and deep speciation histories, and species trees were estimated using the maximum likelihood and Bayesian frameworks (with STEM and *BEAST, respectively). The accuracy of the estimated species trees was evaluated using the symmetric difference and the SPR distance. We examine the effects of sampling more than one individual per species, as well as the effects of different patterns of missing data (i.e., different amounts of missing data, which is represented among random taxa as opposed to being concentrated in specific taxa, as is often the case for empirical studies). Our general conclusion is that the species tree estimates are remarkably resilient to the effects of missing data. We find that for datasets with more limited numbers of loci, sampling more than one individual per species has the strongest effect on improving species tree accuracy when there is missing data, especially at higher degrees of missing data. For larger multilocus datasets (e.g., 25-100 loci), the amount of missing data has a negligible effect on species tree reconstruction, even at 50% missing data and a single sampled individual per species.
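
The symmetric difference named in the abstract can be illustrated with a small sketch. This is not the code used in the study: it assumes rooted species trees written as nested Python tuples of taxon names, and the function names `clades` and `symmetric_difference` are hypothetical. It counts the clades that appear in exactly one of the two trees, which is the rooted form of the symmetric (Robinson-Foulds) difference. The SPR distance is considerably harder to compute and is not shown.

```python
def clades(tree):
    """Return the non-trivial clades of a rooted tree as frozensets of taxon names.

    A tree is a nested tuple whose leaves are strings, e.g. ((("A", "B"), "C"), "D").
    """
    found = set()

    def leaves(node):
        # A leaf contributes only itself and defines no clade of interest.
        if isinstance(node, str):
            return frozenset([node])
        # An internal node defines the clade made of all leaves below it.
        members = frozenset().union(*(leaves(child) for child in node))
        found.add(members)
        return members

    all_taxa = leaves(tree)
    # The root clade holds every taxon, so it is shared by all trees on this taxon set.
    found.discard(all_taxa)
    return found


def symmetric_difference(tree_a, tree_b):
    """Number of clades present in exactly one of the two rooted trees."""
    return len(clades(tree_a) ^ clades(tree_b))


# Hypothetical four-taxon example: the estimate misplaces B and C.
true_tree = ((("A", "B"), "C"), "D")
estimated = ((("A", "C"), "B"), "D")
print(symmetric_difference(true_tree, estimated))
```

A value of 0 means the estimated topology matches the true species tree; larger values mean more clades are wrong. Child order within a node does not matter, because clades are compared as sets.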

Similar articles

1. Effects of missing data on species tree estimation under the coalescent.
   Mol Phylogenet Evol. 2013 Dec;69(3):1057-62. doi: 10.1016/j.ympev.2013.06.004. Epub 2013 Jun 13.
2. The influence of gene flow on species tree estimation: a simulation study.
   Syst Biol. 2014 Jan 1;63(1):17-30. doi: 10.1093/sysbio/syt049. Epub 2013 Aug 13.
3. Maximum likelihood estimates of species trees: how accuracy of phylogenetic inference depends upon the divergence history and sampling design.
   Syst Biol. 2009 Oct;58(5):501-8. doi: 10.1093/sysbio/syp045. Epub 2009 Aug 20.
4. Full modeling versus summarizing gene-tree uncertainty: method choice and species-tree accuracy.
   Mol Phylogenet Evol. 2012 Nov;65(2):501-9. doi: 10.1016/j.ympev.2012.07.004. Epub 2012 Jul 23.
5. Sources of error inherent in species-tree estimation: impact of mutational and coalescent effects on accuracy and implications for choosing among different methods.
   Syst Biol. 2010 Oct;59(5):573-83. doi: 10.1093/sysbio/syq047. Epub 2010 Sep 10.
6. The accuracy of species tree estimation under simulation: a comparison of methods.
   Syst Biol. 2011 Mar;60(2):126-37. doi: 10.1093/sysbio/syq073. Epub 2010 Nov 18.
7. Applying species-tree analyses to deep phylogenetic histories: challenges and potential suggested from a survey of empirical phylogenetic studies.
   Mol Phylogenet Evol. 2015 Feb;83:191-9. doi: 10.1016/j.ympev.2014.10.022. Epub 2014 Nov 4.
8. Estimating species phylogeny from gene-tree probabilities despite incomplete lineage sorting: an example from Melanoplus grasshoppers.
   Syst Biol. 2007 Jun;56(3):400-11. doi: 10.1080/10635150701405560.
9. Coalescent methods for estimating phylogenetic trees.
   Mol Phylogenet Evol. 2009 Oct;53(1):320-8. doi: 10.1016/j.ympev.2009.05.033. Epub 2009 Jun 6.
10. To Include or Not to Include: The Impact of Gene Filtering on Species Tree Estimation Methods.
   Syst Biol. 2018 Mar 1;67(2):285-303. doi: 10.1093/sysbio/syx077.

Cited by

1. Integrating Genomics and Biogeography to Unravel the Origin of a Mountain Biota: The Case of a Reptile Endemicity Hotspot in Arabia.
   Syst Biol. 2025 Apr 1;74(2):230-249. doi: 10.1093/sysbio/syae032.
2. A Timeline of Biosynthetic Gene Cluster Discovery in : From Characterization to Future Perspectives.
   J Fungi (Basel). 2024 Apr 2;10(4):266. doi: 10.3390/jof10040266.
3. Ancient Mitogenomes Reveal the Maternal Genetic History of East Asian Dogs.
   Mol Biol Evol. 2024 Apr 2;41(4). doi: 10.1093/molbev/msae062.
4. Mining for a new class of fungal natural products: the evolution, diversity, and distribution of isocyanide synthase biosynthetic gene clusters.
   Nucleic Acids Res. 2023 Aug 11;51(14):7220-7235. doi: 10.1093/nar/gkad573.
5. Appendage-Bearing from Leaf Litter in Thailand.
   J Fungi (Basel). 2023 May 29;9(6):625. doi: 10.3390/jof9060625.
6. Mining for a New Class of Fungal Natural Products: The Evolution, Diversity, and Distribution of Isocyanide Synthase Biosynthetic Gene Clusters.
   bioRxiv. 2023 Apr 18:2023.04.17.537281. doi: 10.1101/2023.04.17.537281.
7. 2b or not 2b? 2bRAD is an effective alternative to ddRAD for phylogenomics.
   Ecol Evol. 2023 Mar 8;13(3):e9842. doi: 10.1002/ece3.9842. eCollection 2023 Mar.
8. Evaluation of Arabian Vascular Plant Barcodes (rbcL and matK): Precision of Unsupervised and Supervised Learning Methods towards Accurate Identification.
   Plants (Basel). 2021 Dec 13;10(12):2741. doi: 10.3390/plants10122741.
9. DNA barcoding of medicinal orchids in Asia.
   Sci Rep. 2021 Dec 8;11(1):23651. doi: 10.1038/s41598-021-03025-0.
10. There Is No 'Rule of Thumb': Genomic Filter Settings for a Small Plant Population to Obtain Unbiased Gene Flow Estimates.
   Front Plant Sci. 2021 Oct 14;12:677009. doi: 10.3389/fpls.2021.677009. eCollection 2021.