Using ESTs for phylogenomics: can one accurately infer a phylogenetic tree from a gappy alignment?

Authors

Hartmann Stefanie, Vision Todd J

Affiliation

Department of Biology, University of North Carolina, Chapel Hill, NC 27599, USA.

Publication

BMC Evol Biol. 2008 Mar 26;8:95. doi: 10.1186/1471-2148-8-95.

DOI: 10.1186/1471-2148-8-95
PMID: 18366758
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2359737/
Abstract

BACKGROUND

While full genome sequences are still only available for a handful of taxa, large collections of partial gene sequences are available for many more. The alignment of partial gene sequences results in a multiple sequence alignment containing large gaps that are arranged in a staggered pattern. The consequences of this pattern of missing data on the accuracy of phylogenetic analysis are not well understood. We conducted a simulation study to determine the accuracy of phylogenetic trees obtained from gappy alignments using three commonly used phylogenetic reconstruction methods (Neighbor Joining, Maximum Parsimony, and Maximum Likelihood) and studied ways to improve the accuracy of trees obtained from such datasets.
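
To make the staggered gap pattern concrete, the following Python sketch trims each sequence of a complete toy alignment to one random contiguous fragment, much as an EST might cover only part of a gene, and reports the fraction of missing cells. The function names (`make_gappy`, `missing_fraction`), the fragment-length range, and the toy sequences are illustrative assumptions, not the simulation procedure used in the study.

```python
import random


def make_gappy(alignment, min_frac=0.3, max_frac=0.6, seed=0):
    """Keep one random contiguous fragment per sequence; pad the rest with '-'.

    `alignment` maps sequence names to equal-length, gap-free strings.
    The fragment-length range is an assumed parameter for illustration.
    """
    rng = random.Random(seed)
    gappy = {}
    for name, seq in alignment.items():
        n = len(seq)
        frag_len = max(1, int(n * rng.uniform(min_frac, max_frac)))
        start = rng.randint(0, n - frag_len)  # fragment offset differs per sequence
        end = start + frag_len
        gappy[name] = "-" * start + seq[start:end] + "-" * (n - end)
    return gappy


def missing_fraction(alignment):
    """Fraction of alignment cells that are gap characters."""
    total = sum(len(s) for s in alignment.values())
    gaps = sum(s.count("-") for s in alignment.values())
    return gaps / total


# Toy, made-up sequences; each row ends up covering a different region.
full = {
    "taxon1": "ACGTACGTACGTACGTACGT",
    "taxon2": "ACGTTCGTACGAACGTACGT",
    "taxon3": "ACGAACGTACGTACCTACGT",
}
gappy = make_gappy(full)
for name, seq in gappy.items():
    print(name, seq)
print("missing fraction:", round(missing_fraction(gappy), 2))
```

Because each fragment starts at a different offset, some pairs of sequences may share few or no aligned columns, which is one way the pattern of missing data can matter beyond its total amount.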

RESULTS

We found that the pattern of gappiness in multiple sequence alignments derived from partial gene sequences substantially compromised phylogenetic accuracy even in the absence of alignment error. The decline in accuracy was beyond what would be expected based on the amount of missing data. The decline was particularly dramatic for Neighbor Joining and Maximum Parsimony, where the majority of gappy alignments contained 25% to 40% incorrect quartets. To improve the accuracy of the trees obtained from a gappy multiple sequence alignment, we examined two approaches. In the first approach, alignment masking, potentially problematic columns and input sequences are excluded from the dataset. Even in the absence of alignment error, masking improved phylogenetic accuracy up to 100-fold. However, masking retained, on average, only 83% of the input sequences. In the second approach, alignment subdivision, the missing data is statistically modelled in order to retain as many sequences as possible in the phylogenetic analysis. Subdivision resulted in more modest improvements to alignment accuracy, but succeeded in including almost all of the input sequences.
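
As an illustration of the masking idea, the Python sketch below first drops columns whose gap fraction exceeds a threshold and then drops sequences left with too few residues. It is a generic gap-threshold filter under assumed cutoffs (`max_col_gap`, `min_seq_cov`), not the masking procedure evaluated in the study; the function name and the toy alignment are hypothetical.

```python
def mask_alignment(alignment, max_col_gap=0.5, min_seq_cov=0.5):
    """Return (masked_alignment, kept_column_indices).

    Columns with a gap fraction above `max_col_gap` are removed first;
    sequences whose residue coverage over the kept columns falls below
    `min_seq_cov` are then discarded. Both cutoffs are assumptions.
    """
    names = list(alignment)
    n_cols = len(alignment[names[0]])
    keep_cols = [
        j for j in range(n_cols)
        if sum(alignment[n][j] == "-" for n in names) / len(names) <= max_col_gap
    ]
    masked = {}
    for n in names:
        seq = "".join(alignment[n][j] for j in keep_cols)
        if seq and (len(seq) - seq.count("-")) / len(seq) >= min_seq_cov:
            masked[n] = seq
    return masked, keep_cols


# Toy gappy alignment (made-up data).
aln = {
    "a": "ACGTACGT",
    "b": "ACGTAC--",
    "c": "--GTACGT",
    "d": "------GT",
}
masked, kept = mask_alignment(aln, max_col_gap=0.25)
print(kept)            # [2, 3, 4, 5, 6, 7]
print(sorted(masked))  # ['a', 'b', 'c']
```

In this toy case, three of the four sequences survive. It shows the same trade-off in miniature: stricter thresholds give a cleaner alignment but discard more of the input.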

CONCLUSION

These results demonstrate that partial gene sequences and gappy multiple sequence alignments can pose a major problem for phylogenetic analysis. The concern will be greatest for high-throughput phylogenomic analyses, in which Neighbor Joining is often the preferred method due to its computational efficiency. Both approaches can be used to increase the accuracy of phylogenetic inference from a gappy alignment. The choice between the two approaches will depend upon how robust the application is to the loss of sequences from the input set, with alignment masking generally giving a much greater improvement in accuracy but at the cost of discarding a larger number of the input sequences.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ea/2359737/9c886825c0b4/1471-2148-8-95-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ea/2359737/9e132d207746/1471-2148-8-95-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ea/2359737/e1e1f08d8bbb/1471-2148-8-95-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ea/2359737/60bc05407a40/1471-2148-8-95-3.jpg
