• 文献检索
  • 文档翻译
  • 深度研究
  • 学术资讯
  • Suppr Zotero 插件Zotero 插件
  • 邀请有礼
  • 套餐&价格
  • 历史记录
应用&插件
Suppr Zotero 插件Zotero 插件浏览器插件Mac 客户端Windows 客户端微信小程序
定价
高级版会员购买积分包购买API积分包
服务
文献检索文档翻译深度研究API 文档MCP 服务
关于我们
关于 Suppr公司介绍联系我们用户协议隐私条款
关注我们

Suppr 超能文献

核心技术专利:CN118964589B侵权必究
粤ICP备2023148730 号-1Suppr @ 2026

文献检索

告别复杂PubMed语法,用中文像聊天一样搜索,搜遍4000万医学文献。AI智能推荐,让科研检索更轻松。

立即免费搜索

文件翻译

保留排版,准确专业,支持PDF/Word/PPT等文件格式,支持 12+语言互译。

免费翻译文档

深度研究

AI帮你快速写综述,25分钟生成高质量综述,智能提取关键信息,辅助科研写作。

立即免费体验

树对齐问题。

The tree alignment problem.

机构信息

Division of Invertebrate Zoology, American Museum of Natural History, New York, NY 10024, USA.

出版信息

BMC Bioinformatics. 2012 Nov 9;13:293. doi: 10.1186/1471-2105-13-293.

DOI:10.1186/1471-2105-13-293
PMID:23140486
原文链接:https://pmc.ncbi.nlm.nih.gov/articles/PMC3605350/
Abstract

BACKGROUND

The inference of homologies among DNA sequences, that is, positions in multiple genomes that share a common evolutionary origin, is a crucial, yet difficult task facing biologists. Its computational counterpart is known as the multiple sequence alignment problem. There are various criteria and methods available to perform multiple sequence alignments, and among these, the minimization of the overall cost of the alignment on a phylogenetic tree is known in combinatorial optimization as the Tree Alignment Problem. This problem typically occurs as a subproblem of the Generalized Tree Alignment Problem, which looks for the tree with the lowest alignment cost among all possible trees. This is equivalent to the Maximum Parsimony problem when the input sequences are not aligned, that is, when phylogeny and alignments are simultaneously inferred.

RESULTS

For large data sets, a popular heuristic is Direct Optimization (DO). DO provides a good tradeoff between speed, scalability, and competitive scores, and is implemented in the computer program POY. All other (competitive) algorithms have greater time complexities compared to DO. Here, we introduce and present experiments a new algorithm Affine-DO to accommodate the indel (alignment gap) models commonly used in phylogenetic analysis of molecular sequence data. Affine-DO has the same time complexity as DO, but is correctly suited for the affine gap edit distance. We demonstrate its performance with more than 330,000 experimental tests. These experiments show that the solutions of Affine-DO are close to the lower bound inferred from a linear programming solution. Moreover, iterating over a solution produced using Affine-DO shows little improvement.

CONCLUSIONS

Our results show that Affine-DO is likely producing near-optimal solutions, with approximations within 10% for sequences with small divergence, and within 30% for random sequences, for which Affine-DO produced the worst solutions. The Affine-DO algorithm has the necessary scalability and optimality to be a significant improvement in the real-world phylogenetic analysis of sequence data.

摘要

背景

推断 DNA 序列之间的同源性,即多个基因组中共享共同进化起源的位置,是生物学家面临的一项关键但困难的任务。其计算对应物被称为多序列比对问题。有各种标准和方法可用于进行多序列比对,其中,在系统发育树上最小化比对的总代价被称为树比对问题。在组合优化中,这个问题通常作为广义树比对问题的子问题出现,该问题在所有可能的树中寻找具有最低比对代价的树。当输入序列未对齐时,即同时推断系统发育和比对时,这相当于最大简约问题。

结果

对于大型数据集,一种流行的启发式方法是直接优化(DO)。DO 在速度、可扩展性和竞争分数之间提供了良好的折衷,并且在计算机程序 POY 中实现。与 DO 相比,所有其他(竞争)算法的时间复杂度都更高。在这里,我们引入并展示了一种新算法——仿射-DO,以适应分子序列数据系统发育分析中常用的插入缺失(比对间隙)模型。仿射-DO 具有与 DO 相同的时间复杂度,但非常适合仿射间隙编辑距离。我们通过超过 330000 次实验测试展示了它的性能。这些实验表明,仿射-DO 的解决方案接近从线性规划解决方案推断出的下限。此外,在使用仿射-DO 生成的解决方案上迭代几乎没有改进。

结论

我们的结果表明,仿射-DO 很可能产生接近最优的解决方案,对于具有小分歧的序列,其近似值在 10%以内,对于随机序列,其近似值在 30%以内,对于仿射-DO 生成的最差解决方案,其近似值在 30%以内。仿射-DO 算法具有必要的可扩展性和最优性,可以显著改进序列数据的实际系统发育分析。

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/c04e5d458406/1471-2105-13-293-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/b673c7fe4e34/1471-2105-13-293-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/4bf01a30a4ce/1471-2105-13-293-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/7c5677803297/1471-2105-13-293-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/99f4aba5143f/1471-2105-13-293-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/0ac0c9657dcf/1471-2105-13-293-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/0415afe623b6/1471-2105-13-293-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/47a4f4679c11/1471-2105-13-293-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/e9514392b2c3/1471-2105-13-293-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/c04e5d458406/1471-2105-13-293-9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/b673c7fe4e34/1471-2105-13-293-1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/4bf01a30a4ce/1471-2105-13-293-2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/7c5677803297/1471-2105-13-293-3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/99f4aba5143f/1471-2105-13-293-4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/0ac0c9657dcf/1471-2105-13-293-5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/0415afe623b6/1471-2105-13-293-6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/47a4f4679c11/1471-2105-13-293-7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/e9514392b2c3/1471-2105-13-293-8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2f63/3605350/c04e5d458406/1471-2105-13-293-9.jpg

相似文献

1
The tree alignment problem.树对齐问题。
BMC Bioinformatics. 2012 Nov 9;13:293. doi: 10.1186/1471-2105-13-293.
2
Treelength optimization for phylogeny estimation.树长优化在系统发育估计中的应用。
PLoS One. 2012;7(3):e33104. doi: 10.1371/journal.pone.0033104. Epub 2012 Mar 19.
3
SATe-II: very fast and accurate simultaneous estimation of multiple sequence alignments and phylogenetic trees.SATe-II:一种非常快速且准确的同时估计多个序列比对和系统发育树的方法。
Syst Biol. 2012 Jan;61(1):90-106. doi: 10.1093/sysbio/syr095. Epub 2011 Dec 1.
4
Ancestral sequence alignment under optimal conditions.在最佳条件下进行祖先序列比对。
BMC Bioinformatics. 2005 Nov 17;6:273. doi: 10.1186/1471-2105-6-273.
5
Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy.错误的树长探寻:空位罚分对序列比对和树准确性的影响
IEEE/ACM Trans Comput Biol Bioinform. 2009 Jan-Mar;6(1):7-21. doi: 10.1109/TCBB.2008.63.
6
Bayesian coestimation of phylogeny and sequence alignment.系统发育与序列比对的贝叶斯联合估计
BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.
7
Molecular systematics of terraranas (Anura: Brachycephaloidea) with an assessment of the effects of alignment and optimality criteria.陆栖蛙类(无尾目:短头蟾超科)的分子系统学及比对和最优性标准的影响评估
Zootaxa. 2014 Jun 26;3825:1-132. doi: 10.11646/zootaxa.3825.1.1.
8
Simultaneous phylogeny reconstruction and multiple sequence alignment.同时进行系统发育重建和多序列比对。
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S11. doi: 10.1186/1471-2105-10-S1-S11.
9
TreeRefiner: a tool for refining a multiple alignment on a phylogenetic tree.TreeRefiner:一种用于在系统发育树上优化多重比对的工具。
Proc IEEE Comput Syst Bioinform Conf. 2005:111-9. doi: 10.1109/csb.2005.53.
10
Logarithmic gap costs decrease alignment accuracy.对数空位罚分降低比对准确性。
BMC Bioinformatics. 2006 Dec 5;7:527. doi: 10.1186/1471-2105-7-527.

引用本文的文献

1
Efficient implied alignment.高效的隐含对齐。
BMC Bioinformatics. 2020 Jul 9;21(1):296. doi: 10.1186/s12859-020-03595-2.
2
Testing for universal common ancestry.检测普遍的共同祖先。
Syst Biol. 2014 Sep;63(5):838-42. doi: 10.1093/sysbio/syu041. Epub 2014 Jun 23.
3
Local search for the generalized tree alignment problem.广义树比对问题的局部搜索。

本文引用的文献

1
Fixed Character States and the Optimization of Molecular Sequence Data.固定性状状态与分子序列数据的优化
Cladistics. 1999 Dec;15(4):379-385. doi: 10.1111/j.1096-0031.1999.tb00274.x.
2
Application note: on extension gap in POY version 3.
Cladistics. 2008 Dec;24(6):1070. doi: 10.1111/j.1096-0031.2008.00208.x.
3
Dynamic homology and the likelihood criterion.动态同源性与似然性标准。
Cladistics. 2006 Apr;22(2):157-170. doi: 10.1111/j.1096-0031.2006.00096.x.
BMC Bioinformatics. 2013 Feb 26;14:66. doi: 10.1186/1471-2105-14-66.
4
POY version 4: phylogenetic analysis using dynamic homologies.POY版本4:使用动态同源性的系统发育分析。
Cladistics. 2010 Feb;26(1):72-85. doi: 10.1111/j.1096-0031.2009.00282.x. Epub 2009 Sep 24.
5
Simultaneous phylogeny reconstruction and multiple sequence alignment.同时进行系统发育重建和多序列比对。
BMC Bioinformatics. 2009 Jan 30;10 Suppl 1(Suppl 1):S11. doi: 10.1186/1471-2105-10-S1-S11.
6
Barking up the wrong treelength: the impact of gap penalty on alignment and tree accuracy.错误的树长探寻:空位罚分对序列比对和树准确性的影响
IEEE/ACM Trans Comput Biol Bioinform. 2009 Jan-Mar;6(1):7-21. doi: 10.1109/TCBB.2008.63.
7
Phylogeny estimation and alignment via POY versus Clustal + PAUP*: a response to Ogden and Rosenberg (2007).通过POY与Clustal + PAUP*进行系统发育估计和序列比对:对奥格登和罗森伯格(2007年)的回应
Syst Biol. 2008 Aug;57(4):653-7. doi: 10.1080/10635150802302476.
8
The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses.引导树对多序列比对及后续系统发育分析的影响。
Pac Symp Biocomput. 2008:25-36. doi: 10.1142/9789812776136_0004.
9
Alignment and topological accuracy of the direct optimization approach via POY and traditional phylogenetics via ClustalW + PAUP*.通过POY的直接优化方法与通过ClustalW + PAUP*的传统系统发育学方法的比对及拓扑准确性。
Syst Biol. 2007 Apr;56(2):182-93. doi: 10.1080/10635150701281102.
10
Logarithmic gap costs decrease alignment accuracy.对数空位罚分降低比对准确性。
BMC Bioinformatics. 2006 Dec 5;7:527. doi: 10.1186/1471-2105-7-527.