An alignment confidence score capturing robustness to guide tree uncertainty.

Affiliation

Department of Cell Research and Immunology, George S. Wise Faculty of Life Sciences, Tel Aviv University, Tel Aviv, Israel.

Publication information

Mol Biol Evol. 2010 Aug;27(8):1759-67. doi: 10.1093/molbev/msq066. Epub 2010 Mar 5.

DOI: 10.1093/molbev/msq066

PMID: 20207713

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC2908709/

Abstract

Multiple sequence alignment (MSA) is the basis for a wide range of comparative sequence analyses from molecular phylogenetics to 3D structure prediction. Sophisticated algorithms have been developed for sequence alignment, but in practice, many errors can be expected and extensive portions of the MSA are unreliable. Hence, it is imperative to understand and characterize the various sources of errors in MSAs and to quantify site-specific alignment confidence. In this paper, we show that uncertainties in the guide tree used by progressive alignment methods are a major source of alignment uncertainty. We use this insight to develop a novel method for quantifying the robustness of each alignment column to guide tree uncertainty. We build on the widely used bootstrap method for perturbing the phylogenetic tree. Specifically, we generate a collection of trees and use each as a guide tree in the alignment algorithm, thus producing a set of MSAs. We next test the consistency of every column of the MSA obtained from the unperturbed guide tree with respect to the set of MSAs. We name this measure the "GUIDe tree based AligNment ConfidencE" (GUIDANCE) score. Using the Benchmark Alignment dataBASE (BAliBASE) benchmark as well as simulation studies, we show that GUIDANCE scores accurately identify errors in MSAs. Additionally, we compare our results with the previously published Heads-or-Tails score and show that the GUIDANCE score is a better predictor of unreliably aligned regions.
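The column-consistency step described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the scoring formula, so this sketch assumes a pairwise-residue measure: a column's score is the average fraction of perturbed MSAs in which each pair of residues aligned in that column is aligned again. Generating bootstrap guide trees and running the aligner are omitted, the function names (`residue_columns`, `aligned_pairs`, `column_consistency`) are hypothetical, and alignments are taken to be lists of equal-length gapped strings with `-` as the gap character.

```python
from itertools import combinations


def residue_columns(msa):
    """For each column, list its residues as (sequence index, residue index) pairs."""
    counters = [0] * len(msa)  # residues seen so far in each sequence
    columns = []
    for col in range(len(msa[0])):
        residues = []
        for seq_idx, row in enumerate(msa):
            if row[col] != "-":
                residues.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        columns.append(residues)
    return columns


def aligned_pairs(msa):
    """Set of residue pairs that share a column in the alignment."""
    pairs = set()
    for residues in residue_columns(msa):
        pairs.update(combinations(residues, 2))
    return pairs


def column_consistency(base_msa, perturbed_msas):
    """Score each column of base_msa by how often its residue pairs recur.

    Assumed scoring (not taken from the abstract): the mean, over residue
    pairs in the column, of the fraction of perturbed MSAs that align the
    same pair. Columns with fewer than two residues get None.
    """
    if not perturbed_msas:
        raise ValueError("at least one perturbed MSA is required")
    pair_sets = [aligned_pairs(m) for m in perturbed_msas]
    scores = []
    for residues in residue_columns(base_msa):
        pairs = list(combinations(residues, 2))
        if not pairs:
            scores.append(None)
            continue
        support = sum(pair in pair_set for pair in pairs for pair_set in pair_sets)
        scores.append(support / (len(pairs) * len(pair_sets)))
    return scores


# Toy data: the base MSA and one alternative placement of a residue in sequence 0.
base = ["AC-G", "ACTG", "A-TG"]
alternative = ["A-CG", "ACTG", "A-TG"]
print(column_consistency(base, [base, alternative]))
```

In this toy example only the second column depends on where the C of the first sequence is placed, so it is the only column whose score falls below 1. In practice the perturbed MSAs would come from re-running the aligner with each bootstrap guide tree.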

Similar articles

An alignment confidence score capturing robustness to guide tree uncertainty.

Mol Biol Evol. 2010 Aug;27(8):1759-67. doi: 10.1093/molbev/msq066. Epub 2010 Mar 5.

The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses.

Pac Symp Biocomput. 2008:25-36. doi: 10.1142/9789812776136_0004.

GUIDANCE2: accurate detection of unreliable alignment regions accounting for the uncertainty of multiple parameters.

Nucleic Acids Res. 2015 Jul 1;43(W1):W7-14. doi: 10.1093/nar/gkv318. Epub 2015 Apr 16.

GUIDANCE: a web server for assessing alignment confidence scores.

Nucleic Acids Res. 2010 Jul;38(Web Server issue):W23-8. doi: 10.1093/nar/gkq443. Epub 2010 May 23.

Evidence of Statistical Inconsistency of Phylogenetic Methods in the Presence of Multiple Sequence Alignment Uncertainty.

Genome Biol Evol. 2015 Jul 1;7(8):2102-16. doi: 10.1093/gbe/evv127.

TCS: a new multiple sequence alignment reliability measure to estimate alignment accuracy and improve phylogenetic tree reconstruction.

Mol Biol Evol. 2014 Jun;31(6):1625-37. doi: 10.1093/molbev/msu117. Epub 2014 Apr 1.

PnpProbs: a better multiple sequence alignment tool by better handling of guide trees.

BMC Bioinformatics. 2016 Aug 31;17 Suppl 8(Suppl 8):285. doi: 10.1186/s12859-016-1121-7.

Class of multiple sequence alignment algorithm affects genomic analysis.

Mol Biol Evol. 2013 Mar;30(3):642-53. doi: 10.1093/molbev/mss256. Epub 2012 Nov 9.

Multiple Sequence Alignment Averaging Improves Phylogeny Reconstruction.

Syst Biol. 2019 Jan 1;68(1):117-130. doi: 10.1093/sysbio/syy036.

Bayesian coestimation of phylogeny and sequence alignment.

BMC Bioinformatics. 2005 Apr 1;6:83. doi: 10.1186/1471-2105-6-83.

Cited by

Investigating the psammophilic karyorelictean ciliate families Kentrophoridae and Cryptopharyngidae (Protista, Ciliophora): molecular phylogeny, geographic distributions and a brief revision including descriptions of a new genus, a new species and a new combination.

Mar Life Sci Technol. 2024 Dec 23;7(1):23-49. doi: 10.1007/s42995-024-00266-6. eCollection 2025 Feb.

Insertions and Deletions: Computational Methods, Evolutionary Dynamics, and Biological Applications.

Mol Biol Evol. 2024 Sep 4;41(9). doi: 10.1093/molbev/msae177.

Comparative Evolutionary Genomics in Insects.

Methods Mol Biol. 2024;2802:473-514. doi: 10.1007/978-1-0716-3838-5_16.

A Guide to Phylogenomic Inference.

Methods Mol Biol. 2024;2802:267-345. doi: 10.1007/978-1-0716-3838-5_11.

Insertion-Deletion Events Are Depleted in Protein Regions with Predicted Secondary Structure.

Genome Biol Evol. 2024 May 2;16(5). doi: 10.1093/gbe/evae093.

Effect of tokenization on transformers for biological sequences.

Bioinformatics. 2024 Mar 29;40(4). doi: 10.1093/bioinformatics/btae196.

Live-bearing cockroach genome reveals convergent evolutionary mechanisms linked to viviparity in insects and beyond.

iScience. 2023 Sep 9;26(10):107832. doi: 10.1016/j.isci.2023.107832. eCollection 2023 Oct 20.

Comparative genome analysis of three euplotid protists provides insights into the evolution of nanochromosomes in unicellular eukaryotic organisms.

Mar Life Sci Technol. 2023 May 28;5(3):300-315. doi: 10.1007/s42995-023-00175-0. eCollection 2023 Aug.

Sensitive inference of alignment-safe intervals from biodiverse protein sequence clusters using EMERALD.

Genome Biol. 2023 Jul 17;24(1):168. doi: 10.1186/s13059-023-03008-6.

Taxonomic and Phylogenetic Studies of Two Brackish Species (Protista, Ciliophora, Scuticociliatia) from Subtropical Coastal Waters of China, with Report of a New Species.

Microorganisms. 2023 May 27;11(6):1422. doi: 10.3390/microorganisms11061422.

References

Confidence limits on phylogenies: an approach using the bootstrap.

Evolution. 1985 Jul;39(4):783-791. doi: 10.1111/j.1558-5646.1985.tb00420.x.

Fast statistical alignment.

PLoS Comput Biol. 2009 May;5(5):e1000392. doi: 10.1371/journal.pcbi.1000392. Epub 2009 May 29.

INDELible: a flexible simulator of biological sequence evolution.

Mol Biol Evol. 2009 Aug;26(8):1879-88. doi: 10.1093/molbev/msp098. Epub 2009 May 7.

Characterization of pairwise and multiple sequence alignment errors.

Gene. 2009 Jul 15;441(1-2):141-7. doi: 10.1016/j.gene.2008.05.016. Epub 2008 Jun 3.

Phylogeny-aware gap placement prevents errors in sequence alignment and evolutionary analysis.

Science. 2008 Jun 20;320(5883):1632-5. doi: 10.1126/science.1158395.

The effect of the guide tree on multiple sequence alignments and subsequent phylogenetic analyses.

Pac Symp Biocomput. 2008:25-36. doi: 10.1142/9789812776136_0004.

Local reliability measures from sets of co-optimal multiple sequence alignments.

Pac Symp Biocomput. 2008:15-24.

Improvement of phylogenies after removing divergent and ambiguously aligned blocks from protein sequence alignments.

Syst Biol. 2007 Aug;56(4):564-77. doi: 10.1080/10635150701472164.

Heads or tails: a simple reliability check for multiple sequence alignments.

Mol Biol Evol. 2007 Jun;24(6):1380-3. doi: 10.1093/molbev/msm060. Epub 2007 Mar 25.

The accuracy of several multiple sequence alignment programs for proteins.

BMC Bioinformatics. 2006 Oct 24;7:471. doi: 10.1186/1471-2105-7-471.
