PSAR: measuring multiple sequence alignment reliability by probabilistic sampling.

Affiliation

Institute for Genomic Biology, University of Illinois at Urbana-Champaign, Urbana, IL 61801, USA.

Publication information

Nucleic Acids Res. 2011 Aug;39(15):6359-68. doi: 10.1093/nar/gkr334. Epub 2011 May 16.

DOI:10.1093/nar/gkr334

PMID:21576232

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3159474/

Abstract

Multiple sequence alignment is of fundamental importance for comparative genomics, but it is a difficult and error-prone problem. It is therefore essential to measure the reliability of alignments and to incorporate that reliability into downstream analyses. We propose a new probabilistic sampling-based alignment reliability (PSAR) score. Instead of relying on heuristic assumptions, such as the correlation between alignment quality and guide tree uncertainty in progressive alignment methods, we directly generate suboptimal alignments from an input multiple sequence alignment by a probabilistic sampling method, and compute the agreement of the input alignment with the suboptimal alignments as the alignment reliability score. We construct the suboptimal alignments by an approximate method based on pairwise comparisons between each single sequence and the sub-alignment of the input alignment from which that sequence is left out. Using simulation-based benchmarks, we find that our approach is superior to existing ones, which supports the view that suboptimal alignments are a highly informative source for assessing alignment reliability. We apply the PSAR method to the alignments in the UCSC Genome Browser to measure the reliability of alignments in different types of regions, such as coding exons and conserved non-coding regions, and use it to guide cross-species conservation studies.
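The agreement step described in the abstract can be illustrated with a minimal Python sketch. It is not the PSAR implementation. It assumes the suboptimal alignments have already been sampled by some other means (the probabilistic sampling itself is not shown), that alignments are lists of equal-length strings with `-` as the gap character, and that agreement is measured as the fraction of aligned residue pairs in each input column that reappear in the samples. The function names `pairs_by_column` and `agreement_scores` are hypothetical.

```python
from itertools import combinations

GAP = "-"  # assumed gap character


def pairs_by_column(alignment):
    """For each column, return the set of residue pairs aligned in it.

    A residue is identified as (sequence index, ungapped position), so the
    same residue can be recognised across different alignments.
    """
    counters = [0] * len(alignment)  # ungapped position reached in each sequence
    columns = []
    for col in zip(*alignment):
        present = []
        for seq_idx, ch in enumerate(col):
            if ch != GAP:
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        columns.append(set(combinations(present, 2)))
    return columns


def agreement_scores(input_alignment, sampled_alignments):
    """Per-column agreement of the input alignment with the sampled ones.

    Each score is the mean fraction of the column's residue pairs found in
    a sampled alignment. Columns with fewer than two residues get None.
    """
    sample_sets = [set().union(*pairs_by_column(a)) for a in sampled_alignments]
    scores = []
    for col_pairs in pairs_by_column(input_alignment):
        if not col_pairs or not sample_sets:
            scores.append(None)
            continue
        hits = sum(len(col_pairs & s) for s in sample_sets)
        scores.append(hits / (len(col_pairs) * len(sample_sets)))
    return scores


if __name__ == "__main__":
    inp = ["AC-G", "ACTG"]
    samples = [["AC-G", "ACTG"], ["A-CG", "ACTG"]]  # hand-made, not sampled
    print(agreement_scores(inp, samples))
```

In this toy input the second column scores lower than the others because one of the two hand-made samples moves the `C` of the first sequence to a different column. Under this pair-based definition, a column's score drops whenever the samples disagree with the input about which residues belong together.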
