Model-based prediction of sequence alignment quality.

Authors

Ahola Virpi, Aittokallio Tero, Vihinen Mauno, Uusipaikka Esa

Affiliation

Biotechnology and Food Research, MTT Agrifood Research Finland, FI-31600 Jokioinen, Finland.

Publication information

Bioinformatics. 2008 Oct 1;24(19):2165-71. doi: 10.1093/bioinformatics/btn414. Epub 2008 Aug 4.

DOI:10.1093/bioinformatics/btn414

PMID:18678587

Abstract

MOTIVATION

Multiple sequence alignment (MSA) is an essential prerequisite for many sequence analysis methods and a valuable tool in itself for describing relationships between protein sequences. Since the success of sequence analysis depends heavily on the reliability of the alignments, measures for assessing alignment quality are much needed.

RESULTS

We present a statistical model-based alignment quality score. Unlike other quality scores, it does not require several parallel alignments for the same set of sequences or additional structural information. Our quality score is based on measuring the conservation level of reference alignments in Homstrad. Reference sequences were realigned with the Mafft, Muscle and Probcons alignment programs, and a sum-of-pairs (SP) score was used to measure the quality of the realignments. Statistical modelling of the SP score as a function of conservation level and other alignment characteristics makes it possible to predict the SP score for any global MSA. The predicted SP scores are highly correlated with the correct SP scores when tested on the Homstrad and SABmark databases. The results are comparable to those of the multiple overlap score (MOS) and better than those of the normalized mean distance (NorMD) and normalized iRMSD (NiRMSD) alignment quality criteria. Furthermore, the predicted SP score is able to detect alignments with badly aligned or unrelated sequences.
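
The SP score used above as the training target can be illustrated with a minimal sketch. The abstract does not spell out the formula, so the code assumes the common reference-based definition: the fraction of residue pairs aligned in the reference alignment that are also aligned in the realignment. The function names `aligned_pairs` and `sp_score`, the `-` gap character, and the toy sequences are illustrative assumptions, not taken from the paper or its software.

```python
from itertools import combinations


def aligned_pairs(alignment):
    """Return the set of residue pairs that share a column.

    `alignment` is a list of equal-length gapped strings ('-' is a gap).
    A residue is identified as (sequence index, ungapped residue index).
    """
    lengths = {len(row) for row in alignment}
    if len(lengths) != 1:
        raise ValueError("all rows must have the same length")
    counters = [0] * len(alignment)  # next residue index per sequence
    pairs = set()
    for col in range(lengths.pop()):
        present = []
        for seq_idx, row in enumerate(alignment):
            if row[col] != "-":
                present.append((seq_idx, counters[seq_idx]))
                counters[seq_idx] += 1
        # every pair of residues in this column counts as aligned
        pairs.update(combinations(present, 2))
    return pairs


def sp_score(test, reference):
    """Fraction of reference-aligned residue pairs recovered by `test`.

    Assumed definition (hypothetical helper, not the authors' code).
    Both alignments must list the same sequences in the same order.
    """
    ref_pairs = aligned_pairs(reference)
    if not ref_pairs:
        return 0.0
    return len(ref_pairs & aligned_pairs(test)) / len(ref_pairs)


# Toy data: the realignment shifts one residue of the first sequence.
reference = ["AC-GT", "ACAGT"]
realigned = ["ACG-T", "ACAGT"]
print(sp_score(realigned, reference))  # 0.75
```

In this toy case three of the four reference residue pairs survive the realignment, giving 0.75. Computing this score needs a reference alignment, which is why the paper instead predicts it from conservation level and other alignment characteristics.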

AVAILABILITY

The method is freely available at http://www.mtt.fi/AlignmentQuality/.

Similar articles

Fast model-based protein homology detection without alignment.

Bioinformatics. 2007 Jul 15;23(14):1728-36. doi: 10.1093/bioinformatics/btm247. Epub 2007 May 8.

Probalign: multiple sequence alignment using partition function posterior probabilities.

Bioinformatics. 2006 Nov 15;22(22):2715-21. doi: 10.1093/bioinformatics/btl472. Epub 2006 Sep 5.

Computing the P-value of the information content from an alignment of multiple sequences.

Bioinformatics. 2005 Jun;21 Suppl 1:i311-8. doi: 10.1093/bioinformatics/bti1044.

A Shannon entropy-based filter detects high-quality profile-profile alignments in searches for remote homologues.

Proteins. 2004 Feb 1;54(2):351-60. doi: 10.1002/prot.10564.

Improvement in accuracy of multiple sequence alignment using novel group-to-group sequence alignment algorithm with piecewise linear gap cost.

BMC Bioinformatics. 2006 Dec 1;7:524. doi: 10.1186/1471-2105-7-524.

The influence of gapped positions in multiple sequence alignments on secondary structure prediction methods.

Comput Biol Chem. 2004 Dec;28(5-6):351-66. doi: 10.1016/j.compbiolchem.2004.09.005.

MSAProbs: multiple sequence alignment based on pair hidden Markov models and partition function posterior probabilities.

Bioinformatics. 2010 Aug 15;26(16):1958-64. doi: 10.1093/bioinformatics/btq338. Epub 2010 Jun 23.

SPEM: improving multiple sequence alignment with sequence profiles and predicted secondary structures.

Bioinformatics. 2005 Sep 15;21(18):3615-21. doi: 10.1093/bioinformatics/bti582. Epub 2005 Jul 14.

PRALINETM: a strategy for improved multiple alignment of transmembrane proteins.

Bioinformatics. 2008 Feb 15;24(4):492-7. doi: 10.1093/bioinformatics/btm636. Epub 2008 Jan 2.

Cited by

Complex Evolutionary History of the Mammalian Histone H1.1-H1.5 Gene Family.

Mol Biol Evol. 2017 Mar 1;34(3):545-558. doi: 10.1093/molbev/msw241.

Identification of cis-suppression of human disease mutations by comparative genomics.

Nature. 2015 Aug 13;524(7564):225-9. doi: 10.1038/nature14497. Epub 2015 Jun 29.

Efficient representation of uncertainty in multiple sequence alignments using directed acyclic graphs.

BMC Bioinformatics. 2015 Apr 1;16:108. doi: 10.1186/s12859-015-0516-1.

On the necessity of dissecting sequence similarity scores into segment-specific contributions for inferring protein homology, function prediction and annotation.

BMC Bioinformatics. 2014 Jun 2;15:166. doi: 10.1186/1471-2105-15-166.

Accuracy estimation and parameter advising for protein multiple sequence alignment.

J Comput Biol. 2013 Apr;20(4):259-79. doi: 10.1089/cmb.2013.0007. Epub 2013 Mar 14.

Estimation of bacterial diversity using next generation sequencing of 16S rDNA: a comparison of different workflows.

BMC Bioinformatics. 2011 Dec 14;12:473. doi: 10.1186/1471-2105-12-473.

Phylogenetic relationships within the Opisthokonta based on phylogenomic analyses of conserved single-copy protein domains.

Mol Biol Evol. 2012 Feb;29(2):531-44. doi: 10.1093/molbev/msr185. Epub 2011 Jul 18.

GIGA: a simple, efficient algorithm for gene tree inference in the genomic age.

BMC Bioinformatics. 2010 Jun 9;11:312. doi: 10.1186/1471-2105-11-312.

Planning the human variome project: the Spain report.

Hum Mutat. 2009 Apr;30(4):496-510. doi: 10.1002/humu.20972.
