An assessment of substitution scores for protein profile-profile comparison.

Author information

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Bioinformatics. 2011 Dec 15;27(24):3356-63. doi: 10.1093/bioinformatics/btr565. Epub 2011 Oct 13.

Abstract

MOTIVATION

Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied.
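As an illustration of the evaluation measure described above, the following minimal sketch estimates the probability that a related column pair receives a higher substitution score than an unrelated pair. The function name `prob_related_scores_higher`, the toy score lists, and the convention of counting ties as one half are assumptions made for this example and are not taken from the paper.

```python
from itertools import product


def prob_related_scores_higher(related_scores, unrelated_scores):
    """Estimate P(score of a related pair > score of an unrelated pair).

    Ties are counted as 1/2; this tie convention is an assumption.
    """
    if not related_scores or not unrelated_scores:
        raise ValueError("both score lists must be non-empty")
    wins = 0.0
    # Compare every related score with every unrelated score.
    for r, u in product(related_scores, unrelated_scores):
        if r > u:
            wins += 1.0
        elif r == u:
            wins += 0.5
    return wins / (len(related_scores) * len(unrelated_scores))


# Hypothetical substitution scores for one profile column aligned to
# related columns (from a gold-standard alignment) and to unrelated columns.
related = [2.1, 0.4, 1.7]
unrelated = [-0.3, 0.4, -1.2, 0.9]
p = prob_related_scores_higher(related, unrelated)
print(p)
```

Because the statistic depends only on the ordering of scores, it needs no gap or offset costs, which is the advantage the abstract points out.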

RESULTS

A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales.
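One plausible reading of "a simple symmetrization of mean profile-sequence scores" is sketched below: each column's log-odds scores are averaged over the other column's residue frequencies, and the two means are then averaged. The exact definition used in the paper may differ. The function names, the four-letter toy alphabet, the uniform background, and the use of base-2 logarithms are all assumptions for illustration.

```python
import math


def profile_sequence_scores(column_freqs, background):
    """Log-odds score log2(q(a) / p(a)) for each residue a of one column.

    Assumes every frequency is positive (e.g. after adding pseudocounts).
    """
    return {a: math.log2(column_freqs[a] / background[a]) for a in background}


def symmetrized_mean_score(col_x, col_y, background):
    """Average of the two mean profile-sequence scores between two columns."""
    scores_x = profile_sequence_scores(col_x, background)
    scores_y = profile_sequence_scores(col_y, background)
    # Mean of column y's scores under column x's frequencies, and vice versa.
    mean_xy = sum(col_x[a] * scores_y[a] for a in background)
    mean_yx = sum(col_y[a] * scores_x[a] for a in background)
    return 0.5 * (mean_xy + mean_yx)


# Toy four-letter alphabet with a uniform background (hypothetical data).
background = {"A": 0.25, "B": 0.25, "C": 0.25, "D": 0.25}
col_x = {"A": 0.5, "B": 0.25, "C": 0.125, "D": 0.125}
col_z = {"A": 0.125, "B": 0.125, "C": 0.25, "D": 0.5}

same = symmetrized_mean_score(col_x, col_x, background)
different = symmetrized_mean_score(col_x, col_z, background)
print(same, different)
```

In this toy case the column scores positively against itself and negatively against the dissimilar column, and the result is unchanged when the two arguments are swapped.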

CONTACT

altschul@ncbi.nlm.nih.gov

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

1
An assessment of substitution scores for protein profile-profile comparison.
Bioinformatics. 2011 Dec 15;27(24):3356-63. doi: 10.1093/bioinformatics/btr565. Epub 2011 Oct 13.
4
Log-odds sequence logos.
Bioinformatics. 2015 Feb 1;31(3):324-31. doi: 10.1093/bioinformatics/btu634. Epub 2014 Oct 6.
5
A comparison of scoring functions for protein sequence profile alignment.
Bioinformatics. 2004 May 22;20(8):1301-8. doi: 10.1093/bioinformatics/bth090. Epub 2004 Feb 12.
10
On the significance of sequence alignments when using multiple scoring matrices.
Bioinformatics. 2004 Apr 12;20(6):881-7. doi: 10.1093/bioinformatics/btg498. Epub 2004 Jan 29.
