An assessment of substitution scores for protein profile-profile comparison.

Affiliation

National Center for Biotechnology Information, National Library of Medicine, National Institutes of Health, Bethesda, MD 20894, USA.

Publication information

Bioinformatics. 2011 Dec 15;27(24):3356-63. doi: 10.1093/bioinformatics/btr565. Epub 2011 Oct 13.

DOI: 10.1093/bioinformatics/btr565

PMID: 21998158

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC3232366/

Abstract

MOTIVATION

Pairwise protein sequence alignments are generally evaluated using scores defined as the sum of substitution scores for aligning amino acids to one another, and gap scores for aligning runs of amino acids in one sequence to null characters inserted into the other. Protein profiles may be abstracted from multiple alignments of protein sequences, and substitution and gap scores have been generalized to the alignment of such profiles either to single sequences or to other profiles. Although there is widespread agreement on the general form substitution scores should take for profile-sequence alignment, little consensus has been reached on how best to construct profile-profile substitution scores, and a large number of these scoring systems have been proposed. Here, we assess a variety of such substitution scores. For this evaluation, given a gold standard set of multiple alignments, we calculate the probability that a profile column yields a higher substitution score when aligned to a related than to an unrelated column. We also generalize this measure to sets of two or three adjacent columns. This simple approach has the advantages that it does not depend primarily upon the gold-standard alignment columns with the weakest empirical support, and that it does not need to fit gap and offset costs for use with each substitution score studied.
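
The evaluation measure described above can be illustrated with a minimal sketch. It assumes a profile column is represented as a sequence of amino-acid frequencies and that the substitution score is any function of two columns; the names `prob_related_scores_higher`, `query`, `related` and `unrelated` are hypothetical, and counting ties as half a win is an assumption rather than something stated in the abstract.

```python
from itertools import product
from typing import Callable, Sequence

# Hypothetical representation: one frequency per amino-acid letter.
Column = Sequence[float]


def prob_related_scores_higher(
    score: Callable[[Column, Column], float],
    query: Column,
    related: Sequence[Column],
    unrelated: Sequence[Column],
) -> float:
    """Estimate the probability that `query` scores higher against a
    related column than against an unrelated one.

    Every (related, unrelated) pair is compared once. Ties count as
    half a win, which is an assumption made for this sketch.
    """
    wins = 0.0
    for rel_col, unrel_col in product(related, unrelated):
        rel_score = score(query, rel_col)
        unrel_score = score(query, unrel_col)
        if rel_score > unrel_score:
            wins += 1.0
        elif rel_score == unrel_score:
            wins += 0.5
    return wins / (len(related) * len(unrelated))
```

Because only the ordering of substitution scores matters here, no gap or offset costs enter the calculation. One plausible way to extend the sketch to two or three adjacent columns is to pass tuples of columns and a `score` function that sums the per-column scores over the tuple.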

RESULTS

A simple symmetrization of mean profile-sequence scores usually performed the best. These were followed closely by several specific scoring systems constructed using a variety of rationales.
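
The abstract does not give the formula for the symmetrized score, so the following is only one plausible reading, sketched under stated assumptions: each column is a vector of strictly positive frequencies (for example after pseudocounts), the profile-sequence score of a letter is a log-odds ratio against background frequencies, and the symmetrization is a plain average of the two directions. All function and parameter names are hypothetical.

```python
import math
from typing import List, Sequence


def profile_sequence_scores(
    column: Sequence[float], background: Sequence[float]
) -> List[float]:
    """Log-odds score (in nats) of each letter against one profile column.

    Assumes every frequency is strictly positive.
    """
    return [math.log(q / p) for q, p in zip(column, background)]


def mean_profile_sequence_score(
    col_a: Sequence[float], col_b: Sequence[float], background: Sequence[float]
) -> float:
    """Mean of col_a's letter scores, weighted by col_b's frequencies."""
    scores = profile_sequence_scores(col_a, background)
    return sum(freq * s for freq, s in zip(col_b, scores))


def symmetrized_score(
    col_a: Sequence[float], col_b: Sequence[float], background: Sequence[float]
) -> float:
    """Average of the two directional mean scores (assumed symmetrization)."""
    forward = mean_profile_sequence_score(col_a, col_b, background)
    reverse = mean_profile_sequence_score(col_b, col_a, background)
    return 0.5 * (forward + reverse)
```

Averaging the two directions makes the score independent of which profile is treated as the query, which a profile-profile substitution score generally needs.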

CONTACT

altschul@ncbi.nlm.nih.gov

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

The construction and use of log-odds substitution scores for multiple sequence alignment.

PLoS Comput Biol. 2010 Jul 15;6(7):e1000852. doi: 10.1371/journal.pcbi.1000852.

Scoring profile-to-profile sequence alignments.

Protein Sci. 2004 Jun;13(6):1612-26. doi: 10.1110/ps.03601504.

Log-odds sequence logos.

Bioinformatics. 2015 Feb 1;31(3):324-31. doi: 10.1093/bioinformatics/btu634. Epub 2014 Oct 6.

A comparison of scoring functions for protein sequence profile alignment.

Bioinformatics. 2004 May 22;20(8):1301-8. doi: 10.1093/bioinformatics/bth090. Epub 2004 Feb 12.

Robust sequence alignment using evolutionary rates coupled with an amino acid substitution matrix.

BMC Bioinformatics. 2015 Aug 14;16:255. doi: 10.1186/s12859-015-0688-8.

A low-complexity add-on score for protein remote homology search with COMER.

Bioinformatics. 2018 Jun 15;34(12):2037-2045. doi: 10.1093/bioinformatics/bty048.

Rapid detection, classification and accurate alignment of up to a million or more related protein sequences.

Bioinformatics. 2009 Aug 1;25(15):1869-75. doi: 10.1093/bioinformatics/btp342. Epub 2009 Jun 8.

Bioinformatics. 2015 Mar 1;31(5):674-81. doi: 10.1093/bioinformatics/btu697. Epub 2014 Oct 22.

On the significance of sequence alignments when using multiple scoring matrices.

Bioinformatics. 2004 Apr 12;20(6):881-7. doi: 10.1093/bioinformatics/btg498. Epub 2004 Jan 29.

Cited by

BioSeq-BLM: a platform for analyzing DNA, RNA and protein sequences based on biological language models.

Nucleic Acids Res. 2021 Dec 16;49(22):e129. doi: 10.1093/nar/gkab829.

ReformAlign: improved multiple sequence alignments using a profile-based meta-alignment approach.

BMC Bioinformatics. 2014 Aug 7;15(1):265. doi: 10.1186/1471-2105-15-265.

Dirichlet mixtures, the Dirichlet process, and the structure of protein space.

J Comput Biol. 2013 Jan;20(1):1-18. doi: 10.1089/cmb.2012.0244.
