Comparison of methods for searching protein sequence databases.

Author information

Pearson W R

Affiliation

Department of Biochemistry, University of Virginia, Charlottesville 22908, USA.

Publication information

Protein Sci. 1995 Jun;4(6):1145-60. doi: 10.1002/pro.5560040613.

Abstract

We have compared commonly used sequence comparison algorithms, scoring matrices, and gap penalties using a method that identifies statistically significant differences in performance. Search sensitivity with either the Smith-Waterman algorithm or FASTA is significantly improved by using modern scoring matrices, such as BLOSUM45-55, and optimized gap penalties instead of the conventional PAM250 matrix. More dramatic improvement can be obtained by scaling similarity scores by the logarithm of the length of the library sequence (ln()-scaling). With the best modern scoring matrix (BLOSUM55 or JO93) and optimal gap penalties (-12 for the first residue in the gap and -2 for additional residues), Smith-Waterman and FASTA performed significantly better than BLASTP. With ln()-scaling and optimal scoring matrices (BLOSUM45 or Gonnet92) and gap penalties (-12, -1), the rigorous Smith-Waterman algorithm performs better than either BLASTP or FASTA, although with the Gonnet92 matrix the difference with FASTA was not significant. ln()-scaling performed better than normalization based on other simple functions of library sequence length. ln()-scaling also performed better than scores based on normalized variance, but the differences were not statistically significant for the BLOSUM50 and Gonnet92 matrices. Optimal scoring matrices and gap penalties are reported for Smith-Waterman and FASTA, using conventional or ln()-scaled similarity scores. Searches with no penalty for gap extension, or no penalty for gap opening, or an infinite penalty for gaps performed significantly worse than the best methods. Differences in performance between FASTA and Smith-Waterman were not significant when partial query sequences were used. However, the best performance with complete query sequences was obtained with the Smith-Waterman algorithm and ln()-scaling.
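The two quantitative ideas in the abstract, the gap penalty scheme and ln()-scaling, can be illustrated with a minimal sketch. This is not the paper's implementation: the function names `gap_score` and `ln_scaled` are hypothetical, the example hits are made-up numbers, and dividing the raw score by the natural log of the library sequence length is only the simplest reading of "scaling similarity scores by the logarithm of the length of the library sequence"; the exact formula used in the paper may differ.

```python
import math


def gap_score(gap_length, first=-12, additional=-2):
    """Score contribution of one gap of `gap_length` residues.

    Follows the abstract's wording: `first` is charged for the first
    residue in the gap and `additional` for every residue after it.
    """
    if gap_length < 1:
        raise ValueError("gap_length must be at least 1")
    return first + additional * (gap_length - 1)


def ln_scaled(raw_score, library_length):
    """ASSUMPTION: ln()-scaling taken as raw_score / ln(library length)."""
    if library_length < 2:
        raise ValueError("library_length must be at least 2")
    return raw_score / math.log(library_length)


# Hypothetical hits: (name, raw similarity score, library sequence length).
hits = [
    ("short_hit", 100, 150),
    ("long_hit", 110, 3000),
]

# Rank by raw score, then by ln()-scaled score.
by_raw = sorted(hits, key=lambda h: h[1], reverse=True)
by_scaled = sorted(hits, key=lambda h: ln_scaled(h[1], h[2]), reverse=True)

print("raw order:   ", [name for name, _, _ in by_raw])
print("scaled order:", [name for name, _, _ in by_scaled])
```

In this toy example the long library sequence ranks first on raw score but drops below the short one after scaling, which is the kind of length correction the abstract describes. For the (-12, -1) penalties mentioned alongside ln()-scaling, pass `additional=-1`.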

