Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site.

Author information

Nielsen H, Engelbrecht J, von Heijne G, Brunak S

Affiliation

Center for Biological Sequence Analysis, Department of Physical Chemistry, The Technical University of Denmark, Lyngby.

Publication information

Proteins. 1996 Feb;24(2):165-77. doi: 10.1002/(SICI)1097-0134(199602)24:2<165::AID-PROT4>3.0.CO;2-I.

Abstract

When preparing data sets of amino acid or nucleotide sequences, it is necessary to exclude redundant or homologous sequences in order to avoid overestimating the predictive performance of an algorithm. For some time, methods for doing this have been available in the area of protein structure prediction. We have developed a similar procedure based on pairwise alignments for sequences with functional sites. We show how a correlation coefficient between sequence similarity and functional homology can be used to compare the efficiency of different similarity measures and choose a nonarbitrary threshold value for excluding redundant sequences. The impact of the choice of scoring matrix used in the alignments is examined. We demonstrate that the parameter determining the quality of the correlation is the relative entropy of the matrix, rather than the assumed (PAM or identity) substitution model. Results are presented for the case of prediction of cleavage sites in signal peptides. By inspection of the false positives, several errors in the database were found. The procedure presented may be used as a general outline for finding a problem-specific similarity measure and threshold value for analysis of other functional amino acid or nucleotide sequence patterns.
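The abstract rests on two quantities: a correlation between alignment-based similarity and functional homology, used to pick a nonarbitrary threshold, and the relative entropy of a scoring matrix. The Python sketch below illustrates both under stated assumptions. It assumes every sequence pair already has an alignment score and a boolean "functionally homologous" label (how that label is derived, for example whether the two cleavage sites coincide in the alignment, is an assumption here), and it assumes the correlation coefficient takes the Matthews form over the 2×2 table of "similar vs. homologous". Relative entropy uses the usual definition H = Σ q_ij log2(q_ij / (p_i p_j)) over ordered residue pairs. The function names and the toy data are hypothetical, not taken from the paper.

```python
import math
from typing import Dict, Sequence, Tuple


def correlation_coefficient(tp: int, tn: int, fp: int, fn: int) -> float:
    """Matthews-style correlation for a 2x2 table (assumed form); 0.0 if undefined."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    if denom == 0:
        return 0.0
    return (tp * tn - fp * fn) / denom


def best_threshold(
    scores: Sequence[float],
    homologous: Sequence[bool],
    candidates: Sequence[float],
) -> Tuple[float, float]:
    """Pick the candidate threshold whose 'score >= threshold' call best
    correlates with the functional-homology labels of the same pairs."""
    best_t, best_c = candidates[0], -math.inf
    for t in candidates:
        tp = tn = fp = fn = 0
        for s, h in zip(scores, homologous):
            similar = s >= t
            if similar and h:
                tp += 1
            elif similar:
                fp += 1  # similar sequences, but not functionally homologous
            elif h:
                fn += 1  # functionally homologous, but below the threshold
            else:
                tn += 1
        c = correlation_coefficient(tp, tn, fp, fn)
        if c > best_c:
            best_t, best_c = t, c
    return best_t, best_c


def relative_entropy(
    target: Dict[Tuple[str, str], float], background: Dict[str, float]
) -> float:
    """H = sum_ij q_ij * log2(q_ij / (p_i * p_j)), in bits per aligned pair.

    `target` holds q_ij over ordered residue pairs (summing to 1);
    `background` holds the residue frequencies p_i.
    """
    return sum(
        q * math.log2(q / (background[a] * background[b]))
        for (a, b), q in target.items()
        if q > 0
    )


# Hypothetical toy data: one alignment score and one homology label per pair.
scores = [10, 30, 45, 60, 80]
homologous = [False, False, True, True, True]
print(best_threshold(scores, homologous, candidates=[20, 40, 50, 70]))
# → (40, 1.0)
```

In this sketch, pairs scoring at or above the selected threshold would be treated as redundant. Because relative entropy places PAM-style and identity matrices on a single scale, the same threshold search can be repeated for each matrix and the resulting correlations compared against H.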
