
Fold recognition by combining profile-profile alignment and support vector machine.

Author information

Han Sangjo, Lee Byung-Chul, Yu Seung Taek, Jeong Chan-Seok, Lee Soyoung, Kim Dongsup

Affiliation

Department of Biosystems, Korea Advanced Institute of Science and Technology, Daejeon, 305-701, Korea.

Publication information

Bioinformatics. 2005 Jun 1;21(11):2667-73. doi: 10.1093/bioinformatics/bti384. Epub 2005 Mar 15.

Abstract

MOTIVATION

Currently, the most accurate approach to fold recognition is to perform profile-profile alignments and estimate the statistical significance of each alignment by calculating a Z-score or E-value. Although this scheme reliably recognizes relatively close homologs related at the family level, it has difficulty finding remote homologs that are related only at the superfamily or fold level.

RESULTS

In this paper, we present an alternative method for estimating the significance of the alignments. The alignment between a query protein and a template of length n in the fold library is transformed into a feature vector of length n + 1, which is then evaluated by a support vector machine (SVM). The SVM output is converted to a posterior probability that the query sequence is related to the template. The new method performs significantly better than PSI-BLAST and profile-profile alignment with the Z-score scheme. At 90% specificity, PSI-BLAST and the Z-score scheme detect 16% and 20% of superfamily-related proteins, respectively, whereas the new method detects 46%, a more than 2-fold increase in sensitivity. More significantly, at the fold level the new method detects 14% of remotely related proteins at 90% specificity, a remarkable result given that the other methods detect almost none at the same specificity.
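The pipeline described above can be illustrated with a minimal Python sketch. Several details here are assumptions, since the abstract does not specify them: the n + 1 features are taken to be one alignment score per template position plus the total score, the posterior is obtained with a Platt-style sigmoid whose parameters `a` and `b` are placeholders, and all function names are hypothetical. The SVM itself is not trained here; `decision_value` stands in for its output.

```python
import math
from typing import List, Optional, Sequence


def alignment_to_features(
    per_position_scores: Sequence[Optional[float]],
    gap_value: float = 0.0,
) -> List[float]:
    """Map an alignment against a template of length n to n + 1 features.

    Assumption: one score per template position (None where the position
    is unaligned) followed by the total alignment score.
    """
    per_pos = [gap_value if s is None else float(s) for s in per_position_scores]
    return per_pos + [sum(per_pos)]


def svm_output_to_posterior(decision_value: float, a: float = -1.0, b: float = 0.0) -> float:
    """Platt-style sigmoid: P(related | f) = 1 / (1 + exp(a * f + b)).

    Assumption: a and b are placeholders; in practice they would be
    fitted on held-out SVM outputs.
    """
    return 1.0 / (1.0 + math.exp(a * decision_value + b))


def sensitivity_at_specificity(
    positive_scores: Sequence[float],
    negative_scores: Sequence[float],
    specificity: float = 0.9,
) -> float:
    """Fraction of true pairs scoring above the threshold that rejects
    at least `specificity` of the unrelated pairs."""
    ranked = sorted(negative_scores)
    k = max(math.ceil(specificity * len(ranked)) - 1, 0)
    threshold = ranked[k]
    hits = sum(1 for s in positive_scores if s > threshold)
    return hits / len(positive_scores)


if __name__ == "__main__":
    features = alignment_to_features([1.5, None, 2.0])  # template length n = 3
    print(len(features))                                # n + 1 features
    print(svm_output_to_posterior(0.0))                 # decision boundary
```

The last helper mirrors how the reported figures are framed: a score threshold is fixed so that a chosen fraction of unrelated pairs is rejected, and sensitivity is the share of related pairs that still score above it.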

