Fornes Oriol, Aragues Ramon, Espadaler Jordi, Marti-Renom Marc A, Sali Andrej, Oliva Baldo
Structural Bioinformatics Lab (GRIB-IMIM), Universitat Pompeu Fabra, Parc de Recerca Biomèdica de Barcelona (PRBB), Barcelona, Catalonia, Spain.
Bioinformatics. 2009 Jun 15;25(12):1506-12. doi: 10.1093/bioinformatics/btp238. Epub 2009 Apr 8.
Several strategies have been developed to predict the fold of a target protein sequence, most of which are based on aligning the target sequence to other sequences of known structure. Previously, we demonstrated that the consideration of protein-protein interactions significantly increases the accuracy of fold assignment compared with PSI-BLAST sequence comparisons. A drawback of our method was the low number of proteins to which a fold could be assigned. Here, we present an improved version of the method that addresses this limitation. We also compare our method to other state-of-the-art fold assignment methodologies.
Our approach (ModLink+) has been tested on 3716 proteins with domain folds classified in the Structural Classification of Proteins (SCOP) as well as known interacting partners in the Database of Interacting Proteins (DIP). For this test set, the ratio of success [positive predictive value (PPV)] on fold assignment increases from 75% for PSI-BLAST, 83% for HHSearch and 81% for PRC to >90% for ModLink+ at the e-value cutoff of 10^-3. At this cutoff, ModLink+ can assign a fold to 30-45% of the proteins in the test set, while our previous method could cover <25%. When applied to 6384 proteins with unknown fold in the yeast proteome, ModLink+ combined with PSI-BLAST assigns a fold to domains in 3738 proteins, while PSI-BLAST alone covers only 2122 proteins, HHSearch 2969 and PRC 2826 proteins, using a threshold e-value that would represent a PPV >82% for each method in the test set.
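The two quantities reported above, PPV and the fraction of proteins receiving a fold, can be illustrated with a minimal Python sketch. It assumes one best-scoring prediction per protein and a known reference fold for each; the `Prediction` record, the `ppv_and_coverage` function and the toy values are hypothetical and are not taken from ModLink+, whose exact counting scheme may differ.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Prediction:
    """Hypothetical record: one best fold prediction per protein."""
    protein_id: str
    predicted_fold: Optional[str]  # None when the method assigns no fold
    e_value: float
    true_fold: str                 # reference fold (e.g. from a classification database)


def ppv_and_coverage(predictions: List[Prediction],
                     cutoff: float = 1e-3) -> Tuple[float, float]:
    """Return (PPV, coverage) at the given e-value cutoff.

    PPV      = correct assignments / all assignments passing the cutoff
    coverage = proteins with an assignment passing the cutoff / all proteins
    """
    assigned = [p for p in predictions
                if p.predicted_fold is not None and p.e_value <= cutoff]
    correct = sum(1 for p in assigned if p.predicted_fold == p.true_fold)
    ppv = correct / len(assigned) if assigned else 0.0
    coverage = len(assigned) / len(predictions) if predictions else 0.0
    return ppv, coverage


# Toy data (invented for illustration only).
toy = [
    Prediction("P1", "foldA", 1e-10, "foldA"),  # assigned, correct
    Prediction("P2", "foldB", 1e-5, "foldB"),   # assigned, correct
    Prediction("P3", "foldA", 1e-4, "foldC"),   # assigned, wrong
    Prediction("P4", "foldD", 0.5, "foldD"),    # fails the cutoff, not counted
]
ppv, coverage = ppv_and_coverage(toy, cutoff=1e-3)
```

Loosening the cutoff in such a calculation typically raises coverage at the cost of PPV, which is why the methods above are compared at a threshold giving a fixed PPV.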
The ModLink+ server is freely accessible on the World Wide Web at http://sbi.imim.es/modlink/.
Supplementary data are available at Bioinformatics online.