CALIPHO Group, SIB Swiss Institute of Bioinformatics, Geneva, Switzerland.
Department of Microbiology and Molecular Medicine, Faculty of Medicine, University of Geneva, Geneva, Switzerland.
J Proteome Res. 2019 Dec 6;18(12):4154-4166. doi: 10.1021/acs.jproteome.9b00537. Epub 2019 Oct 18.
In 2018, we reported a hybrid pipeline that predicts protein structures with I-TASSER and function with COFACTOR. I-TASSER/COFACTOR achieved high Gene Ontology (GO) prediction accuracies of Fmax = 0.69 and 0.57 for molecular function (MF) and biological process (BP), respectively, on 100 comprehensively annotated proteins. Now we report blinded analyses of newly annotated proteins in the third Critical Assessment of Function Annotation (CAFA3) function prediction challenge and in neXtProt. For CAFA3 results released in May 2019, our predictions on 267 and 912 human proteins with newly annotated MF and BP terms achieved Fmax = 0.50 and 0.42, respectively, on "No Knowledge" proteins, and 0.51 and 0.74, respectively, on "Limited Knowledge" proteins. While COFACTOR consistently outperforms simple homology-based analysis, its accuracy still depends on template availability. Meanwhile, in neXtProt 2019-01, 25 proteins acquired new function annotation through literature curation at UniProt/Swiss-Prot. Before the release of these curated results, we submitted to neXtProt blinded predictions of free-text function annotation based on predicted GO terms. For 10 of the 25 proteins, the predictions matched the curated free-text or GO term annotation well. These blind tests represent rigorous assessments of I-TASSER/COFACTOR. neXtProt now provides links to precomputed I-TASSER/COFACTOR predictions for proteins without function annotation, to facilitate experimental planning on "dark proteins".
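The Fmax values quoted in the abstract are protein-centric maximum F-measures. As a rough illustration of how such a score is commonly computed in CAFA-style evaluations, the following Python sketch sweeps a confidence threshold, averages precision over proteins that have at least one prediction at that threshold, averages recall over all benchmark proteins, and keeps the best harmonic mean. The function name `fmax`, the dictionary layout, and the toy GO identifiers are assumptions made for this example and do not come from the paper; the sketch also omits propagation of terms to their GO ancestors, which a real evaluation would need.

```python
from typing import Dict, Set


def fmax(predictions: Dict[str, Dict[str, float]],
         truth: Dict[str, Set[str]],
         steps: int = 100) -> float:
    """Protein-centric maximum F-measure over a grid of score thresholds.

    predictions: protein -> {GO term -> confidence score in [0, 1]}
    truth:       protein -> non-empty set of experimentally annotated GO terms
    """
    best = 0.0
    for i in range(1, steps + 1):
        t = i / steps
        precisions = []   # only proteins with >= 1 prediction at threshold t
        recall_sum = 0.0  # summed over every benchmark protein
        for protein, true_terms in truth.items():
            scored = predictions.get(protein, {})
            predicted = {term for term, score in scored.items() if score >= t}
            hits = len(predicted & true_terms)
            if predicted:
                precisions.append(hits / len(predicted))
            recall_sum += hits / len(true_terms)
        if not precisions:
            continue  # no protein has a prediction at this threshold
        precision = sum(precisions) / len(precisions)
        recall = recall_sum / len(truth)
        if precision + recall > 0:
            best = max(best, 2 * precision * recall / (precision + recall))
    return best


# Toy data with made-up term identifiers (hypothetical, for illustration only).
toy_predictions = {
    "P1": {"GO:A": 0.9, "GO:B": 0.4},
    "P2": {"GO:C": 0.8},
}
toy_truth = {
    "P1": {"GO:A"},
    "P2": {"GO:C", "GO:D"},
}
print(round(fmax(toy_predictions, toy_truth), 3))  # 0.857
```

In the toy case the best threshold lies between 0.4 and 0.8, where precision is 1.0 and recall is 0.75, giving F = 6/7. Averaging precision only over proteins with predictions is one reason a method can score differently on "No Knowledge" and "Limited Knowledge" benchmark sets.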