Department of Computer Science, School of Information and Computer Sciences, University of California, Irvine, CA 92697 USA.
Institute for Genomics and Bioinformatics, University of California, Irvine, CA 92697, USA.
Bioinformatics. 2022 Mar 28;38(7):2064-2065. doi: 10.1093/bioinformatics/btac019.
Accurately predicting protein secondary structure and relative solvent accessibility is important for the study of protein evolution and structure, and is an early-stage component of typical protein 3D structure prediction pipelines.
We present a new, improved version of the SSpro/ACCpro suite of predictors for the prediction of protein secondary structure (in three and eight classes) and relative solvent accessibility. The changes include improved, TensorFlow-trained deep learning predictors; a richer set of profile features (232 features per residue position) and sequence-only features (71 features per position); a more recent Protein Data Bank (PDB) snapshot for training; better hyperparameter tuning; and improvements to the HOMOLpro module, which leverages structural information from protein segment homologs in the PDB. The new SSpro 6 outperforms the previous version (SSpro 5) by 3-4% in Q3 accuracy and, when used with HOMOLpro, reaches accuracy in the 95-100% range.
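Q3 accuracy is the percentage of residues whose three-class secondary structure label is predicted correctly. A minimal sketch, assuming predictions and observed labels are given as equal-length strings over a hypothetical H/E/C alphabet (helix, strand, coil); the function name `q3_accuracy` is illustrative and not part of SSpro.

```python
def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose 3-class label (e.g. H, E, C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("predicted and observed must have equal length")
    if not observed:
        raise ValueError("cannot score an empty sequence")
    # Count positions where the predicted class equals the observed class.
    matches = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * matches / len(observed)


print(q3_accuracy("HHHECCCEEH", "HHHECCCEEC"))  # 9 of 10 residues match -> 90.0
```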
The predictors' software, data and web servers are available through the SCRATCH suite of protein structure predictors at http://scratch.proteomics.ics.uci.edu. To maximize compatibility and ease of use, the deep learning predictors are re-implemented as pure Python/numpy code without a TensorFlow dependency.
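Once a network has been trained, inference reduces to array operations, which is what makes a numpy-only re-implementation possible. A minimal sketch, assuming a single hypothetical dense layer with a softmax over three classes applied to 232 profile features per residue; the actual SSpro 6 architecture and weight format are not described here, the names `predict_classes`, `W` and `b` are illustrative, and the random weights merely stand in for trained ones that would in practice be loaded from disk (e.g. with `np.load`).

```python
import numpy as np

N_FEATURES = 232  # profile features per residue position (from the text)
N_CLASSES = 3     # three-class secondary structure (hypothetical H/E/C order)


def softmax(x: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over the last axis."""
    z = x - x.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)


def predict_classes(features: np.ndarray, weights: np.ndarray, bias: np.ndarray):
    """Hypothetical one-layer forward pass.

    features: (L, N_FEATURES) array, one row per residue.
    Returns (labels, probs): (L,) class indices and (L, N_CLASSES) probabilities.
    """
    logits = features @ weights + bias
    probs = softmax(logits)
    return probs.argmax(axis=-1), probs


rng = np.random.default_rng(0)
W = rng.normal(size=(N_FEATURES, N_CLASSES))  # placeholder, NOT trained weights
b = np.zeros(N_CLASSES)
x = rng.normal(size=(5, N_FEATURES))          # features for a 5-residue toy protein
labels, probs = predict_classes(x, W, b)
print("".join("HEC"[i] for i in labels))
```

Because only numpy is needed at prediction time, the trained parameters can be shipped as plain arrays and evaluated on any machine with Python installed.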
Supplementary data are available at Bioinformatics online.