SPOT-1D-Single: improving the single-sequence-based prediction of protein secondary structure, backbone angles, solvent accessibility and half-sphere exposures using a large training set and ensembled deep learning.

Author information

Singh Jaspreet, Litfin Thomas, Paliwal Kuldip, Singh Jaswinder, Hanumanthappa Anil Kumar, Zhou Yaoqi

Affiliations

Signal Processing Laboratory, School of Engineering and Built Environment, Griffith University, Brisbane, QLD 4111, Australia.

School of Information and Communication Technology, Griffith University, Southport, QLD 4222, Australia.

Publication information

Bioinformatics. 2021 Oct 25;37(20):3464-3472. doi: 10.1093/bioinformatics/btab316.

DOI:10.1093/bioinformatics/btab316

PMID:33983382

Abstract

MOTIVATION

Knowledge of protein secondary structure and other one-dimensional structural properties is essential for accurate protein structure and function prediction. As a result, many methods have been developed for predicting these one-dimensional structural properties. However, most methods rely on evolutionary information, which may not exist for many proteins because they lack sequence homologs. Moreover, obtaining evolutionary information is computationally intensive as the library of protein sequences continues to expand exponentially. Here, we developed a new single-sequence method called SPOT-1D-Single, based on a large training dataset of 39 120 proteins deposited prior to 2016 and an ensemble of hybrid long short-term memory bidirectional neural networks and convolutional neural networks.
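As an illustration of what ensembling means for a per-residue classifier, the sketch below averages the class probabilities produced by several models and then picks the most likely state for each residue. This is only a minimal sketch: the abstract does not say how SPOT-1D-Single combines its ensemble members, so simple averaging is an assumption, and the function names `ensemble_average` and `decode_states` and the `HEC` label order are hypothetical.

```python
from typing import List, Sequence

# Assumed label order for three-state secondary structure:
# H = helix, E = strand, C = coil.
SS3_LABELS = "HEC"


def ensemble_average(
    model_probs: Sequence[Sequence[Sequence[float]]],
) -> List[List[float]]:
    """Average per-residue class probabilities over several models.

    model_probs[m][i][c] is the probability that model m assigns
    class c to residue i. All models must cover the same residues.
    """
    if not model_probs:
        raise ValueError("at least one model output is required")
    n_models = len(model_probs)
    length = len(model_probs[0])
    if any(len(m) != length for m in model_probs):
        raise ValueError("all model outputs must have the same length")
    n_classes = len(model_probs[0][0])
    return [
        [sum(m[i][c] for m in model_probs) / n_models for c in range(n_classes)]
        for i in range(length)
    ]


def decode_states(probs: Sequence[Sequence[float]], labels: str = SS3_LABELS) -> str:
    """Pick the highest-probability label for each residue."""
    return "".join(
        labels[max(range(len(row)), key=row.__getitem__)] for row in probs
    )
```

With two toy model outputs for a three-residue sequence, the individual models can disagree on a residue while the averaged prediction settles on the state with the higher combined probability.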

RESULTS

We showed that SPOT-1D-Single consistently improves over SPIDER3-Single and ProteinUnet in secondary structure, solvent accessibility, contact number and backbone angle prediction on all seven independent test sets (TEST2018, SPOT-2016, SPOT-2016-HQ, SPOT-2018, SPOT-2018-HQ, CASP12 and CASP13 free-modeling targets). For example, the accuracy of predicted three-state secondary structure (SS3) ranges from 72.12% to 74.28% for SPOT-1D-Single, compared with 69.1-72.6% for SPIDER3-Single and 70.6-73% for ProteinUnet. On SPOT-2018 proteins with no homologs (Neff = 1), SPOT-1D-Single also predicts SS3 and eight-state secondary structure (SS8) with 6.24% and 6.98% better accuracy, respectively, than SPOT-1D. The new method's improvement over existing techniques is due to a larger training set combined with ensembled learning.
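For readers unfamiliar with how a three-state accuracy figure is obtained, the following sketch computes the percentage of residues whose predicted state matches the reference. It assumes the commonly used reduction of DSSP's eight states to three (H, G, I to helix; E, B to strand; everything else to coil); the abstract does not state which reduction the authors used, and the names `DSSP8_TO_SS3`, `reduce_to_ss3` and `per_residue_accuracy` are hypothetical.

```python
# Assumed eight-to-three state reduction (a common convention, not
# confirmed by the abstract): helix-like -> H, strand-like -> E, rest -> C.
DSSP8_TO_SS3 = {
    "H": "H", "G": "H", "I": "H",
    "E": "E", "B": "E",
    "S": "C", "T": "C", "C": "C", "-": "C",
}


def reduce_to_ss3(ss8: str) -> str:
    """Map an eight-state secondary structure string to three states."""
    return "".join(DSSP8_TO_SS3[state] for state in ss8)


def per_residue_accuracy(predicted: str, reference: str) -> float:
    """Return the percentage of positions where the two strings agree."""
    if len(predicted) != len(reference):
        raise ValueError("predicted and reference must have the same length")
    if not reference:
        raise ValueError("sequences must not be empty")
    matches = sum(p == r for p, r in zip(predicted, reference))
    return 100.0 * matches / len(reference)
```

Applied to a whole test set, the same ratio is usually taken over all residues of all proteins, so long proteins contribute more than short ones.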

AVAILABILITY AND IMPLEMENTATION

A standalone version of SPOT-1D-Single is available at https://github.com/jas-preet/SPOT-1D-Single. Direct predictions can also be made at https://sparks-lab.org/server/spot-1d-single. The datasets used in this research can also be downloaded from GitHub.

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

SPOT-Contact-LM: improving single-sequence-based prediction of protein contact map using a transformer language model.

Bioinformatics. 2022 Mar 28;38(7):1888-1894. doi: 10.1093/bioinformatics/btac053.

Improving prediction of protein secondary structure, backbone angles, solvent accessibility and contact numbers by using predicted contact maps and an ensemble of recurrent and residual convolutional neural networks.

Bioinformatics. 2019 Jul 15;35(14):2403-2410. doi: 10.1093/bioinformatics/bty1006.

Reaching alignment-profile-based accuracy in predicting protein secondary and tertiary structural properties without alignment.

Sci Rep. 2022 May 9;12(1):7607. doi: 10.1038/s41598-022-11684-w.

OPUS-TASS: a protein backbone torsion angles and secondary structure predictor based on ensemble neural networks.

Bioinformatics. 2020 Dec 22;36(20):5021-5026. doi: 10.1093/bioinformatics/btaa629.

Single-sequence-based prediction of protein secondary structures and solvent accessibility by deep whole-sequence learning.

J Comput Chem. 2018 Oct 5;39(26):2210-2216. doi: 10.1002/jcc.25534. Epub 2018 Oct 14.

ProteinUnet-An efficient alternative to SPIDER3-single for sequence-based prediction of protein secondary structures.

J Comput Chem. 2021 Jan 5;42(1):50-59. doi: 10.1002/jcc.26432. Epub 2020 Oct 15.

Capturing non-local interactions by long short-term memory bidirectional recurrent neural networks for improving prediction of protein secondary structure, backbone angles, contact numbers and solvent accessibility.

Bioinformatics. 2017 Sep 15;33(18):2842-2849. doi: 10.1093/bioinformatics/btx218.

Single-sequence and profile-based prediction of RNA solvent accessibility using dilated convolutional neural network.

Bioinformatics. 2021 Jan 29;36(21):5169-5176. doi: 10.1093/bioinformatics/btaa652.

Improved RNA secondary structure and tertiary base-pairing prediction using evolutionary profile, mutational coupling and two-dimensional transfer learning.

Bioinformatics. 2021 Sep 9;37(17):2589-2600. doi: 10.1093/bioinformatics/btab165.

Cited by

Comprehensive assessment of AlphaFold's predictions of secondary structure and solvent accessibility at the amino acid-level in eukaryotic, bacterial and archaeal proteins.

Comput Struct Biotechnol J. 2025 May 29;27:2443-2449. doi: 10.1016/j.csbj.2025.05.047. eCollection 2025.

Advancements in one-dimensional protein structure prediction using machine learning and deep learning.

Comput Struct Biotechnol J. 2025 Apr 3;27:1416-1430. doi: 10.1016/j.csbj.2025.04.005. eCollection 2025.

GraphPhos: Predict Protein-Phosphorylation Sites Based on Graph Neural Networks.

Int J Mol Sci. 2025 Jan 23;26(3):941. doi: 10.3390/ijms26030941.

Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs).

Int J Mol Sci. 2024 Dec 27;26(1):130. doi: 10.3390/ijms26010130.

Improving Protein Secondary Structure Prediction by Deep Language Models and Transformer Networks.

Methods Mol Biol. 2025;2867:43-53. doi: 10.1007/978-1-0716-4196-5_3.

Recent Advances in Computational Prediction of Secondary and Supersecondary Structures from Protein Sequences.

Methods Mol Biol. 2025;2870:1-19. doi: 10.1007/978-1-0716-4213-9_1.

DCMA: faster protein backbone dihedral angle prediction using a dilated convolutional attention-based neural network.

Front Bioinform. 2024 Oct 18;4:1477909. doi: 10.3389/fbinf.2024.1477909. eCollection 2024.

ILMCNet: A Deep Neural Network Model That Uses PLM to Process Features and Employs CRF to Predict Protein Secondary Structure.

Genes (Basel). 2024 Oct 21;15(10):1350. doi: 10.3390/genes15101350.

Nphos: Database and Predictor of Protein N-phosphorylation.

Genomics Proteomics Bioinformatics. 2024 Sep 13;22(3). doi: 10.1093/gpbjnl/qzae032.

PCP-GC-LM: single-sequence-based protein contact prediction using dual graph convolutional neural network and convolutional neural network.

BMC Bioinformatics. 2024 Sep 2;25(1):287. doi: 10.1186/s12859-024-05914-3.
