
Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings.

Authors

Chatzimiltis Sotiris, Agathocleous Michalis, Promponas Vasilis J, Christodoulou Chris

Affiliations

University of Cyprus, Department of Computer Science, Nicosia, Cyprus.

5G/6GIC, Institute for Communication Systems (ICS), University of Surrey, Guildford, United Kingdom.

Publication information

Comput Struct Biotechnol J. 2025 Jan 2;27:243-251. doi: 10.1016/j.csbj.2024.12.022. eCollection 2025.

DOI: 10.1016/j.csbj.2024.12.022
PMID: 39866664
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11764030/
Abstract

Protein Secondary Structure Prediction (PSSP) is regarded as a challenging task in bioinformatics, and numerous approaches to achieve a more accurate prediction have been proposed. Accurate PSSP can be instrumental in inferring the tertiary structure and function of proteins. Machine Learning, and in particular Deep Learning, approaches show promising results for the PSSP problem. In this paper, we deploy a Convolutional Neural Network (CNN) trained with the Subsampled Hessian Newton (SHN) method (a Hessian Free Optimisation variant), with a two-dimensional input representation of embeddings extracted from a language model pretrained with protein sequences. Utilising a CNN trained with the SHN method and the input embeddings, we achieved on average a per-residue (Q3) accuracy of 79.96% on the CB513 dataset and a Q3 accuracy of 81.45% on the PISCES dataset (without any post-processing techniques applied). Applying ensembles and filtering techniques to the results of the CNN improved the overall prediction performance: the Q3 accuracy increased to 93.65% on the CB513 dataset and to 87.13% on the PISCES dataset. Moreover, our method was evaluated using the CASP13 dataset, where we showed that prediction performance increased as the post-processing window size increased. In fact, with the biggest post-processing window size (limited by the smallest CASP13 protein), we achieved a Q3 accuracy of 98.12% and a Segment Overlap (SOV) score of 96.98 on the CASP13 dataset when the CNNs were trained with the PISCES dataset. Finally, we showed that input representations from embeddings can perform as well as representations extracted from multiple sequence alignments.
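For readers unfamiliar with the metric, the following is a minimal sketch of how a per-residue Q3 accuracy is typically computed, together with a simple per-residue majority vote over several predictors. It is an illustration only: the abstract does not describe the authors' ensemble or filtering procedure in detail, so the function names (`q3_accuracy`, `majority_vote`), the tie-breaking rule, and the toy sequences are assumptions rather than the paper's method.

```python
from collections import Counter


def q3_accuracy(predicted: str, observed: str) -> float:
    """Percentage of residues whose three-state label (H, E or C) matches."""
    if len(predicted) != len(observed):
        raise ValueError("sequences must have equal length")
    if not observed:
        raise ValueError("sequences must be non-empty")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


def majority_vote(predictions: list[str]) -> str:
    """Combine equal-length predictions by per-residue majority vote.

    Assumption for this sketch: ties go to the earliest predictor in the list.
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        best = max(counts.values())
        combined.append(next(label for label in labels if counts[label] == best))
    return "".join(combined)


# Toy data, not taken from CB513, PISCES or CASP13.
observed = "CHHHHEEC"
models = ["CHHHCEEC", "CHHHHECC", "CCHHHEEC"]

for m in models:
    print(q3_accuracy(m, observed))          # each single model: 87.5

ensemble = majority_vote(models)
print(ensemble, q3_accuracy(ensemble, observed))
```

In this toy case each predictor makes one error at a different position, so the vote recovers the observed string. Real predictors tend to make correlated errors, so gains of this size should not be expected in general.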


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/d88223de174c/gr007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/1f2fa13297f4/gr001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/e51bf0917642/gr002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/9cc4a6d1623e/gr003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/0a4143093d76/gr004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/bcd3d54cdc62/gr005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/ae7e/11764030/dd61d1b09c8c/gr006.jpg

Similar articles

1
Post-processing enhances protein secondary structure prediction with second order deep learning and embeddings.
Comput Struct Biotechnol J. 2025 Jan 2;27:243-251. doi: 10.1016/j.csbj.2024.12.022. eCollection 2025.
2
PSSP-MFFNet: A Multifeature Fusion Network for Protein Secondary Structure Prediction.
ACS Omega. 2024 Jan 25;9(5):5985-5994. doi: 10.1021/acsomega.3c10230. eCollection 2024 Feb 6.
3
Deep learning for protein secondary structure prediction: Pre and post-AlphaFold.
Comput Struct Biotechnol J. 2022 Nov 11;20:6271-6286. doi: 10.1016/j.csbj.2022.11.012. eCollection 2022.
4
Deep Ensemble Learning with Atrous Spatial Pyramid Networks for Protein Secondary Structure Prediction.
Biomolecules. 2022 Jun 2;12(6):774. doi: 10.3390/biom12060774.
5
Investigating the impact of pre-processing techniques and pre-trained word embeddings in detecting Arabic health information on social media.
J Big Data. 2021;8(1):95. doi: 10.1186/s40537-021-00488-w. Epub 2021 Jul 2.
6
An efficient method for disaster tweets classification using gradient-based optimized convolutional neural networks with BERT embeddings.
MethodsX. 2024 Jul 3;13:102843. doi: 10.1016/j.mex.2024.102843. eCollection 2024 Dec.
7
Ensemble deep learning models for protein secondary structure prediction using bidirectional temporal convolution and bidirectional long short-term memory.
Front Bioeng Biotechnol. 2023 Feb 13;11:1051268. doi: 10.3389/fbioe.2023.1051268. eCollection 2023.
8
Porter 6: Protein Secondary Structure Prediction by Leveraging Pre-Trained Language Models (PLMs).
Int J Mol Sci. 2024 Dec 27;26(1):130. doi: 10.3390/ijms26010130.
9
Brain tumor segmentation and detection in MRI using convolutional neural networks and VGG16.
Cancer Biomark. 2025 Mar;42(3):18758592241311184. doi: 10.1177/18758592241311184. Epub 2025 Apr 4.
10
Integrating Pre-Trained protein language model and multiple window scanning deep learning networks for accurate identification of secondary active transporters in membrane proteins.
Methods. 2023 Dec;220:11-20. doi: 10.1016/j.ymeth.2023.10.008. Epub 2023 Oct 21.
