Lightweight ProteinUnet2 network for protein secondary structure prediction: a step towards proper evaluation.

Affiliations

Department of Applied Informatics, Silesian University of Technology, Akademicka 16, 44-100, Gliwice, Poland.

Department of Bioinformatics and Telemedicine, Jagiellonian University Medical College, Medyczna 7, 30-688, Kraków, Poland.

Publication information

BMC Bioinformatics. 2022 Mar 22;23(1):100. doi: 10.1186/s12859-022-04623-z.

DOI:10.1186/s12859-022-04623-z
PMID:35317722
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8939211/
Abstract

BACKGROUND

Predicting protein secondary structure (SS) is a crucial step in ab initio tertiary structure prediction, which provides information about protein activity and function. Because experimental methods are expensive and sometimes infeasible, many SS predictors, mostly based on machine learning, have been proposed over the years. Currently, most top methods use evolutionary input features produced by PSSM and HHblits software, although embeddings, a new description of protein sequences generated by language models (LMs), have recently appeared and could be leveraged as input features. Apart from the cost of calculating input features, the top models usually need extensive computational resources for training and prediction and can barely be run on a regular PC. Because SS prediction is an imbalanced classification problem, it should not be judged by the commonly used Q3/Q8 metrics. Moreover, as the benchmark datasets are not random samples, classical statistical null hypothesis testing based on the Neyman-Pearson approach is not appropriate.
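To illustrate why plain accuracy can mislead on imbalanced classes, here is a minimal sketch that compares Q3 (overall 3-state accuracy) with a macro-averaged per-class recall on a toy sequence. The function names, the H/E/C label alphabet, and the toy data are assumptions made for this example; the abstract does not state which alternative metrics the authors use.

```python
from collections import Counter


def q3_accuracy(true_ss, pred_ss):
    """Fraction of residues whose 3-state label is predicted correctly."""
    if len(true_ss) != len(pred_ss):
        raise ValueError("sequences must have equal length")
    return sum(t == p for t, p in zip(true_ss, pred_ss)) / len(true_ss)


def per_class_recall(true_ss, pred_ss, classes="HEC"):
    """Recall for each class that occurs in the true labels."""
    totals = Counter(true_ss)
    hits = Counter(t for t, p in zip(true_ss, pred_ss) if t == p)
    return {c: hits[c] / totals[c] for c in classes if totals[c] > 0}


def macro_recall(true_ss, pred_ss, classes="HEC"):
    """Unweighted mean of the per-class recalls (every class counts equally)."""
    recalls = per_class_recall(true_ss, pred_ss, classes)
    return sum(recalls.values()) / len(recalls)


# Hypothetical toy data: a coil-dominated sequence and a predictor
# that labels every residue as coil (C).
true_ss = "C" * 8 + "H" + "E"
pred_ss = "C" * 10

print("Q3:", q3_accuracy(true_ss, pred_ss))
print("per-class recall:", per_class_recall(true_ss, pred_ss))
print("macro recall:", macro_recall(true_ss, pred_ss))
```

In this toy case the trivial predictor reaches a Q3 of 0.8 although it never finds a helix or strand residue, while the macro-averaged recall drops to one third.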

RESULTS

We present ProteinUnet2, a lightweight deep network for SS prediction based on the U-Net convolutional architecture, with evolutionary input features (from PSSM and HHblits) as well as SPOT-Contact features. Through an extensive evaluation study, we compare the performance of ProteinUnet2 with that of top SS prediction methods based on evolutionary information (SAINT and SPOT-1D). We also propose a new statistical methodology for assessing prediction performance, based on statistical significance from Fisher-Pitman permutation tests accompanied by practical significance measured by Cohen's effect size.
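The general idea of pairing a permutation test with an effect size can be sketched as follows. This is not the authors' procedure: the abstract does not give its details, so the paired (sign-flip) design, the Monte Carlo resampling, the paired form of Cohen's d (mean difference divided by the standard deviation of the differences), and the per-protein scores below are all assumptions made for illustration.

```python
import random
import statistics


def paired_permutation_test(scores_a, scores_b, n_resamples=10000, seed=0):
    """Two-sided Monte Carlo permutation test on paired score differences.

    Under the null hypothesis the two methods are exchangeable within each
    pair, so the sign of every per-protein difference can be flipped at random.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(statistics.fmean(diffs))
    rng = random.Random(seed)
    extreme = 0
    for _ in range(n_resamples):
        flipped = [d if rng.random() < 0.5 else -d for d in diffs]
        # Small tolerance guards against floating-point ties.
        if abs(statistics.fmean(flipped)) >= observed - 1e-12:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly positive.
    return (extreme + 1) / (n_resamples + 1)


def cohens_d_paired(scores_a, scores_b):
    """Cohen's d for paired samples: mean difference / SD of the differences."""
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    return statistics.fmean(diffs) / statistics.stdev(diffs)


# Hypothetical per-protein scores for two predictors on the same proteins.
method_a = [0.80, 0.82, 0.78, 0.85, 0.81, 0.79, 0.83, 0.84]
method_b = [0.78, 0.79, 0.77, 0.83, 0.77, 0.77, 0.80, 0.83]

p_value = paired_permutation_test(method_a, method_b)
effect = cohens_d_paired(method_a, method_b)
print(f"p-value: {p_value:.4f}, Cohen's d: {effect:.2f}")
```

Reporting both numbers separates the two questions: the p-value indicates whether the observed difference is compatible with random relabelling, and the effect size indicates whether the difference is large enough to matter in practice.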

CONCLUSIONS

Our results suggest that the ProteinUnet2 architecture has much shorter training and inference times while maintaining results similar to those of the SAINT and SPOT-1D predictors. Given the relatively long time needed to calculate evolutionary features (from PSSM in particular), it would be worth testing the predictive ability of embeddings as input features in the future. We strongly believe that the statistical methodology proposed here for evaluating SS prediction results will be adopted, used, and even expanded by the research community.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/91c4375a9608/12859_2022_4623_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/3032ec57752c/12859_2022_4623_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/111241640aaa/12859_2022_4623_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/94434c2db1a0/12859_2022_4623_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/c88e64613bea/12859_2022_4623_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/06f3/8939211/fd62d9b551b5/12859_2022_4623_Fig6_HTML.jpg