DEEPred: Automated Protein Function Prediction with Multi-task Feed-forward Deep Neural Networks.

Affiliations

Department of Computer Engineering, METU, Ankara, 06800, Turkey.

Department of Computer Engineering, İskenderun Technical University, Hatay, 31200, Turkey.

Publication information

Sci Rep. 2019 May 14;9(1):7344. doi: 10.1038/s41598-019-43708-3.

DOI: 10.1038/s41598-019-43708-3

PMID: 31089211

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6517386/

Abstract

Automated protein function prediction is critical for the annotation of uncharacterized protein sequences, where accurate prediction methods are still required. Recently, deep learning-based methods have outperformed conventional algorithms in computer vision and natural language processing, thanks to their ability to prevent overfitting and to train efficiently. Here, we propose DEEPred, a hierarchical stack of multi-task feed-forward deep neural networks, as a solution to Gene Ontology (GO) based protein function prediction. DEEPred was optimized through rigorous hyper-parameter tests, and benchmarked using three types of protein descriptors, training datasets of varying sizes, and GO terms from different levels. Furthermore, in order to explore how training with larger but potentially noisy data would change the performance, electronically made GO annotations were also included in the training process. The overall predictive performance of DEEPred was assessed using the CAFA2 and CAFA3 challenge datasets, in comparison with state-of-the-art protein function prediction methods. Finally, we evaluated selected novel annotations produced by DEEPred with a literature-based case study considering the 'biofilm formation process' in Pseudomonas aeruginosa. This study reports that deep learning algorithms have significant potential in protein function prediction, particularly when the source data is large. The neural network architecture of DEEPred can also be applied to the prediction of other types of ontological associations. The source code and all datasets used in this study are available at https://github.com/cansyl/DEEPred.
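
To make the phrase "hierarchical stack of multi-task feed-forward deep neural networks" more concrete, the following is a minimal illustrative sketch, not the authors' implementation (which is in the repository linked above). It assumes that each network shares its hidden layers across several GO terms and emits one sigmoid score per term, and that one such network is kept per GO level. The class name `MultiTaskNet`, the layer sizes, the 8-dimensional feature vector, the untrained random weights, and the grouping of GO terms by level are all assumptions made for illustration.

```python
import math
import random


def make_layer(n_in, n_out, rng):
    """Create a dense layer as (weights, biases) with small random weights."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    biases = [0.0] * n_out
    return weights, biases


def dense(x, layer):
    """Apply an affine transformation: one dot product plus bias per unit."""
    weights, biases = layer
    return [sum(w * v for w, v in zip(row, x)) + b
            for row, b in zip(weights, biases)]


def relu(values):
    return [max(0.0, v) for v in values]


def sigmoid(values):
    return [1.0 / (1.0 + math.exp(-v)) for v in values]


class MultiTaskNet:
    """Hypothetical multi-task network: shared hidden layers, one output per GO term."""

    def __init__(self, n_features, hidden_sizes, go_terms, seed=0):
        rng = random.Random(seed)
        sizes = [n_features] + list(hidden_sizes)
        self.hidden = [make_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]
        self.output = make_layer(sizes[-1], len(go_terms), rng)
        self.go_terms = list(go_terms)

    def predict(self, features):
        """Return an independent score in (0, 1) for every GO term (task)."""
        h = features
        for layer in self.hidden:
            h = relu(dense(h, layer))
        scores = sigmoid(dense(h, self.output))
        return dict(zip(self.go_terms, scores))


# Illustrative grouping of GO terms by level; the real grouping is an assumption here.
terms_by_level = {
    1: ["GO:0003824", "GO:0005488"],
    2: ["GO:0016787", "GO:0016740"],
}

# The "stack": one multi-task network per GO level (weights are untrained).
stack = {level: MultiTaskNet(8, [16, 8], terms, seed=level)
         for level, terms in terms_by_level.items()}

# A made-up 8-dimensional protein descriptor vector.
features = [0.1 * i for i in range(8)]
predictions = {level: net.predict(features) for level, net in stack.items()}

for level, scores in predictions.items():
    print(level, {term: round(score, 3) for term, score in scores.items()})
```

Because every GO term has its own sigmoid output, a protein can receive high scores for several terms at once, which matches the multi-label nature of GO annotation. In practice the weights would be learned from annotated proteins rather than drawn at random.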
