DeepAdd: Protein function prediction from k-mer embedding and additional features.

Affiliation

Guangdong Laboratory of Artificial Intelligence and Digital Economy (SZ), Shenzhen University, Guangdong Province, PR China.

Publication information

Comput Biol Chem. 2020 Dec;89:107379. doi: 10.1016/j.compbiolchem.2020.107379. Epub 2020 Sep 23.

DOI:10.1016/j.compbiolchem.2020.107379

PMID:33011616

Abstract

With the application of new high-throughput sequencing technologies, a large number of protein sequences are becoming available. Determining the functional characteristics of these proteins experimentally is expensive and time-consuming. Furthermore, at the organismal level, such experimental functional analyses can be conducted for only a few selected model organisms. Computational function prediction methods can fill this gap. Protein functions are classified by the Gene Ontology (GO), which contains more than 40,000 classes in three domains: Molecular Function (MF), Biological Process (BP), and Cellular Component (CC). Because a protein can have many functions, function prediction is a multi-label, multi-class problem. We developed a new method to predict protein function from sequence. A natural language model generates word embeddings of the sequence, deep learning extracts features from those embeddings, and additional features are used to locate every protein. Our method uses the dependencies between GO classes as background information to construct the deep learning model. We evaluate the method using the standards established by the Critical Assessment of Functional Annotation (CAFA) and observe a noticeable improvement over several algorithms, including FFPred, DeepGO, and GoFDR, on the CAFA3 datasets.
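As an illustration of the "word" view of a sequence described in the abstract, the sketch below splits a protein into overlapping k-mers and maps each one to an integer id that an embedding layer could consume. The choice of k = 3, the stride of 1, the 20-letter alphabet, and the function names are assumptions made for this example; the abstract does not state the settings DeepAdd actually uses.

```python
from itertools import product

# Assumption: only the 20 standard amino acids form the vocabulary.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_tokens(sequence, k=3):
    """Split a protein sequence into overlapping k-mers (stride 1)."""
    sequence = sequence.upper()
    return [sequence[i:i + k] for i in range(len(sequence) - k + 1)]


def build_vocab(k=3):
    """Map every possible k-mer to an integer id; id 0 is reserved for unknown k-mers."""
    return {"".join(p): i + 1 for i, p in enumerate(product(AMINO_ACIDS, repeat=k))}


def encode(sequence, vocab, k=3):
    """Turn a sequence into a list of k-mer ids (0 for k-mers outside the vocabulary)."""
    return [vocab.get(token, 0) for token in kmer_tokens(sequence, k)]


vocab = build_vocab()
tokens = kmer_tokens("MKTAYIAK")
ids = encode("MKTAYIAK", vocab)
```

A sequence of length n yields n - k + 1 tokens, and with k = 3 the vocabulary holds 20^3 = 8000 entries. The resulting id list is what a learned embedding table would typically index into; how the paper trains or applies its embedding is not specified here.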

Similar articles

DeepGO: predicting protein functions from sequence and interactions using a deep ontology-aware classifier.

Bioinformatics. 2018 Feb 15;34(4):660-668. doi: 10.1093/bioinformatics/btx624.

A Deep Learning Framework for Gene Ontology Annotations With Sequence- and Network-Based Information.

IEEE/ACM Trans Comput Biol Bioinform. 2021 Nov-Dec;18(6):2208-2217. doi: 10.1109/TCBB.2020.2968882. Epub 2021 Dec 8.

Deep_CNN_LSTM_GO: Protein function prediction from amino-acid sequences.

Comput Biol Chem. 2021 Dec;95:107584. doi: 10.1016/j.compbiolchem.2021.107584. Epub 2021 Sep 24.

DeepFunc: A Deep Learning Framework for Accurate Prediction of Protein Functions from Protein Sequences and Interactions.

Proteomics. 2019 Jun;19(12):e1900019. doi: 10.1002/pmic.201900019. Epub 2019 May 27.

Mutual annotation-based prediction of protein domain functions with Domain2GO.

Protein Sci. 2024 Jun;33(6):e4988. doi: 10.1002/pro.4988.

ProLanGO: Protein Function Prediction Using Neural Machine Translation Based on a Recurrent Neural Network.

Molecules. 2017 Oct 17;22(10):1732. doi: 10.3390/molecules22101732.

FunPredCATH: An ensemble method for predicting protein function using CATH.

Biochim Biophys Acta Proteins Proteom. 2024 Feb 1;1872(2):140985. doi: 10.1016/j.bbapap.2023.140985. Epub 2023 Dec 19.

Protein function prediction using guilty by association from interaction networks.

Amino Acids. 2015 Dec;47(12):2583-92. doi: 10.1007/s00726-015-2049-3. Epub 2015 Jul 28.

Protein function prediction from protein-protein interaction network using gene ontology based neighborhood analysis and physico-chemical features.

J Bioinform Comput Biol. 2018 Dec;16(6):1850025. doi: 10.1142/S0219720018500257. Epub 2018 Sep 19.

Cited by

HPOseq: a deep ensemble model for predicting the protein-phenotype relationships based on protein sequences.

BMC Bioinformatics. 2025 Apr 22;26(1):110. doi: 10.1186/s12859-025-06122-3.

AI Prediction of Structural Stability of Nanoproteins Based on Structures and Residue Properties by Mean Pooled Dual Graph Convolutional Network.

Interdiscip Sci. 2025 Mar;17(1):101-113. doi: 10.1007/s12539-024-00662-7. Epub 2024 Oct 5.

A survey of k-mer methods and applications in bioinformatics.

Comput Struct Biotechnol J. 2024 May 21;23:2289-2303. doi: 10.1016/j.csbj.2024.05.025. eCollection 2024 Dec.

Proteomic Barcoding Platform for Macromolecular Screening and Delivery.

J Proteome Res. 2024 Jun 7;23(6):2067-2077. doi: 10.1021/acs.jproteome.4c00068. Epub 2024 May 22.

Elucidating the functional roles of prokaryotic proteins using big data and artificial intelligence.

FEMS Microbiol Rev. 2023 Jan 16;47(1). doi: 10.1093/femsre/fuad003.

Protein Science Meets Artificial Intelligence: A Systematic Review and a Biochemical Meta-Analysis of an Inter-Field.

Front Bioeng Biotechnol. 2022 Jul 7;10:788300. doi: 10.3389/fbioe.2022.788300. eCollection 2022.

Protein function prediction with gene ontology: from traditional to deep learning models.

PeerJ. 2021 Aug 24;9:e12019. doi: 10.7717/peerj.12019. eCollection 2021.
