Identifying Functions of Proteins in Mice With Functional Embedding Features.

Authors

Li Hao, Zhang ShiQi, Chen Lei, Pan Xiaoyong, Li ZhanDong, Huang Tao, Cai Yu-Dong

Affiliations

College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China.

Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark.

Publication information

Front Genet. 2022 May 16;13:909040. doi: 10.3389/fgene.2022.909040. eCollection 2022.

DOI: 10.3389/fgene.2022.909040

PMID: 35651937

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9149260/

Abstract

Exploring the biological functions of proteins is an important task in current biology. Given the large number of proteins in some organisms, determining their functions one by one through traditional experiments is infeasible. Therefore, quick and reliable methods for identifying protein functions are needed. The considerable accumulation of protein knowledge and recent developments in computer science offer an alternative: computational methods. Several efforts have been made in this field. Most previous methods adopted protein sequence features or directly used the links in a protein-protein interaction (PPI) network. In this study, we proposed several novel multi-label classifiers that represent proteins with new embedding features. These features were derived from functional domains via word embedding and from a PPI network via network embedding. The minimum redundancy maximum relevance (mRMR) method was used to assess the features and generate a ranked feature list. Incremental feature selection (IFS), combined with RAndom k-labELsets (RAkEL) to construct multi-label classifiers, used this list to build two optimal classifiers, one for each of two key measures: accuracy and exact match. Both classifiers performed well and outperformed classifiers that used features extracted by traditional methods.
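The abstract outlines the pipeline without its details. As a rough illustration only, the Python sketch below shows how a minimum redundancy maximum relevance (mRMR) ranking can feed incremental feature selection (IFS), and how multi-label accuracy and exact match are commonly computed. It is not the authors' implementation. Assumptions: features are already discretized; relevance is measured against a single discrete target (how the study applied mRMR to multiple labels is not stated here); the MID criterion (relevance minus mean redundancy) is used; and the two measures follow the usual example-based definitions. The embedding steps and the RAkEL (RAndom k-labELsets) classifier are not reproduced; the hypothetical `evaluate` callback stands in for training and scoring such a classifier on a feature subset.

```python
"""Sketch of an mRMR -> IFS pipeline plus two multi-label metrics.

Assumptions (not taken from the paper): features are already discretized,
relevance is measured against ONE discrete target, the MID criterion
(relevance minus mean redundancy) is used, and the classifier (e.g. a
RAkEL model) is hidden behind a hypothetical `evaluate` callback.
"""
from collections import Counter
from math import log
from typing import Callable, Hashable, Sequence


def mutual_information(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Mutual information (nats) between two equal-length discrete sequences."""
    n = len(x)
    px, py, pxy = Counter(x), Counter(y), Counter(zip(x, y))
    return sum((c / n) * log(c * n / (px[a] * py[b])) for (a, b), c in pxy.items())


def mrmr_rank(features: list[Sequence[Hashable]], target: Sequence[Hashable]) -> list[int]:
    """Greedy mRMR (MID variant): return every feature index, most useful first."""
    relevance = [mutual_information(f, target) for f in features]
    remaining = list(range(len(features)))
    ranked: list[int] = []

    def score(i: int) -> float:
        # Relevance to the target minus mean redundancy with already-ranked features.
        if not ranked:
            return relevance[i]
        redundancy = sum(mutual_information(features[i], features[j]) for j in ranked)
        return relevance[i] - redundancy / len(ranked)

    while remaining:
        best = max(remaining, key=score)  # ties resolve to the lowest remaining index
        ranked.append(best)
        remaining.remove(best)
    return ranked


def incremental_feature_selection(
    ranked: list[int], evaluate: Callable[[list[int]], float], step: int = 1
) -> tuple[list[int], float]:
    """Score the top-k prefixes of the ranked list and keep the best-scoring k."""
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked) + 1, step):
        current = evaluate(ranked[:k])  # e.g. cross-validated score of a classifier
        if current > best_score:
            best_k, best_score = k, current
    return ranked[:best_k], best_score


def multilabel_accuracy(true: list[set], pred: list[set]) -> float:
    """Example-based accuracy: mean of |Y & Y'| / |Y | Y'| over samples."""
    total = 0.0
    for y, p in zip(true, pred):
        union = y | p
        total += len(y & p) / len(union) if union else 1.0
    return total / len(true)


def exact_match(true: list[set], pred: list[set]) -> float:
    """Fraction of samples whose predicted label set equals the true set."""
    return sum(y == p for y, p in zip(true, pred)) / len(true)


if __name__ == "__main__":
    target = [0, 1, 2, 3, 4, 5, 6, 7]
    f_ab = [0, 0, 1, 1, 2, 2, 3, 3]   # carries 2 of the 3 bits of the target
    f_ab_copy = list(f_ab)            # fully redundant with f_ab
    f_c = [0, 1, 0, 1, 0, 1, 0, 1]    # carries the remaining bit
    ranked = mrmr_rank([f_ab, f_ab_copy, f_c], target)
    print(ranked)  # [0, 2, 1]

    toy_scores = {1: 0.5, 2: 0.9, 3: 0.8}  # placeholder scores, not results from the paper
    print(incremental_feature_selection(ranked, lambda s: toy_scores[len(s)]))  # ([0, 2], 0.9)

    true = [{"a", "b"}, {"c"}, {"a"}]
    pred = [{"a", "b"}, {"c", "d"}, {"b"}]
    print(multilabel_accuracy(true, pred), exact_match(true, pred))  # 0.5 0.3333333333333333
```

On the toy data, the duplicate feature is exactly as relevant as the top-ranked one but lands last, because its redundancy term cancels its relevance. The IFS scores are placeholders; in practice each call to `evaluate` would train and cross-validate a multi-label classifier on the selected features, and IFS would be run once per measure to obtain one optimal classifier for accuracy and one for exact match.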

Figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dc7/9149260/30804341c959/fgene-13-909040-g001.jpg

Similar articles

Predicting gene phenotype by multi-label multi-class model based on essential functional features.

Mol Genet Genomics. 2021 Jul;296(4):905-918. doi: 10.1007/s00438-021-01789-8. Epub 2021 Apr 29.

Improved multi-label classifiers for predicting protein subcellular localization.

Math Biosci Eng. 2024 Jan;21(1):214-236. doi: 10.3934/mbe.2024010. Epub 2022 Dec 11.

Identification of protein functions in mouse with a label space partition method.

Math Biosci Eng. 2022 Feb 10;19(4):3820-3842. doi: 10.3934/mbe.2022176.

iATC-NRAKEL: an efficient multi-label classifier for recognizing anatomical therapeutic chemical classes of drugs.

Bioinformatics. 2020 Mar 1;36(5):1391-1396. doi: 10.1093/bioinformatics/btz757.

Prediction of Drug Combinations with a Network Embedding Method.

Comb Chem High Throughput Screen. 2018;21(10):789-797. doi: 10.2174/1386207322666181226170140.

Computer-assisted lip diagnosis on Traditional Chinese Medicine using multi-class support vector machines.

BMC Complement Altern Med. 2012 Aug 16;12:127. doi: 10.1186/1472-6882-12-127.

iMPTCE-Hnetwork: A Multilabel Classifier for Identifying Metabolic Pathway Types of Chemicals and Enzymes with a Heterogeneous Network.

Comput Math Methods Med. 2021 Jan 4;2021:6683051. doi: 10.1155/2021/6683051. eCollection 2021.

Drug Target Group Prediction with Multiple Drug Networks.

Comb Chem High Throughput Screen. 2020;23(4):274-284. doi: 10.2174/1386207322666190702103927.

PredictEFC: a fast and efficient multi-label classifier for predicting enzyme family classes.

BMC Bioinformatics. 2024 Jan 30;25(1):50. doi: 10.1186/s12859-024-05665-1.

Cited by

Protein Sequence Analysis landscape: A Systematic Review of Task Types, Databases, Datasets, Word Embeddings Methods, and Language Models.

Database (Oxford). 2025 May 30;2025. doi: 10.1093/database/baaf027.

Identification of key gene expression associated with quality of life after recovery from COVID-19.

Med Biol Eng Comput. 2024 Apr;62(4):1031-1048. doi: 10.1007/s11517-023-02988-8. Epub 2023 Dec 21.

Identification of Colon Immune Cell Marker Genes Using Machine Learning Methods.

Life (Basel). 2023 Sep 7;13(9):1876. doi: 10.3390/life13091876.

Identification of Gene Markers Associated with COVID-19 Severity and Recovery in Different Immune Cell Subtypes.

Biology (Basel). 2023 Jul 2;12(7):947. doi: 10.3390/biology12070947.

Immune responses of different COVID-19 vaccination strategies by analyzing single-cell RNA sequencing data from multiple tissues using machine learning methods.

Front Genet. 2023 Mar 17;14:1157305. doi: 10.3389/fgene.2023.1157305. eCollection 2023.
