Identifying Functions of Proteins in Mice With Functional Embedding Features.

Authors

Li Hao, Zhang ShiQi, Chen Lei, Pan Xiaoyong, Li ZhanDong, Huang Tao, Cai Yu-Dong

Affiliations

College of Biological and Food Engineering, Jilin Engineering Normal University, Changchun, China.

Department of Biostatistics, University of Copenhagen, Copenhagen, Denmark.

Publication information

Front Genet. 2022 May 16;13:909040. doi: 10.3389/fgene.2022.909040. eCollection 2022.

Abstract

Exploring the biological functions of proteins is a central task in modern biology. Because many organisms have a large number of proteins, determining their functions one by one through traditional experiments is impractical, so quick and reliable methods for identifying protein functions are needed. The considerable accumulation of protein knowledge and recent developments in computer science offer an alternative: designing computational methods. Several efforts have been made in this field. Most previous methods adopted protein sequence features or directly used the links in a protein-protein interaction (PPI) network. In this study, we proposed novel multi-label classifiers that use new embedding features to represent proteins. These features were derived from functional domains via word embedding and from a PPI network via network embedding. The minimum redundancy maximum relevance (mRMR) method was used to rank the features, generating a feature list. Incremental feature selection, with RAndom k-labELsets (RAKEL) used to build the multi-label classifiers, was applied to this list to construct two optimum classifiers, one for each of two key measurements: accuracy and exact match. Both classifiers performed well and were superior to classifiers that used features extracted by traditional methods.
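The abstract names two measurements and an incremental feature selection step but gives no formulas. The sketch below shows one common reading and is not the authors' implementation. It assumes the example-based definitions often used in the multi-label literature: accuracy as the mean Jaccard overlap between true and predicted label sets, and exact match as the fraction of samples whose predicted set equals the true set. The function names and the `evaluate` callback are hypothetical; in practice the callback would train and cross-validate a multi-label classifier (such as RAKEL) on the given feature subset.

```python
from typing import Callable, List, Sequence, Set, Tuple


def accuracy(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Example-based accuracy (assumed definition): mean Jaccard overlap."""
    total = 0.0
    for y, z in zip(true_sets, pred_sets):
        union = y | z
        # Two empty label sets are treated as a perfect match.
        total += len(y & z) / len(union) if union else 1.0
    return total / len(true_sets)


def exact_match(true_sets: Sequence[Set[str]], pred_sets: Sequence[Set[str]]) -> float:
    """Fraction of samples whose predicted label set equals the true set."""
    hits = sum(1 for y, z in zip(true_sets, pred_sets) if y == z)
    return hits / len(true_sets)


def incremental_feature_selection(
    ranked_features: List[str],
    evaluate: Callable[[List[str]], float],
    step: int = 1,
) -> Tuple[int, float]:
    """Score the top-k features for growing k and return (best_k, best_score).

    `ranked_features` is a list already ordered by some ranking method
    (mRMR in the abstract); `evaluate` is a hypothetical scoring callback.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])
        if score > best_score:
            best_k, best_score = k, score
    return best_k, best_score


if __name__ == "__main__":
    # Toy label sets, invented for illustration only.
    y_true = [{"a", "b"}, {"c"}, {"a"}]
    y_pred = [{"a"}, {"c"}, {"b"}]
    print(accuracy(y_true, y_pred), exact_match(y_true, y_pred))
```

Running the selection once with an accuracy-based callback and once with an exact-match-based callback would yield two feature counts, which matches the abstract's description of two optimum classifiers, one per measurement.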

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7dc7/9149260/30804341c959/fgene-13-909040-g001.jpg
