Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods.

Authors

Ma Qinglan, Huang FeiMing, Guo Wei, Feng KaiYan, Huang Tao, Cai Yudong

Affiliations

School of Life Sciences, Shanghai University, Shanghai 200444, China.

Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China.

Publication information

Life (Basel). 2023 May 31;13(6):1306. doi: 10.3390/life13061306.

Abstract

Phase-separation proteins (PSPs) are a class of proteins involved in liquid-liquid phase separation, a mechanism that mediates the formation of membraneless compartments in cells. Identifying PSPs and their associated functions could provide insights into cellular biology and the development of diseases such as neurodegenerative diseases and cancer. Here, PSPs and non-PSPs that had been experimentally validated in earlier studies were gathered as positive and negative samples, respectively. Each protein's Gene Ontology (GO) terms were extracted and encoded as a 24,907-dimensional binary vector. The aim was twofold: to extract the GO terms that best describe the essential functions of PSPs, and to build efficient classifiers that identify PSPs from these GO terms. To this end, an incremental feature selection computational framework was combined with an integrated feature analysis scheme comprising categorical boosting, least absolute shrinkage and selection operator, light gradient-boosting machine, extreme gradient boosting, and permutation feature importance. Together they were used to build the classifiers and to identify GO terms that are important for classification. A set of random forest (RF) classifiers with F1 scores over 0.960 was established to distinguish PSPs from non-PSPs. Several GO terms that are crucial for distinguishing PSPs from non-PSPs were found, including GO:0003723, which is related to RNA binding; GO:0016020, which is related to membrane formation; and GO:0045202, which is related to the function of synapses. By developing efficient RF classifiers and identifying representative GO terms related to PSPs, this study offers recommendations for future research on the functional roles of PSPs in cellular processes.
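The encoding and selection steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy proteins, and the three-term vocabulary are hypothetical, and the `evaluate` callback stands in for whatever classifier and cross-validation the study used (random forests, according to the abstract). The sketch shows binary GO-term encoding, the F1 score, and an incremental feature selection loop over an already ranked feature list.

```python
from typing import Callable, Dict, List, Sequence, Set, Tuple


def encode_go_terms(
    protein_terms: Dict[str, Set[str]], vocabulary: Sequence[str]
) -> Dict[str, List[int]]:
    """Encode each protein as a 0/1 vector: 1 if it is annotated with the term."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = {}
    for protein, terms in protein_terms.items():
        vec = [0] * len(vocabulary)
        for term in terms:
            if term in index:  # terms outside the vocabulary are ignored
                vec[index[term]] = 1
        vectors[protein] = vec
    return vectors


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = harmonic mean of precision and recall for the positive class (1)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def incremental_feature_selection(
    ranked_features: Sequence[str],
    evaluate: Callable[[Sequence[str]], float],
    step: int = 1,
) -> Tuple[List[str], float]:
    """Score the top-k ranked features for growing k and keep the best subset.

    `evaluate` is an assumed callback that trains and validates a classifier
    on the given feature subset and returns a score such as F1.
    """
    best_k, best_score = 0, float("-inf")
    for k in range(step, len(ranked_features) + 1, step):
        score = evaluate(ranked_features[:k])
        if score > best_score:  # ties keep the smaller subset
            best_k, best_score = k, score
    return list(ranked_features[:best_k]), best_score


# Toy data: hypothetical proteins and a 3-term vocabulary
# (the study used 24,907 GO terms).
vocabulary = ["GO:0003723", "GO:0016020", "GO:0045202"]
annotations = {
    "protein_A": {"GO:0003723"},
    "protein_B": {"GO:0016020", "GO:0045202"},
}
vectors = encode_go_terms(annotations, vocabulary)
```

In practice the ranked list would come from one of the feature-ranking methods named in the abstract, and `evaluate` would wrap a cross-validated classifier; both are left abstract here.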


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b4c/10300870/f7c426bb97dd/life-13-01306-g001.jpg
