Identification of Phase-Separation-Protein-Related Function Based on Gene Ontology by Using Machine Learning Methods.

Authors

Ma Qinglan, Huang FeiMing, Guo Wei, Feng KaiYan, Huang Tao, Cai Yudong

Affiliations

School of Life Sciences, Shanghai University, Shanghai 200444, China.

Key Laboratory of Stem Cell Biology, Shanghai Jiao Tong University School of Medicine (SJTUSM) & Shanghai Institutes for Biological Sciences (SIBS), Chinese Academy of Sciences (CAS), Shanghai 200030, China.

Publication information

Life (Basel). 2023 May 31;13(6):1306. doi: 10.3390/life13061306.

DOI: 10.3390/life13061306
PMID: 37374089
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10300870/
Abstract

Phase-separation proteins (PSPs) are a class of proteins that play a role in the process of liquid-liquid phase separation, which is a mechanism that mediates the formation of membraneless compartments in cells. Identifying phase separation proteins and their associated function could provide insights into cellular biology and the development of diseases, such as neurodegenerative diseases and cancer. Here, PSPs and non-PSPs that have been experimentally validated in earlier studies were gathered as positive and negative samples. Each protein's corresponding Gene Ontology (GO) terms were extracted and used to create a 24,907-dimensional binary vector. The purpose was to extract essential GO terms that can describe essential functions of PSPs and build efficient classifiers to identify PSPs with these GO terms at the same time. To this end, the incremental feature selection computational framework and an integrated feature analysis scheme, containing categorical boosting, least absolute shrinkage and selection operator, light gradient-boosting machine, extreme gradient boosting, and permutation feature importance, were used to build efficient classifiers and identify GO terms with classification-related importance. A set of random forest (RF) classifiers with F1 scores over 0.960 were established to distinguish PSPs from non-PSPs. A number of GO terms that are crucial for distinguishing between PSPs and non-PSPs were found, including GO:0003723, which is related to a biological process involving RNA binding; GO:0016020, which is related to membrane formation; and GO:0045202, which is related to the function of synapses. This study offered recommendations for future research aimed at determining the functional roles of PSPs in cellular processes by developing efficient RF classifiers and identifying the representative GO terms related to PSPs.
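
The pipeline described in the abstract (binary GO-term encoding, feature ranking, incremental feature selection scored by F1) can be illustrated with a minimal standard-library sketch. Everything below is an assumption made for illustration and is not the authors' code: the six toy proteins and their annotations are hypothetical, the four-term vocabulary stands in for the 24,907 GO terms, a simple class-frequency gap stands in for the five ranking methods named in the abstract, and a linear threshold rule scored on the training data stands in for the random forest. A real pipeline would score each feature subset with a held-out or cross-validated classifier.

```python
from typing import Callable, List, Sequence, Set, Tuple


def encode_go_terms(protein_terms: Sequence[Set[str]],
                    vocabulary: Sequence[str]) -> List[List[int]]:
    """Turn each protein's GO annotations into a binary vector over the vocabulary."""
    index = {term: i for i, term in enumerate(vocabulary)}
    vectors = []
    for terms in protein_terms:
        vec = [0] * len(vocabulary)
        for term in terms:
            if term in index:
                vec[index[term]] = 1
        vectors.append(vec)
    return vectors


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """F1 = 2*TP / (2*TP + FP + FN), with 0.0 when there are no true positives."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return 2 * tp / (2 * tp + fp + fn) if tp else 0.0


def class_frequency_gap(X: Sequence[Sequence[int]], y: Sequence[int]) -> List[float]:
    """Stand-in importance score: P(term | PSP) - P(term | non-PSP) per feature."""
    n_pos = sum(y)
    n_neg = len(y) - n_pos
    gaps = []
    for j in range(len(X[0])):
        pos = sum(row[j] for row, label in zip(X, y) if label == 1)
        neg = sum(row[j] for row, label in zip(X, y) if label == 0)
        gaps.append(pos / n_pos - neg / n_neg)
    return gaps


def incremental_feature_selection(
    ranked: Sequence[int],
    evaluate: Callable[[Sequence[int]], float],
    step: int = 1,
) -> Tuple[Tuple[int, float], List[Tuple[int, float]]]:
    """Score the top-k ranked features for growing k; return the best (k, score) and the curve."""
    curve = [(k, evaluate(ranked[:k])) for k in range(step, len(ranked) + 1, step)]
    best = max(curve, key=lambda item: item[1])  # on ties, the smallest k wins
    return best, curve


# Hypothetical toy data: three PSPs (label 1) and three non-PSPs (label 0).
vocabulary = ["GO:0003723", "GO:0016020", "GO:0045202", "GO:0005515"]
proteins = [
    {"GO:0003723", "GO:0005515"},
    {"GO:0003723"},
    {"GO:0003723", "GO:0045202"},
    {"GO:0016020", "GO:0005515"},
    {"GO:0016020"},
    {"GO:0005515"},
]
labels = [1, 1, 1, 0, 0, 0]

X = encode_go_terms(proteins, vocabulary)
weights = class_frequency_gap(X, labels)
# Rank features by absolute importance, most discriminative first.
ranked = sorted(range(len(vocabulary)), key=lambda j: -abs(weights[j]))


def evaluate(selected: Sequence[int]) -> float:
    """Toy classifier: predict PSP when the weighted sum over selected terms is positive."""
    preds = [1 if sum(weights[j] * row[j] for j in selected) > 0 else 0 for row in X]
    return f1_score(labels, preds)


best, curve = incremental_feature_selection(ranked, evaluate)
print("top-ranked term:", vocabulary[ranked[0]])
print("best (k, F1):", best)
```

Swapping `class_frequency_gap` for a model-based ranking and `evaluate` for a cross-validated classifier would leave the `incremental_feature_selection` loop unchanged, which is the main reason for passing the evaluator in as a function.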

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b4c/10300870/f7c426bb97dd/life-13-01306-g001.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b4c/10300870/435283f6231c/life-13-01306-g002.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4b4c/10300870/e72eed72a4d0/life-13-01306-g003.jpg
