MoRF_ESM: Prediction of MoRFs in disordered proteins based on a deep transformer protein language model.

Authors

Fang Chun, He Jiasheng, Yamana Hayato

Affiliations

Department of Information Engineering, Beijing Institute of Petrochemical Technology, 19 Qingyuan North Road, Daxing District, Beijing 102617, P. R. China.

Department of Computer Science and Engineering, Waseda University, 3-4-1 Okubo, Shinjuku, Tokyo 169-8555, Japan.

Publication information

J Bioinform Comput Biol. 2024 Apr;22(2):2450006. doi: 10.1142/S0219720024500069. Epub 2024 May 28.

DOI: 10.1142/S0219720024500069
PMID: 38812466

Abstract

Molecular recognition features (MoRFs) are functional segments of disordered proteins that play crucial roles in regulating the phase transition of membrane-less organelles and frequently serve as central sites in cellular interaction networks. As more associations between disordered proteins and severe diseases are discovered, identifying MoRFs has become increasingly important. Because few MoRFs have been experimentally validated, the performance of existing MoRF prediction algorithms remains inadequate and needs to be improved. In this research, we present a model named MoRF_ESM, which uses deep-learning protein representations to predict MoRFs in disordered proteins. The approach employs a pretrained ESM-2 protein language model to generate embedding representations of residues in the form of attention map matrices. These representations are fed to a self-learned TextCNN model for feature extraction and prediction. In addition, an averaging step at the end of the MoRF_ESM model refines the output and generates the final prediction results. Compared with other leading methods on benchmark datasets, MoRF_ESM demonstrates state-of-the-art performance, achieving [Formula: see text] higher AUC than other methods on TEST1 and [Formula: see text] higher AUC than other methods on TEST2. These results imply that the combination of ESM-2 and TextCNN can effectively extract deep evolutionary features related to protein structure and function while also capturing shallow pattern features in protein sequences, and that it is well suited to the MoRF prediction task. Because ESM-2 is a highly versatile protein language model, the methodology proposed in this study can be readily applied to other protein sequence classification tasks.
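
The abstract mentions an averaging step that refines the per-residue output but does not say how the averaging is done. The sketch below shows one plausible form, assuming a centered sliding-window mean over per-residue MoRF propensity scores. The function name `smooth_scores`, the `window` parameter, and the example scores are hypothetical and are not taken from the paper.

```python
def smooth_scores(scores, window=5):
    """Smooth per-residue scores with a centered sliding-window mean.

    Assumption: this is only an illustration of an "averaging step";
    the window shape and size used by MoRF_ESM are not given in the abstract.
    """
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    n = len(scores)
    smoothed = []
    for i in range(n):
        # Truncate the window at the sequence ends instead of padding.
        lo = max(0, i - half)
        hi = min(n, i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


if __name__ == "__main__":
    # Hypothetical raw scores for a seven-residue sequence.
    raw = [0.1, 0.2, 0.9, 0.8, 0.7, 0.2, 0.1]
    print(smooth_scores(raw, window=3))
```

Averaging of this kind damps isolated high or low scores, which suits MoRFs because they are contiguous segments rather than single residues. Truncating the window at the ends is one design choice; padding the sequence is an equally reasonable alternative.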
