FusOn-pLM: a fusion oncoprotein-specific language model via adjusted rate masking.

Authors

Vincoff Sophia, Goel Shrey, Kholina Kseniia, Pulugurta Rishab, Vure Pranay, Chatterjee Pranam

Affiliations

Department of Biomedical Engineering, Duke University, Durham, NC, USA.

Department of Computer Science, Duke University, Durham, NC, USA.

Publication information

Nat Commun. 2025 Feb 7;16(1):1436. doi: 10.1038/s41467-025-56745-6.

Abstract

Fusion oncoproteins, a class of chimeric proteins arising from chromosomal translocations, are major drivers of various pediatric cancers. These proteins are intrinsically disordered and lack druggable pockets, making them highly challenging therapeutic targets for both small molecule-based and structure-based approaches. Protein language models (pLMs) have recently emerged as powerful tools for capturing physicochemical and functional protein features but have yet to be trained on fusion oncoprotein sequences. We introduce FusOn-pLM, a fine-tuned pLM trained on a newly curated, comprehensive set of fusion oncoprotein sequences, FusOn-DB. Employing a unique cosine-scheduled masked language modeling strategy, FusOn-pLM dynamically adjusts masking rates (15%-40%) to optimize feature extraction and representation quality, surpassing baseline embeddings in fusion-specific tasks, including localization, puncta formation, and disorder prediction. FusOn-pLM uniquely predicts drug-resistant mutations, providing insights for therapeutic design that anticipates resistance mechanisms. In total, FusOn-pLM provides biologically relevant representations for advancing therapeutic discovery in fusion-driven cancers.
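
To make the cosine-scheduled masking idea concrete, the following is a minimal Python sketch and not the authors' implementation. The abstract states only that masking rates are adjusted between 15% and 40% on a cosine schedule. Everything else here is an assumption made for illustration: the rising direction of the schedule, the function names cosine_mask_rate and mask_sequence, the "<mask>" placeholder token, the toy sequence, and the choice to mask uniformly at random.

```python
import math
import random

MASK_TOKEN = "<mask>"  # placeholder; the real token depends on the tokenizer used


def cosine_mask_rate(step, total_steps, min_rate=0.15, max_rate=0.40):
    """Masking rate that moves from min_rate to max_rate along a half cosine.

    Assumption: the rate increases with training progress. The abstract gives
    only the 15%-40% range, not the direction or period of the schedule.
    """
    if total_steps <= 0:
        raise ValueError("total_steps must be positive")
    # Clamp progress to [0, 1] so out-of-range steps stay inside the rate bounds.
    progress = min(max(step / total_steps, 0.0), 1.0)
    return min_rate + (max_rate - min_rate) * 0.5 * (1.0 - math.cos(math.pi * progress))


def mask_sequence(sequence, rate, rng):
    """Replace a fraction `rate` of residues with MASK_TOKEN, chosen uniformly."""
    if not sequence:
        raise ValueError("sequence must be non-empty")
    n_mask = max(1, round(rate * len(sequence)))  # always mask at least one residue
    positions = sorted(rng.sample(range(len(sequence)), n_mask))
    tokens = list(sequence)
    for i in positions:
        tokens[i] = MASK_TOKEN
    return tokens, positions


if __name__ == "__main__":
    rng = random.Random(0)        # fixed seed for reproducibility
    toy = "MKTAYIAKQRQISFVKSHFS"  # made-up 20-residue sequence, not from FusOn-DB
    for step in (0, 50, 100):
        rate = cosine_mask_rate(step, total_steps=100)
        tokens, positions = mask_sequence(toy, rate, rng)
        print(f"step={step:3d} rate={rate:.3f} masked={len(positions)}")
```

In a real fine-tuning loop the rate returned by cosine_mask_rate would be passed to whatever masking routine the training framework provides; the sketch only shows how such a rate could vary within the stated range.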

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/cc17/11806025/ab6c31b14a95/41467_2025_56745_Fig1_HTML.jpg
