PSTP: accurate residue-level phase separation prediction using protein conformational and language model embeddings.

Author information

Feng Mofan, Liu Liangjie, Xian Zhuo-Ning, Wei Xiaoxi, Li Keyi, Yan Wenqian, Lu Qing, Shi Yi, He Guang

Affiliations

Bio-X Institutes, Key Laboratory for the Genetics of Developmental and Neuropsychiatric Disorders, Shanghai Jiao Tong University, No. 1954 Huashan Road, Xuhui District, Shanghai 200030, China.

Shanghai Institute of Medical Genetics, Shanghai Children's Hospital, Shanghai Jiao Tong University School of Medicine, No. 24 Lane 1400 West Beijing Road, Jing'an District, Shanghai 200040, China.

Publication information

Brief Bioinform. 2025 May 1;26(3). doi: 10.1093/bib/bbaf171.

Abstract

Phase separation (PS) is essential in cellular processes and disease mechanisms, highlighting the need for predictive algorithms to analyze uncharacterized sequences and accelerate experimental validation. Current high-accuracy methods often rely on extensive annotations or handcrafted features, limiting their generalizability to sequences lacking such annotations and making it difficult to identify key protein regions involved in PS. We introduce Phase Separation's Transfer-learning Prediction (PSTP), which combines conformational embeddings with large language model embeddings, enabling state-of-the-art PS predictions from protein sequences alone. PSTP performs well across various prediction scenarios and shows potential for predicting newly designed artificial proteins. Additionally, PSTP provides residue-level predictions that are highly correlated with experimentally validated PS regions. By analyzing 160 000+ variants, PSTP characterizes the strong link between the incidence of pathogenic variants and residue-level PS propensities in unconserved intrinsically disordered regions, offering insights into underexplored mutation effects. PSTP's sliding-window optimization reduces its memory usage to a few hundred megabytes, facilitating rapid execution on typical CPUs and GPUs. Offered via both a web server and an installable Python package, PSTP provides a versatile tool for decoding protein PS behavior and supporting disease-focused research.
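The abstract attributes PSTP's small memory footprint to sliding-window optimization but does not describe how the windowing works. Purely as an illustration, the sketch below shows one generic way a sliding window can bound memory for residue-level prediction: score overlapping chunks of at most `window` residues and average the scores each residue receives. The function names, the default `window`/`step` values, the averaging rule, and the toy scorer are all assumptions made for this example; this is not PSTP's implementation or the API of its Python package.

```python
from typing import Callable, List

# Hypothetical per-window scorer type: takes a sequence chunk and returns one
# score per residue. It stands in for a real model; it is NOT the PSTP API.
WindowScorer = Callable[[str], List[float]]


def sliding_window_scores(
    sequence: str,
    score_window: WindowScorer,
    window: int = 1000,  # assumed value, not a PSTP default
    step: int = 500,     # assumed value, not a PSTP default
) -> List[float]:
    """Score a sequence in overlapping windows and average the overlaps.

    Only one chunk of at most `window` residues is passed to the scorer at a
    time, so the scorer's peak memory depends on `window`, not on the full
    sequence length.
    """
    if window <= 0 or step <= 0 or step > window:
        # step > window would leave residues between windows unscored.
        raise ValueError("require 0 < step <= window")
    n = len(sequence)
    totals = [0.0] * n  # sum of scores received by each residue
    counts = [0] * n    # number of windows that covered each residue
    start = 0
    while True:
        end = min(start + window, n)
        chunk_scores = score_window(sequence[start:end])
        if len(chunk_scores) != end - start:
            raise ValueError("scorer must return one score per residue")
        for offset, score in enumerate(chunk_scores):
            totals[start + offset] += score
            counts[start + offset] += 1
        if end == n:  # the last window reached the end of the sequence
            break
        start += step
    # Every residue is covered at least once because step <= window.
    return [t / c for t, c in zip(totals, counts)]


def toy_scorer(chunk: str) -> List[float]:
    # Placeholder scorer for illustration only: 1.0 for G/S/Y, else 0.0.
    return [1.0 if aa in "GSY" else 0.0 for aa in chunk]


if __name__ == "__main__":
    seq = "MGSYK" * 500  # 2,500-residue toy sequence
    windowed = sliding_window_scores(seq, toy_scorer, window=1000, step=500)
    print(len(windowed), windowed[:5])
```

Because the scorer only ever sees one chunk, memory use in this sketch scales with `window` rather than with the full protein length. The trade-off is that residues near a chunk boundary see less surrounding context, which the overlap and averaging partly offset.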

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6466/12047702/8b5d853ff195/bbaf171ga1.jpg