pLMMoRF: A Web Server That Accurately Predicts Membrane-interacting Molecular Recognition Features by Employing a Protein Language Model.

Author information

Csepi Máté, Berta Blanka, Basu Sushmita, Kurgan Lukasz, Hegedűs Tamás

Affiliations

Department of Biophysics and Radiation Biology, Semmelweis University, Budapest H-1094, Hungary.

Department of Computer Science, Virginia Commonwealth University, Virginia, VA 23284-3068, USA.

Publication information

J Mol Biol. 2025 Sep 1;437(17):169236. doi: 10.1016/j.jmb.2025.169236. Epub 2025 May 27.

Abstract

Interactions between proteins and lipids are crucial for numerous cellular processes. Some of the lipid-interacting segments in protein sequences are intrinsically disordered regions (IDRs), which may acquire secondary structure upon binding. We collected experimentally annotated lipid-interacting IDRs, which we named membrane molecular recognition features (MemMoRFs). We used this dataset to develop and test pLMMoRF, an accurate and relatively fast sequence-based MemMoRF predictor that complements the tedious and costly experimental identification of MemMoRFs. Our predictor processes the output of a protein language model (pLM) to generate inputs to a deep convolutional neural network. We considered several pLMs (ESM-2, ProstT5, ProtT5 and Ankh) and applied feature selection to reduce the size of their outputs, yielding a more compact neural network model. pLMMoRF uses the Ankh-based model, which we selected because it was more accurate than our other models. Tests on low-similarity test datasets demonstrate that pLMMoRF is more accurate than CoMemMoRFPred, the only other current predictor of MemMoRFs. Moreover, pLMMoRF has a relatively small computational footprint owing to its compact network and the use of dedicated GPU nodes. This allowed us to predict MemMoRFs across the human proteome. We analyzed these predictions and made them publicly available, facilitating an improved understanding of the functions of membrane-coupled proteins. Our work underscores the importance of selecting key embedding features to enhance predictive performance and reduce the computational footprint of sequence-based predictors of protein function. The web server for the pLMMoRF predictor and the predictions for human proteins are freely available at https://plmmorf.hegelab.org.
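The abstract describes the pipeline only at a high level. First, per-residue pLM embeddings are produced. Next, feature selection shrinks them. Finally, a convolutional network scores each residue. The pure-Python sketch below illustrates that data flow under several stated assumptions, and none of it is pLMMoRF's actual code. The mean-difference feature filter, the single convolution layer with hand-set weights, and the 0.5 threshold are all hypothetical stand-ins, as are the function names `select_features`, `conv1d_same`, `predict` and `to_segments`. They are not the paper's feature-selection method or network architecture. Real inputs would be embeddings from a pLM such as Ankh, not the toy matrices used here.

```python
import math
import random
from typing import List, Sequence, Tuple

# Per-residue embedding: one row per residue, one column per embedding dimension.
Matrix = List[List[float]]


def select_features(embeddings: Sequence[Matrix], labels: Sequence[List[int]], k: int) -> List[int]:
    """Keep the k dimensions whose class means differ most (a simple filter;
    an illustrative assumption, not pLMMoRF's actual selection procedure)."""
    dim = len(embeddings[0][0])
    sums = {0: [0.0] * dim, 1: [0.0] * dim}
    counts = {0: 0, 1: 0}
    for emb, lab in zip(embeddings, labels):
        for row, y in zip(emb, lab):
            counts[y] += 1
            for j, v in enumerate(row):
                sums[y][j] += v
    score = [abs(sums[1][j] / max(counts[1], 1) - sums[0][j] / max(counts[0], 1))
             for j in range(dim)]
    top = sorted(range(dim), key=lambda j: score[j], reverse=True)[:k]
    return sorted(top)


def conv1d_same(x: Matrix, kernel: Matrix, bias: float) -> List[float]:
    """One-output-channel 1D convolution along the sequence with zero padding,
    so there is one logit per residue (kernel width must be odd)."""
    half = len(kernel) // 2
    out = []
    for i in range(len(x)):
        acc = bias
        for t, weights in enumerate(kernel):
            pos = i + t - half
            if 0 <= pos < len(x):
                acc += sum(w * v for w, v in zip(weights, x[pos]))
        out.append(acc)
    return out


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(emb: Matrix, keep: List[int], kernel: Matrix, bias: float) -> List[float]:
    """Project embeddings onto the selected dimensions, convolve, squash to (0, 1)."""
    reduced = [[row[j] for j in keep] for row in emb]
    return [sigmoid(z) for z in conv1d_same(reduced, kernel, bias)]


def to_segments(scores: List[float], threshold: float = 0.5) -> List[Tuple[int, int]]:
    """Merge consecutive residues scoring >= threshold into (start, end) segments (0-based, inclusive)."""
    segments, start = [], None
    for i, s in enumerate(scores):
        if s >= threshold and start is None:
            start = i
        elif s < threshold and start is not None:
            segments.append((start, i - 1))
            start = None
    if start is not None:
        segments.append((start, len(scores) - 1))
    return segments


# Toy data: 4-dimensional "embeddings" in which only dimension 2 tracks the label.
rng = random.Random(0)
labels = [[0, 0, 1, 1, 0, 0], [1, 1, 0, 0, 0, 0]]
embs = [[[rng.uniform(-0.1, 0.1) + (1.0 if (j == 2 and y) else 0.0) for j in range(4)]
         for y in lab] for lab in labels]

keep = select_features(embs, labels, k=1)   # picks the informative dimension
kernel = [[0.5], [1.0], [0.5]]              # hand-set weights standing in for a trained CNN
scores = predict(embs[0], keep, kernel, bias=-1.0)
print(keep, to_segments(scores))            # -> [2] [(2, 3)]
```

Dropping uninformative embedding dimensions before the convolution shrinks the network's input width. This is the same lever the abstract credits for the reduced computational footprint, although a trained model would learn its kernel weights rather than use hand-set ones.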
