An artificial intelligence-based approach for identifying the proteins regulating liquid-liquid phase separation.

Author information

Ahmed Zahoor, Shahzadi Kiran, Li Rui, Jiang Yu-Qing, Jin Yan-Ting, Arif Muhammad, Feng Juan

Affiliations

The Clinical Hospital of Chengdu Brain Science Institute, School of Life Science and Technology, University of Electronic Science and Technology of China, Chengdu, 611731 Sichuan, China.

School of Medical Technology and Information Engineering, Zhejiang Chinese Medical University, Hangzhou, 310053 Zhejiang, China.

Publication information

Brief Bioinform. 2025 Jul 2;26(4). doi: 10.1093/bib/bbaf313.

Abstract

Liquid-liquid phase separation (LLPS) is a biomolecular process that underpins the formation of membrane-less organelles within living cells. This phenomenon, along with the resulting condensate bodies, is increasingly recognized for its critical roles in various biological processes, such as ribonucleic acid (RNA) metabolism, chromatin rearrangement, and signal transduction. Notably, regulator proteins play a central role in the process of LLPS. They are essential for the formation, stabilization, and maintenance of the dynamic properties of LLPS, ensuring an appropriate phase separation response to cellular signals. Targeting these regulator proteins is the key to manipulating LLPS for applications in biotechnology, materials science, and medicine, including biomaterials, drug delivery, diagnostics, and synthetic biology. Given their importance, this study focused on an artificial intelligence-based approach to identify regulator proteins in LLPS. We constructed a dataset of 913 positive and 6584 negative protein sequences, and divided it into eight balanced training datasets and a test dataset. Semantic information from protein sequences was extracted using the ESM2_t36 pretrained protein language model, followed by training a multilayer perceptron classifier. The model achieved 0.78 accuracy on the test dataset, outperforming traditional sequence-based methods, one-hot encoding, and other pretrained embedding methods. SHapley Additive exPlanations (SHAP)-based interpretation revealed key biophysical patterns enriched in regulator proteins, including higher levels of charged and disordered residues. Our results show that deep contextual protein representations combined with neural network-based classifiers can accurately identify LLPS regulator proteins. This tool offers new opportunities for understanding condensate biology and designing synthetic phase-separating systems. All data and code are available at: https://github.com/bioplusAI/LLPS_regulators_pred.
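The abstract states that 913 positive and 6584 negative sequences were divided into eight balanced training datasets and a test dataset, but it does not say how. The following is a minimal sketch of one common way to build such subsets, not the authors' procedure: a test split is held out, then every training subset pairs all remaining positives with a disjoint chunk of the remaining negatives. The function name `make_balanced_subsets`, the 10% test fraction, the seed, and the round-robin assignment are all assumptions made for illustration. Embedding extraction with ESM2_t36 and the multilayer perceptron are deliberately left out.

```python
import random


def make_balanced_subsets(positives, negatives, n_subsets=8,
                          test_fraction=0.1, seed=0):
    """Hold out a test split, then build roughly balanced training subsets.

    Hypothetical helper: test_fraction, seed, and the round-robin
    assignment are illustrative assumptions, not values from the paper.
    """
    rng = random.Random(seed)
    pos = list(positives)
    neg = list(negatives)
    rng.shuffle(pos)
    rng.shuffle(neg)

    # Hold out the same fraction of each class for the test set.
    n_pos_test = int(len(pos) * test_fraction)
    n_neg_test = int(len(neg) * test_fraction)
    test = ([(p, 1) for p in pos[:n_pos_test]]
            + [(n, 0) for n in neg[:n_neg_test]])
    train_pos = pos[n_pos_test:]
    train_neg = neg[n_neg_test:]

    # Strided slicing yields n_subsets disjoint negative chunks whose
    # sizes differ by at most one; each chunk is paired with all positives.
    subsets = []
    for i in range(n_subsets):
        chunk = train_neg[i::n_subsets]
        subsets.append([(p, 1) for p in train_pos]
                       + [(n, 0) for n in chunk])
    return subsets, test


# Placeholder identifiers standing in for real protein sequences.
positives = [f"pos_{i}" for i in range(913)]
negatives = [f"neg_{i}" for i in range(6584)]
subsets, test = make_balanced_subsets(positives, negatives)
```

With these assumed settings, each training subset reuses the same positives and sees a different slice of negatives, so a classifier trained per subset never faces the roughly 1:7 class imbalance of the full dataset. Whether the authors reused positives across subsets or combined the resulting models is not stated in the abstract.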

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bb8c/12239617/a986359a319a/bbaf313ga1.jpg
