Kilgore Henry R, Chinn Itamar, Mikhael Peter G, Mitnikov Ilan, Van Dongen Catherine, Zylberberg Guy, Afeyan Lena, Banani Salman F, Wilson-Hawken Susana, Lee Tong Ihn, Barzilay Regina, Young Richard A
Whitehead Institute for Biomedical Research, Cambridge, MA, USA.
Computer Science and Artificial Intelligence Laboratory, Massachusetts Institute of Technology (MIT), Cambridge, MA, USA.
Science. 2025 Mar 7;387(6738):1095-1101. doi: 10.1126/science.adq2634. Epub 2025 Feb 6.
Cells have evolved mechanisms to distribute ~10 billion protein molecules to subcellular compartments where diverse proteins involved in shared functions must assemble. In this study, we demonstrate that proteins with shared functions share amino acid sequence codes that guide them to compartment destinations. We developed a protein language model, ProtGPS, that predicts with high performance the compartment localization of human proteins excluded from the training set. ProtGPS successfully guided generation of novel protein sequences that selectively assemble in the nucleolus. ProtGPS identified pathological mutations that change this code and lead to altered subcellular localization of proteins. Our results indicate that protein sequences contain not only a folding code but also a previously unrecognized code governing their distribution to diverse subcellular compartments.