Department of Biomedical Informatics, School of Basic Medical Sciences, Peking University Health Science Center, Beijing 100191, China.
Beijing Advanced Innovation Center for Structural Biology, Beijing Frontier Research Center for Biological Structure, School of Life Sciences, Tsinghua University, Beijing 100084, China.
Proc Natl Acad Sci U S A. 2022 Jun 14;119(24):e2115369119. doi: 10.1073/pnas.2115369119. Epub 2022 Jun 10.
Protein self-assembly is one of the mechanisms by which biomolecular condensates form. However, most systems that undergo phase separation (PS) require multiple partners under biological conditions. In this study, we divided PS proteins into two groups according to the mechanism by which they undergo PS: PS-Self proteins self-assemble spontaneously to form droplets, whereas PS-Part proteins interact with partners to undergo PS. Analysis of amino acid composition revealed differences in sequence pattern between the two groups. Existing PS predictors, when evaluated on two test protein sets, preferentially predicted self-assembling proteins. Thus, a more comprehensive predictor is required. Herein, we propose that properties other than sequence composition can provide crucial information for screening PS proteins. By incorporating phosphorylation frequencies and immunofluorescence image-based droplet-forming propensity with other PS-related features, we built two independent machine-learning models that separately predict the two protein categories. Independent testing suggested that integrating multimodal features improves prediction. We experimentally verified the top-scoring proteins DHX9, Ki-67, and NIFK. Their PS behavior in vitro supported the effectiveness of our models in PS prediction. Further validation on the proteome of membraneless organelles confirmed the ability of our models to identify PS-Part proteins. We implemented a web server named PhaSePred (http://predict.phasep.pro/) that incorporates our two models together with representative PS predictors. PhaSePred displays proteome-level quantiles of different features, thereby profiling PS propensity and providing crucial information for identifying candidate proteins.
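The idea of a proteome-level quantile can be illustrated with a minimal sketch: for each feature, a protein's value is ranked against the values of all proteins in the proteome. The abstract does not specify how PhaSePred computes its quantiles, so the function names, feature names, and toy values below are hypothetical and serve only as an illustration.

```python
from bisect import bisect_right


def proteome_quantile(value, proteome_values):
    """Return the fraction of proteome values that are <= value.

    This is a simple empirical quantile; it is an assumption made for
    illustration, not the method used by PhaSePred.
    """
    ordered = sorted(proteome_values)
    if not ordered:
        raise ValueError("proteome_values must not be empty")
    return bisect_right(ordered, value) / len(ordered)


def profile_protein(protein_features, proteome_features):
    """Map each feature name to the protein's quantile within the proteome."""
    return {
        name: proteome_quantile(value, proteome_features[name])
        for name, value in protein_features.items()
    }


# Toy, made-up numbers (hypothetical feature names and values).
toy_proteome = {
    "phosphorylation_frequency": [0.0, 0.1, 0.2, 0.4],
    "droplet_propensity": [0.05, 0.10, 0.30, 0.90],
}
toy_protein = {"phosphorylation_frequency": 0.2, "droplet_propensity": 0.90}

toy_profile = profile_protein(toy_protein, toy_proteome)
```

In this toy case, a quantile close to 1 for a feature means the protein ranks near the top of the proteome for that feature, which is the kind of per-feature profile the abstract describes.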