Molecular Medicine Program, Hospital for Sick Children, Toronto, ON M5G 0A4, Canada.
Department of Biochemistry, University of Toronto, Toronto, ON M5S 1A8, Canada.
Biomolecules. 2022 Aug 17;12(8):1131. doi: 10.3390/biom12081131.
Protein phase separation is increasingly understood to be an important mechanism of biological organization and biomaterial formation. Intrinsically disordered protein regions (IDRs) are often significant drivers of protein phase separation. A number of algorithms for predicting protein phase separation are available; many are specific to particular classes of proteins, and others provide results that are difficult to interpret in terms of the contributing biophysical interactions. Here, we describe LLPhyScore, a new predictor of IDR-driven phase separation based on a broad set of physical interactions, or features. LLPhyScore derives sequence-based statistics for these interactions from the RCSB PDB database of folded structures and is trained on a manually curated set of phase-separation-driving proteins against different negative training sets, including the PDB and the human proteome. Competitive training across a variety of physicochemical interactions shows that solvent contacts, disorder, hydrogen bonds, pi-pi contacts, and kinked beta-structures contribute most to the score, with electrostatics, cation-pi contacts, and the absence of helical secondary structure also contributing. LLPhyScore has strong recall for phase-separation prediction and enables a breakdown of the contribution of each physical feature to a sequence's phase-separation propensity, while recognizing that many of these features are interdependent. The tool should be a valuable resource for guiding experiments and generating hypotheses about protein function in normal and pathological states, as well as for understanding how specificity emerges in defining individual biomolecular condensates.
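To illustrate what a per-feature breakdown of a sequence score can look like, here is a minimal sketch assuming a simple additive model: each physical feature is represented as a per-residue track of values, smoothed with a sliding window, averaged over the sequence, and multiplied by a feature weight. This is not LLPhyScore's actual scoring procedure. The functions `window_mean` and `decomposed_score`, the window size, the feature names, and all weights and values are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Tuple


def window_mean(values: List[float], window: int) -> List[float]:
    """Centered sliding-window mean, truncated at the sequence ends."""
    half = window // 2
    out = []
    for i in range(len(values)):
        lo, hi = max(0, i - half), min(len(values), i + half + 1)
        out.append(sum(values[lo:hi]) / (hi - lo))
    return out


def decomposed_score(
    feature_tracks: Dict[str, List[float]],
    weights: Dict[str, float],
    window: int = 5,
) -> Tuple[float, Dict[str, float]]:
    """Hypothetical additive score that also reports each feature's share.

    Each feature's per-residue track is smoothed, averaged over the
    sequence, and scaled by its weight (0.0 if the feature has no weight).
    """
    contributions: Dict[str, float] = {}
    for name, track in feature_tracks.items():
        if not track:
            contributions[name] = 0.0  # empty track contributes nothing
            continue
        smoothed = window_mean(track, window)
        mean_value = sum(smoothed) / len(smoothed)
        contributions[name] = weights.get(name, 0.0) * mean_value
    return sum(contributions.values()), contributions


if __name__ == "__main__":
    # Toy, made-up tracks for a 10-residue sequence (hypothetical values).
    tracks = {"pi_pi_contacts": [1.0] * 10, "helix": [0.5] * 10}
    # A negative weight mimics a feature whose presence lowers the score.
    weights = {"pi_pi_contacts": 2.0, "helix": -1.0}
    total, parts = decomposed_score(tracks, weights)
    print(total, parts)
```

Because the total is a plain sum of per-feature terms, the breakdown is exact under this toy model. Interdependent features, such as disorder and the absence of helical structure, would still be counted separately here, which is one reason a real predictor may need to handle correlated features more carefully.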