Fossat Martin J
Max-Planck-Institut für Immunbiologie und Epigenetik (MPI-IE), Stübeweg 51, 79108 Freiburg im Breisgau, Germany.
J Chem Inf Model. 2025 Jan 27;65(2):873-881. doi: 10.1021/acs.jcim.4c01860. Epub 2025 Jan 16.
Intrinsically disordered regions are found in most eukaryotic proteins and are enriched with positively and negatively charged residues. While it is often convenient to assume that these residues follow their model-compound pKa values, recent work has shown that local charge effects (charge regulation) can upshift or downshift side chain pKa values with major consequences for molecular function. Despite this, charge regulation is rarely considered when investigating disordered regions. The number of potential charge microstates that can be populated through acid/base regulation of a given number of ionizable residues in a sequence, N, scales as ∼2^N. This exponential scaling makes the assessment of the full charge landscape of most proteins computationally intractable. To address this problem, we developed "multisite extent of deprotonation originating from context" (MEDOC) to determine the degree of protonation of a protein based on the local sequence context of each ionizable residue. We show that we can drastically reduce the number of parameters necessary to determine the full, analytical Boltzmann partition function of the charge landscape at both global and site-specific levels. Our algorithm applies the structure of the q-canonical ensemble, combined with novel strategies to rapidly obtain the minimal set of parameters, thereby circumventing the combinatorial explosion of the number of charge microstates even for proteins containing a large number of ionizable amino acids. We apply MEDOC to several sequences, including a global analysis of the distribution of pKa values across the entire DisProt database. Our results show differences in the distribution of predicted pKa values for different amino acids and good agreement with NMR-measured pKa values in proteins.
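The scaling argument in the abstract can be illustrated with a small sketch. It is not the MEDOC algorithm: it assumes every ionizable site titrates independently with a fixed model-compound pKa, which is exactly the simplification that charge regulation breaks. Under that assumption, the statistical weight of a deprotonated site relative to its protonated form is 10^(pH - pKa), and grouping microstates by the number of deprotonated sites turns the sum over 2^N microstates into the N + 1 coefficients of a polynomial. The pKa table, the example sequence, and all function names below are hypothetical choices made for illustration.

```python
from itertools import product

# Approximate model-compound pKa values (assumed here for illustration only).
MODEL_PKA = {"D": 4.0, "E": 4.4, "H": 6.5, "K": 10.4, "R": 12.5}

def site_weights(sequence, ph):
    """Weight of the deprotonated state of each ionizable site, relative to protonated."""
    return [10.0 ** (ph - MODEL_PKA[aa]) for aa in sequence if aa in MODEL_PKA]

def populations_brute_force(weights):
    """Enumerate all 2^N microstates and bin them by number of deprotonated sites."""
    n = len(weights)
    totals = [0.0] * (n + 1)
    for state in product((0, 1), repeat=n):
        w = 1.0
        for deprotonated, wi in zip(state, weights):
            if deprotonated:
                w *= wi
        totals[sum(state)] += w
    z = sum(totals)  # partition function over all microstates
    return [t / z for t in totals]

def populations_polynomial(weights):
    """Same populations from the coefficients of prod_i (1 + w_i x), in O(N^2) time."""
    coeffs = [1.0]
    for wi in weights:
        nxt = [0.0] * (len(coeffs) + 1)
        for k, c in enumerate(coeffs):
            nxt[k] += c            # site stays protonated
            nxt[k + 1] += c * wi   # site is deprotonated
        coeffs = nxt
    z = sum(coeffs)
    return [c / z for c in coeffs]

# Hypothetical test sequence with 8 ionizable residues (256 microstates).
weights = site_weights("GSDEKHRDEKG", ph=7.0)
brute = populations_brute_force(weights)
fast = populations_polynomial(weights)
mean_deprotonated = sum(k * p for k, p in enumerate(fast))
```

Both functions return the population of each deprotonation level, but only the brute-force version touches every microstate, so its cost doubles with each added ionizable residue. Once site pKa values depend on sequence context, the simple product form no longer holds, which is the regime the abstract addresses.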