Molecular Biophysics Unit, Indian Institute of Science, Bangalore, Karnataka 560012, India.
J Chem Theory Comput. 2024 Nov 26;20(22):10179-10198. doi: 10.1021/acs.jctc.4c00579. Epub 2024 Nov 5.
Multidomain proteins with long flexible linkers and full-length intrinsically disordered proteins (IDPs) are best described as ensembles of conformations rather than as single structures. Determining high-resolution ensemble structures of such proteins with the tools of experimental structural biophysics poses various challenges. Integrative approaches that combine available low-resolution, ensemble-averaged experimental data with in silico biomolecular reconstructions are now often used for this purpose. However, extensive Boltzmann-weighted conformational sampling of large proteins, especially those in which folded and disordered domains coexist in the same polypeptide chain, remains a challenge. In this work, we present SOP-MULTI, a force field with a resolution of two sites per amino acid, for simulating coarse-grained models of multidomain proteins. SOP-MULTI combines two well-established self-organized polymer models: (i) SOP-SC for folded systems and (ii) SOP-IDP for IDPs. In SOP-MULTI, we introduce cross-interaction terms between the beads belonging to the folded and disordered regions to generate conformational ensembles for full-length multidomain proteins such as hnRNP A1, TDP-43, G3BP1, hGHR-ECD, TIA1, HIV-1 Gag, polyubiquitin, and FUS. When back-mapped to all-atom resolution, SOP-MULTI trajectories faithfully recapitulate the scattering data over the range of reciprocal space. We also show that individual folded domains preserve their native contacts with respect to the solved folded structures, and that the root-mean-square fluctuations of residues in folded domains match those obtained from all-atom molecular dynamics simulation trajectories of the same folded systems. The SOP-MULTI force field is available as a LAMMPS-compatible user package, along with setup codes that generate the required files for any full-length protein with folded and disordered regions.