Li Minhuan, Dalton Kevin, Hekstra Doeke
John A. Paulson School of Engineering & Applied Sciences, Harvard University.
Department of Molecular & Cellular Biology, Harvard University.
bioRxiv. 2025 Jan 19:2025.01.12.632630. doi: 10.1101/2025.01.12.632630.
Proteins drive biochemical transformations by transitioning through distinct conformational states. Understanding these states is essential for modulating protein function. Although X-ray crystallography has enabled revolutionary advances in machine-learning-based protein structure prediction, the link between the two was made at the level of atomic models, not of the underlying diffraction data. The absence of a direct connection to crystallographic data limits further advances both in the accuracy of protein structure prediction and in the application of machine learning to experimental structure determination. Here, we present SFCalculator, a differentiable pipeline that computes crystallographic observables from atomistic molecular structures, including bulk solvent correction, and thereby bridges crystallographic data and neural network-based molecular modeling. We validate SFCalculator against conventional methods and demonstrate its utility in three proof-of-concept applications. First, SFCalculator enables accurate placement of molecular models relative to the crystal lattice (known as phasing). Second, SFCalculator enables searching the latent space of generative models for conformations that fit crystallographic data; because these conformations come from the model, they are also implicitly constrained by the information it encodes. Finally, SFCalculator allows crystallographic data to be used during the training of generative models, so that these models generate ensembles of conformations consistent with the data. SFCalculator therefore enables a new generation of analytical paradigms that integrate crystallographic data and machine learning.
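The "crystallographic observables" computed from an atomistic structure are structure factors. The sketch below is a minimal NumPy illustration of that calculation under stated assumptions; it is not SFCalculator's implementation or API. The function names (inv_d_squared, calc_f_protein, add_bulk_solvent, r_factor) are hypothetical. The sketch assumes an orthogonal P1 cell, a constant scattering factor f0 per atom attenuated by an isotropic B-factor, and precomputed solvent-mask structure factors F_mask. The model follows the flat bulk-solvent form used in conventional refinement, F_model = k_overall (F_protein + k_sol exp(-B_sol s²/4) F_mask), where k_sol = 0.35 and B_sol = 46 Å² are only typical starting values.

```python
import numpy as np

# Minimal sketch, NOT SFCalculator's code or API. Assumptions (hypothetical):
# orthogonal P1 cell, constant per-atom scattering factor f0, isotropic B-factors,
# and a precomputed solvent-mask structure factor array f_mask.


def inv_d_squared(hkl, cell):
    """|s|^2 = 1/d^2 for an orthogonal cell with edge lengths (a, b, c) in Angstrom."""
    hkl = np.asarray(hkl, dtype=float)
    return np.sum((hkl / np.asarray(cell, dtype=float)) ** 2, axis=1)


def calc_f_protein(frac_xyz, f0, b_iso, hkl, cell):
    """Direct summation: F(hkl) = sum_j f0_j exp(-B_j s^2/4) exp(2 pi i h.x_j)."""
    hkl = np.asarray(hkl, dtype=float)                   # (n_refl, 3)
    s2 = inv_d_squared(hkl, cell)                        # (n_refl,)
    debye_waller = np.exp(-np.outer(s2, b_iso) / 4.0)    # (n_refl, n_atoms)
    phase = 2j * np.pi * (hkl @ np.asarray(frac_xyz).T)  # (n_refl, n_atoms)
    return np.sum(f0 * debye_waller * np.exp(phase), axis=1)


def add_bulk_solvent(f_protein, f_mask, hkl, cell, k_sol=0.35, b_sol=46.0, k_overall=1.0):
    """Flat bulk-solvent model: k_overall * (F_protein + k_sol * exp(-B_sol s^2/4) * F_mask)."""
    s2 = inv_d_squared(hkl, cell)
    return k_overall * (f_protein + k_sol * np.exp(-b_sol * s2 / 4.0) * f_mask)


def r_factor(f_obs, f_model):
    """R = sum | |F_obs| - |F_model| | / sum |F_obs|, computed on amplitudes."""
    f_obs = np.abs(f_obs)
    return np.sum(np.abs(f_obs - np.abs(f_model))) / np.sum(f_obs)


# Usage: two carbon-like atoms (constant f0 = 6) half a cell apart along a.
cell = (10.0, 10.0, 10.0)
hkl = [(1, 0, 0), (0, 1, 0), (1, 1, 0)]
frac_xyz = np.array([[0.0, 0.0, 0.0], [0.5, 0.0, 0.0]])
f_p = calc_f_protein(frac_xyz, np.array([6.0, 6.0]), np.zeros(2), hkl, cell)
amplitudes = np.abs(f_p)  # approximately [0, 12, 0]: the atoms cancel when h = 1
```

Every step above is an array expression, so the same functions written with an automatic-differentiation library (for example, jax.numpy or PyTorch) would give gradients of a fitting loss with respect to frac_xyz and b_iso; that is the sense in which such a pipeline is differentiable. A production implementation would also use tabulated atomic form factors, apply space-group symmetry, derive F_mask from a solvent mask built from the model, and replace direct summation with FFTs for speed.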