Department of Chemistry and Quantum Theory Project, University of Florida, Gainesville, Florida 32611-7011, United States.
Department of Chemistry, University of Calgary, Calgary, Alberta T2N 1N4, Canada.
J Chem Theory Comput. 2024 Oct 22;20(20):9230-9242. doi: 10.1021/acs.jctc.4c00690. Epub 2024 Oct 2.
Integrative structural biology combines experimental data with computational methods to elucidate the structures and interactions of biomolecules, a task that becomes critical in the absence of high-resolution structural data. A challenging step in integrating the data is knowing the expected accuracy of, or belief in, the dataset. We previously showed that the Modeling Employing Limited Data (MELD) approach succeeds at predicting structures and finding the best interpretation of the data when the initial belief is equal to or slightly lower than the real value. However, the initial belief might be unknown to the user, as it depends on both the technique and the system of study. Here we introduce MELD-Adapt, designed to dynamically evaluate and infer the reliability of input data while simultaneously finding the best interpretation of the data and the structures compatible with it. We demonstrate the utility of this method across different systems, particularly emphasizing its capability to correct initial assumptions and identify the correct fraction of data to produce reliable structural models. The approach is tested with two benchmark sets: the folding of 12 proteins with coarse physical insights and the binding of peptides with varying affinities to the extraterminal domain using chemical shift perturbation data. We find that subtle differences in data structure (e.g., locally clustered or globally distributed), starting belief, and force field preferences can affect the predictions, limiting the possibility of a single protocol that transfers across all systems and data types. Nonetheless, we find a wide range of initial setup conditions that lead to successful sampling and identification of native states, resulting in a robust pipeline. Furthermore, disagreement between the amount of data enforced and the amount of data satisfied rapidly identifies incorrect setup conditions.
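The comparison between enforced and satisfied data mentioned at the end of the abstract can be illustrated with a toy calculation. The sketch below is not the MELD or MELD-Adapt implementation or API. It assumes, for illustration only, that the enforced restraints are the lowest-energy fraction of the dataset (the "belief") and that a restraint counts as satisfied when its energy is at or below a small tolerance. The function names, the tolerance, and the example energies are all hypothetical.

```python
# Illustrative sketch only: hypothetical helpers, not part of any MELD package.

def enforced_mask(energies, belief):
    """Mark which restraints are enforced: the lowest-energy `belief` fraction.

    `energies` holds per-restraint energies for one structure;
    `belief` is the assumed fraction of correct data, between 0 and 1.
    """
    if not 0.0 <= belief <= 1.0:
        raise ValueError("belief must lie in [0, 1]")
    n_active = round(belief * len(energies))
    # Sort restraint indices by energy; Python's sort is stable, so ties keep index order.
    order = sorted(range(len(energies)), key=lambda i: energies[i])
    active = set(order[:n_active])
    return [i in active for i in range(len(energies))]


def satisfied_fraction(energies, tol=1e-8):
    """Fraction of restraints whose energy is at or below `tol` (assumed criterion)."""
    return sum(1 for e in energies if e <= tol) / len(energies)


def belief_mismatch(energies, belief, tol=1e-8):
    """Enforced fraction minus satisfied fraction.

    A clearly positive value suggests more data is being enforced than the
    structure can satisfy, hinting that the assumed belief is too high.
    """
    return belief - satisfied_fraction(energies, tol)


# Toy data: three restraints are satisfied (zero energy), two are violated.
toy_energies = [0.0, 5.0, 0.0, 12.0, 0.0]
mask_60 = enforced_mask(toy_energies, 0.6)
gap_60 = belief_mismatch(toy_energies, 0.6)
gap_80 = belief_mismatch(toy_energies, 0.8)
```

With these toy numbers, a belief of 0.6 enforces exactly the three satisfied restraints and the mismatch is zero, whereas a belief of 0.8 forces one violated restraint to be enforced and leaves a positive mismatch. How MELD-Adapt actually infers the belief during sampling is not described in the abstract and is not modeled here.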