Department of Computer Science, University of York, Heslington, York YO10 5GH, United Kingdom.
Department of Chemistry, University of York, Heslington, York YO10 5DD, United Kingdom.
Acta Crystallogr D Struct Biol. 2021 Dec 1;77(Pt 12):1591-1601. doi: 10.1107/S2059798321010500. Epub 2021 Nov 29.
Proteins are macromolecules that perform essential biological functions which depend on their three-dimensional structure. Determining this structure involves complex laboratory and computational work. For the computational work, multiple software pipelines have been developed to build models of the protein structure from crystallographic data. Each of these pipelines performs differently depending on the characteristics of the electron-density map received as input. Identifying the best pipeline to use for a protein structure is difficult, as the pipeline performance differs significantly from one protein structure to another. As such, researchers often select pipelines that do not produce the best possible protein models from the available data. Here, a software tool is introduced which predicts key quality measures of the protein structures that a range of pipelines would generate if supplied with a given crystallographic data set. These measures are crystallographic quality-of-fit indicators based on included and withheld observations, and structure completeness. Extensive experiments carried out using over 2500 data sets show that the tool yields accurate predictions for both experimental phasing data sets (at resolutions between 1.2 and 4.0 Å) and molecular-replacement data sets (at resolutions between 1.0 and 3.5 Å). The tool can therefore provide a recommendation to the user concerning the pipelines that should be run in order to proceed most efficiently to a depositable model.