Department of Chemistry, University of York, York YO10 5DD, United Kingdom.
Acta Crystallogr D Struct Biol. 2020 Aug 1;76(Pt 8):713-723. doi: 10.1107/S2059798320009080. Epub 2020 Jul 27.
Manually identifying and correcting errors in protein models can be a slow process, but improvements in validation tools and automated model-building software can contribute to reducing this burden. This article presents a new correctness score that is produced by combining multiple sources of information using a neural network. The residues in 639 automatically built models were marked as correct or incorrect by comparing them with the coordinates deposited in the PDB. A number of features were also calculated for each residue using Coot, including map-to-model correlation, density values, B factors, clashes, Ramachandran scores, rotamer scores and resolution. Two neural networks were created using these features as inputs: one to predict the correctness of main-chain atoms and the other for side chains. The 639 structures were split into 511 that were used to train the neural networks and 128 that were used to test performance. The predicted correctness scores could correctly categorize 92.3% of the main-chain atoms and 87.6% of the side chains. A Coot ML Correctness script was written to display the scores in a graphical user interface and to automatically prune chains, residues and side chains with low scores. The automatic pruning function was added to the CCP4i2 Buccaneer automated model-building pipeline, leading to significant improvements, especially for high-resolution structures.
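The split and the pruning step described above can be illustrated with a minimal Python sketch. It is not the authors' Coot ML Correctness script: the names `ResidueScore`, `split_structures` and `plan_pruning` are hypothetical, the 0.5 cutoffs are placeholders, and the rule of dropping a chain when its mean main-chain score is low is an assumption, since the abstract does not state the actual pruning criteria. The scores are taken as given, standing in for the outputs of the two neural networks.

```python
import random
from dataclasses import dataclass


@dataclass
class ResidueScore:
    """Predicted correctness for one residue (hypothetical container)."""
    chain: str
    number: int
    main_chain: float  # main-chain correctness score, assumed to lie in [0, 1]
    side_chain: float  # side-chain correctness score, assumed to lie in [0, 1]


def split_structures(ids, n_train, seed=0):
    """Shuffle structure identifiers and split them into train and test lists."""
    shuffled = list(ids)
    random.Random(seed).shuffle(shuffled)
    return shuffled[:n_train], shuffled[n_train:]


def plan_pruning(scores, chain_cutoff=0.5, residue_cutoff=0.5, side_cutoff=0.5):
    """Decide what to delete at chain, residue and side-chain level.

    All three cutoffs are illustrative assumptions, not published values.
    Returns (chains, residues, side_chains); residues and side chains are
    given as (chain, residue number) pairs.
    """
    # Group the per-residue scores by chain, keeping input order.
    by_chain = {}
    for score in scores:
        by_chain.setdefault(score.chain, []).append(score)

    chains, residues, side_chains = [], [], []
    for chain, members in by_chain.items():
        # Assumed rule: drop the whole chain if its mean main-chain score is low.
        mean_main = sum(m.main_chain for m in members) / len(members)
        if mean_main < chain_cutoff:
            chains.append(chain)
            continue
        for m in members:
            if m.main_chain < residue_cutoff:
                # A poor main chain removes the whole residue.
                residues.append((chain, m.number))
            elif m.side_chain < side_cutoff:
                # Main chain is kept; only the side chain is trimmed.
                side_chains.append((chain, m.number))
    return chains, residues, side_chains


if __name__ == "__main__":
    # 639 structures split into 511 for training and 128 for testing.
    train, test = split_structures(range(639), n_train=511)
    print(len(train), len(test))

    example = [
        ResidueScore("A", 1, 0.9, 0.9),
        ResidueScore("A", 2, 0.2, 0.9),
        ResidueScore("A", 3, 0.9, 0.1),
        ResidueScore("B", 1, 0.1, 0.1),
        ResidueScore("B", 2, 0.3, 0.8),
    ]
    print(plan_pruning(example))
```

Checking the chain first means that residues in a discarded chain are not reported again at the residue or side-chain level, so each deletion appears only once in the plan.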