Department of Chemistry and Biochemistry, Ohio State University, Columbus, Ohio 43210, United States.
J Chem Theory Comput. 2019 May 14;15(5):3410-3424. doi: 10.1021/acs.jctc.9b00101. Epub 2019 Apr 4.
Covalent labeling mass spectrometry experiments are growing in popularity and provide important information regarding protein structure. Information obtained from these experiments correlates with residue solvent exposure within the protein in solution. However, it is impossible to determine protein structure from covalent labeling data alone. Incorporation of sparse covalent labeling data into the protein structure prediction software Rosetta has been shown to improve protein tertiary structure prediction. Here, covalent labeling techniques were analyzed computationally to provide insight into what labeling data is needed to optimize tertiary protein structure prediction in Rosetta. We have successfully implemented a new scoring functionality that provides improved predictions. We developed two new covalent labeling based score terms that use a "cone"-based neighbor count to quantify the relative solvent exposure of each amino acid. To test our method, we used a set of 20 proteins with structures deposited in the Protein Data Bank. Decoy model sets were generated for each of these 20 proteins, and the normalized covalent labeling score versus RMSD distributions were evaluated. On the basis of these distributions, we have determined an optimal subset of residues to use when performing covalent labeling experiments in order to maximize the structure prediction capabilities of the covalent labeling data. We also investigated how much false negative and false positive data can be tolerated without meaningfully impacting protein structure prediction. Using these new covalent labeling score terms, protein models were rescored and the resulting models improved by 3.9 Å RMSD on average. New models were also generated using Rosetta's AbinitioRelax program under the guidance of covalent labeling information, and improvement in model quality was observed.
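The abstract names a "cone"-based neighbor count but does not give its functional form. The following is only a minimal sketch of one plausible form: each neighboring residue contributes the product of a distance falloff and an angle falloff measured against the residue's CA-to-CB vector. The function names, the logistic falloff, and every parameter value (`dist_mid`, `dist_steep`, `angle_mid`, `angle_steep`) are illustrative assumptions, not the score terms implemented in Rosetta.

```python
import math


def _falloff(x, midpoint, steepness):
    """Logistic step: close to 1 well below the midpoint, close to 0 well above it."""
    return 1.0 / (1.0 + math.exp(steepness * (x - midpoint)))


def cone_neighbor_count(ca, cb, index,
                        dist_mid=9.0, dist_steep=1.0,
                        angle_mid=math.pi / 2, angle_steep=2.0 * math.pi):
    """Soft count of residues lying inside a cone along the CA->CB vector.

    ca, cb: lists of (x, y, z) coordinates in angstroms, one entry per residue.
    All default parameter values are hypothetical, chosen only for illustration.
    """
    axis = [b - a for a, b in zip(ca[index], cb[index])]
    axis_len = math.sqrt(sum(c * c for c in axis))
    count = 0.0
    for j, other in enumerate(cb):
        if j == index:
            continue
        to_other = [o - b for o, b in zip(other, cb[index])]
        dist = math.sqrt(sum(c * c for c in to_other))
        if dist == 0.0 or axis_len == 0.0:
            continue  # degenerate geometry: the angle is undefined
        cos_theta = sum(a * t for a, t in zip(axis, to_other)) / (axis_len * dist)
        theta = math.acos(max(-1.0, min(1.0, cos_theta)))  # clamp rounding error
        # Near neighbors in the side-chain direction contribute close to 1;
        # distant neighbors or those behind the backbone contribute close to 0.
        count += _falloff(dist, dist_mid, dist_steep) * _falloff(theta, angle_mid, angle_steep)
    return count


# Toy three-residue system laid out along the x axis (made-up coordinates).
ca = [(0.0, 0.0, 0.0), (6.0, 0.0, 0.0), (-4.0, 0.0, 0.0)]
cb = [(1.0, 0.0, 0.0), (5.0, 0.0, 0.0), (-3.0, 0.0, 0.0)]
print(cone_neighbor_count(ca, cb, 0))
```

Under this sketch, a low count suggests a solvent-exposed residue and a high count a buried one, which is the kind of quantity that can be compared against labeling data. Smooth falloffs are used rather than hard cutoffs so that the count varies continuously as a model's coordinates change.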