Slabinski Lukasz, Jaroszewski Lukasz, Rodrigues Ana P C, Rychlewski Leszek, Wilson Ian A, Lesley Scott A, Godzik Adam
Joint Center for Structural Genomics, Bioinformatics Core, Burnham Institute for Medical Research, La Jolla, CA 92037, USA.
Protein Sci. 2007 Nov;16(11):2472-82. doi: 10.1110/ps.073037907.
The experimental determination of protein structure is marred by a high failure rate at many stages. With large quantities of data now available from high-throughput structure determination in structural genomics centers, we can learn to recognize protein features correlated with failure; this lets us identify proteins that are more likely to succeed and, eventually, learn how to modify those that are less likely to succeed. Here, we identify several protein features that correlate strongly with successful protein production and crystallization and combine them into a single score that assesses "crystallization feasibility." The formula derived here was tested with a jackknife procedure and validated on independent benchmark sets. The crystallization feasibility score is being applied to target selection at the Joint Center for Structural Genomics, where it is contributing to a higher success rate, lower costs, and shorter times for protein structure determination. Analyses of PDB depositions suggest that very similar features also play a role in non-high-throughput structure determination, so the score should also be of significant interest to structural biology, molecular biology, and biochemistry laboratories.
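The abstract states that the scoring formula was tested with a jackknife procedure but gives no details. As a generic illustration only, and not the authors' actual protocol, the sketch below shows a leave-one-out jackknife that estimates a statistic and its standard error. The function name `jackknife` and the `outcomes` list are hypothetical; the data are made up for the example.

```python
from math import sqrt
from statistics import mean


def jackknife(values, statistic=mean):
    """Leave-one-out jackknife estimate and standard error of a statistic."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    # Recompute the statistic n times, each time omitting one observation.
    leave_one_out = [statistic(values[:i] + values[i + 1:]) for i in range(n)]
    center = mean(leave_one_out)
    # Jackknife variance: (n - 1) / n times the sum of squared deviations.
    variance = (n - 1) / n * sum((x - center) ** 2 for x in leave_one_out)
    return center, sqrt(variance)


# Hypothetical outcomes for ten targets: 1 = crystallized, 0 = failed.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]
estimate, std_err = jackknife(outcomes)
print(f"success rate = {estimate:.3f} +/- {std_err:.3f}")
```

For a simple mean, the jackknife standard error coincides with the usual standard error of the mean; the approach becomes more useful when the statistic is a fitted quantity, such as a score re-derived on each reduced data set.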