Lu Hui-Meng, Yin Da-Chuan, Liu Yong-Ming, Guo Wei-Hong, Zhou Ren-Bin
Institute of Special Environmental Biophysics, School of Life Sciences, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China.
Key Laboratory for Space Bioscience and Biotechnology, Northwestern Polytechnical University, Xi'an 710072, Shaanxi, China.
Int J Mol Sci. 2012;13(8):9514-9526. doi: 10.3390/ijms13089514. Epub 2012 Jul 27.
The number of protein structure entries has grown far more slowly than the number of sequence entries. This is partly because obtaining diffraction-quality protein crystals is a bottleneck in structure determination by X-ray crystallography. The first step toward crystallizing a protein is to find suitable chemical reagents, which is not an easy task: exhaustive trial-and-error tests of numerous combinations of reagents mixed with the protein solution are usually needed to screen for the desired crystallization conditions. Any approach that helps identify suitable reagents for protein crystallization is therefore valuable. In this paper, we analyze the relationship between protein sequence similarity and crystallization reagents using information from existing databases. We extracted reagent and sequence information from the Biological Macromolecule Crystallization Database (BMCD) and the Protein Data Bank (PDB), grouped the proteins into clusters according to sequence similarity, and statistically analyzed the relationship between sequence similarity and crystallization reagents. The results showed a pronounced positive correlation between the two. On the basis of this correlation, it is possible to predict chemical reagents that are suitable for use in crystallization screens for a specific protein.
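The kind of analysis described in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the records are hypothetical toy data rather than BMCD or PDB entries, `difflib.SequenceMatcher` stands in for a proper sequence alignment score, the Jaccard index is an assumed measure of reagent overlap, and the helper names (`sequence_similarity`, `reagent_similarity`, `pairwise_similarities`, `pearson`) are invented for illustration.

```python
import math
from difflib import SequenceMatcher
from itertools import combinations

# Hypothetical toy records: name -> (amino acid sequence, reagents used).
# Real input would come from BMCD/PDB entries; these values are made up.
RECORDS = {
    "protA": ("MKTAYIAKQRQISFVKSHFSRQ", {"PEG 4000", "NaCl", "Tris"}),
    "protB": ("MKTAYIAKQRQISFVKSHFTRQ", {"PEG 4000", "NaCl", "HEPES"}),
    "protC": ("GSHMLEDPVDAFKKLWEGNGGS", {"ammonium sulfate", "MES"}),
    "protD": ("GSHMLEDPVDAFRKLWEGNAGS", {"ammonium sulfate", "MES", "glycerol"}),
}


def sequence_similarity(a, b):
    """Crude similarity in [0, 1]; a stand-in for an alignment-based score."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def reagent_similarity(r1, r2):
    """Jaccard index of two reagent sets (assumed overlap measure)."""
    union = r1 | r2
    return len(r1 & r2) / len(union) if union else 0.0


def pairwise_similarities(records):
    """Return parallel lists of sequence and reagent similarity per protein pair."""
    seq_sims, reagent_sims = [], []
    for (seq1, reag1), (seq2, reag2) in combinations(records.values(), 2):
        seq_sims.append(sequence_similarity(seq1, seq2))
        reagent_sims.append(reagent_similarity(reag1, reag2))
    return seq_sims, reagent_sims


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length samples."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant sample")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    seq_sims, reagent_sims = pairwise_similarities(RECORDS)
    print("pairs:", len(seq_sims))
    print("Pearson r:", pearson(seq_sims, reagent_sims))
```

A coefficient near +1 on real data would be consistent with the positive correlation reported above. On a toy set this small the value carries no statistical weight.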