Gao Jianzhao, Wu Zhonghua, Hu Gang, Wang Kui, Song Jiangning, Joachimiak Andrzej, Kurgan Lukasz
School of Mathematical Sciences and LPMC, Nankai University, Tianjin, China.
Infection and Immunity Program, Biomedicine Discovery Institute and Department of Biochemistry and Molecular Biology, Monash University, Melbourne, Australia.
Curr Protein Pept Sci. 2018;19(2):200-210. doi: 10.2174/1389203718666170921114437.
Selecting suitable targets for X-ray crystallography would greatly benefit the biological research community. Several computational models have been proposed to predict, from protein sequences, the propensity for successful protein production and diffraction-quality crystallization. We reviewed a comprehensive collection of 22 such predictors developed in the last decade. We found that almost all of these models are easily accessible as webservers and/or standalone software, and we demonstrated that some of them are widely used by the research community. We empirically evaluated and compared the predictive performance of seven representative methods. The analysis suggests that these methods produce quite accurate propensities for diffraction-quality crystallization. We also summarized the results of the first study of the relation between these predicted propensities and the resolution of crystallizable proteins. We found that the propensities predicted by several methods are significantly higher for proteins with high-resolution structures than for proteins with low-resolution structures. Moreover, we tested a new meta-predictor, MetaXXC, which averages the propensities generated by the three most accurate predictors of diffraction-quality crystallization. MetaXXC generates putative resolution values that correlate modestly with the experimental resolutions, and it offers the lowest mean absolute error among the methods compared, which include the seven considered predictors. We conclude that protein sequences can be used to predict fairly accurately whether the corresponding protein structures can be solved using X-ray crystallography. We also conclude that sequences can be used to predict reasonably well the resolution of the resulting protein crystals.
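The averaging step and the error measure mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the MetaXXC implementation: the function names `meta_propensity` and `mean_absolute_error` are hypothetical, the abstract does not say which three predictors are combined or how an averaged propensity is converted to a putative resolution, and any numbers passed to these functions would be illustrative rather than results from the study.

```python
from statistics import mean


def meta_propensity(per_predictor_scores):
    """Average per-protein propensities across several predictors.

    per_predictor_scores: list of equal-length lists, one list per
    predictor, each holding one propensity per protein (assumed layout).
    """
    if not per_predictor_scores:
        raise ValueError("at least one predictor is required")
    n_proteins = len(per_predictor_scores[0])
    if any(len(s) != n_proteins for s in per_predictor_scores):
        raise ValueError("all predictors must score the same proteins")
    # zip(*...) regroups the scores so each tuple belongs to one protein.
    return [mean(scores) for scores in zip(*per_predictor_scores)]


def mean_absolute_error(predicted, observed):
    """Mean absolute difference between predicted and observed values."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(p - o) for p, o in zip(predicted, observed))
```

Calling `meta_propensity` with three score lists returns one averaged propensity per protein; `mean_absolute_error` could then compare putative resolutions against experimental ones, once a propensity-to-resolution mapping (not specified in the abstract) is chosen.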