Yan Shaomin, Wu Guang
State Key Laboratory of Non-food Biomass Enzyme Technology, National Engineering Research Center for Non-food Biorefinery, Guangxi Key Laboratory of Biorefinery, Guangxi Academy of Sciences, 98 Daling Road, Nanning, Guangxi 530007 China.
Biol Proced Online. 2015 Nov 23;17:16. doi: 10.1186/s12575-015-0029-3. eCollection 2015.
Many studies have correlated amino acid characteristics with crystallization propensity in an effort to determine the factors that affect protein crystallization. However, these characteristics are constant; that is, in an encoded sequence, each type of amino acid carries the same value wherever it occurs. To overcome this inflexibility, three dynamic characteristics of amino acids and protein were introduced to analyze the crystallization propensity of proteins. Both logistic regression and neural network models were used to correlate each of two dynamic characteristics with the crystallization propensity of 301 proteins from Arabidopsis thaliana, and their results were compared with those obtained from each of 531 constant amino acid characteristics, which served as the benchmark.
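To make the single-characteristic modeling concrete, the following is a minimal sketch of a logistic regression that relates one numeric characteristic per protein to a binary crystallization outcome. It is not the authors' implementation: the names `fit_logistic` and `predict`, the learning rate, the number of epochs, and the toy data are all hypothetical and chosen only for illustration.

```python
import math

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def fit_logistic(xs, ys, lr=0.5, epochs=5000):
    """Fit p(y=1 | x) = sigmoid(w*x + b) by batch gradient descent.

    xs: one characteristic value per protein (hypothetical encoding).
    ys: 1 if the protein crystallized, 0 otherwise.
    """
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        grad_w = 0.0
        grad_b = 0.0
        for x, y in zip(xs, ys):
            err = sigmoid(w * x + b) - y  # gradient of the log-loss w.r.t. the logit
            grad_w += err * x
            grad_b += err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return w, b

def predict(w, b, x, threshold=0.5):
    # Classify as crystallizable (1) when the fitted probability reaches the threshold.
    return 1 if sigmoid(w * x + b) >= threshold else 0

# Toy data, invented for illustration only (not taken from the study).
feature = [0.1, 0.2, 0.3, 0.4, 0.6, 0.7, 0.8, 0.9]
crystallized = [0, 0, 0, 1, 0, 1, 1, 1]

w, b = fit_logistic(feature, crystallized)
predictions = [predict(w, b, x) for x in feature]
```

Fitting one such model per characteristic, as the comparison against the 531 constant characteristics implies, would amount to repeating this fit with a different `feature` list each time.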
The neural network model predicted the crystallization propensity of proteins better than the logistic regression model. Compared with the benchmark, the dynamic characteristics of amino acids provided good prediction results for the crystallization propensity, and the distribution probability gave the highest sensitivity. Using 90% accuracy as a cutoff point, the predictable portion of A. thaliana proteins was ranked, and the statistical analysis showed that the larger the predictable portion, the better the prediction.
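The sensitivity and accuracy referred to above follow the standard confusion-matrix definitions: sensitivity = TP / (TP + FN) and accuracy = (TP + TN) / N. The sketch below computes them for a set of binary predictions; the function names and the example labels are hypothetical and are not data from the study.

```python
def confusion_counts(actual, predicted):
    """Return (tp, fn, tn, fp) for binary labels, where 1 = crystallized."""
    tp = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 1)
    fn = sum(1 for a, p in zip(actual, predicted) if a == 1 and p == 0)
    tn = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 0)
    fp = sum(1 for a, p in zip(actual, predicted) if a == 0 and p == 1)
    return tp, fn, tn, fp

def sensitivity(tp, fn):
    # Fraction of truly crystallizable proteins that were predicted as such.
    return tp / (tp + fn) if (tp + fn) else 0.0

def specificity(tn, fp):
    # Fraction of non-crystallizable proteins that were predicted as such.
    return tn / (tn + fp) if (tn + fp) else 0.0

def accuracy(tp, fn, tn, fp):
    # Fraction of all proteins that were classified correctly.
    total = tp + fn + tn + fp
    return (tp + tn) / total if total else 0.0

# Invented labels for ten proteins, for illustration only.
actual = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
predicted = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

tp, fn, tn, fp = confusion_counts(actual, predicted)
sens = sensitivity(tp, fn)
spec = specificity(tn, fp)
acc = accuracy(tp, fn, tn, fp)
meets_cutoff = acc >= 0.90  # the 90% accuracy cutoff mentioned in the text
```

In this toy case the accuracy falls below the 90% cutoff, so a model with these predictions would not count toward the predictable portion.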
These results demonstrate that the dynamic characteristics are related to crystallization propensity and could be helpful for predicting protein crystallization, which may provide a theoretical basis for assessing certain proteins before experimental crystallization is attempted.