Tian Feifei, Zhang Chun, Fan Xia, Yang Xue, Wang Xi, Liang Huaping
State Key Laboratory of Trauma, Burns and Combined Injury, Research Institute of Surgery, Daping Hospital, The Third Military Medical University, Chongqing 400042, China. Phone: +86 23 68757411; fax: +86 23 68757404.
College of Bioengineering, Chongqing University, Chongqing 400044, China.
Mol Inform. 2010 Oct 11;29(10):707-15. doi: 10.1002/minf.201000092. Epub 2010 Oct 19.
Flexibility in biomolecules is an important determinant of biological function and can be quantified by the atomic Debye-Waller factor, or B-factor. Although numerous theoretical and computational studies have addressed the B-factor profiles of proteins, methods for predicting the B-factors of nucleic acids remain largely unexplored. This is especially true of the structurally complex ribosomal RNAs (rRNAs), which are functionally very similar to proteins in providing matrix structures and in catalyzing biochemical reactions. In this article, we present a quantitative structure-flexibility relationship (QSFR) study aimed at the quantitative prediction of rRNA B-factors from primary sequences (sequence-based) and from higher-order structures (structure-based), using both linear and nonlinear machine learning approaches: partial least squares regression (PLS), least squares support vector machine (LSSVM), and Gaussian process (GP). By rigorously examining the performance and reliability of the constructed statistical models and by comparing them in detail with models developed previously for protein B-factors, we demonstrate that (i) rRNA B-factors can be predicted at a level of accuracy similar to that achieved for proteins, (ii) the structure-based approach performs much better than the sequence-based methods in modeling rRNA B-factors, and (iii) rRNA flexibility is primarily governed by local features of the nonbonding potential landscape, such as electrostatic and van der Waals forces.
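As an illustration of one of the nonlinear regressors named above, the following is a minimal sketch of LSSVM regression in its standard dual form, where the bias `b` and the coefficients `alpha` come from a single linear system. It is not the authors' implementation: the RBF kernel, the hyperparameters `gamma` and `sigma`, the function names, and the synthetic two-column descriptor matrix are all assumptions made for the example, and the toy target merely stands in for a B-factor-like response.

```python
import numpy as np

def rbf_kernel(A, B, sigma):
    """Gaussian (RBF) kernel matrix between the rows of A and the rows of B."""
    d2 = (np.sum(A**2, axis=1)[:, None]
          + np.sum(B**2, axis=1)[None, :]
          - 2.0 * A @ B.T)
    return np.exp(-np.maximum(d2, 0.0) / (2.0 * sigma**2))

def lssvm_fit(X, y, gamma=10.0, sigma=1.0):
    """Solve the LSSVM dual system
        [ 0   1^T         ] [ b     ]   [ 0 ]
        [ 1   K + I/gamma ] [ alpha ] = [ y ]
    and return (b, alpha)."""
    n = X.shape[0]
    K = rbf_kernel(X, X, sigma)
    A = np.zeros((n + 1, n + 1))
    A[0, 1:] = 1.0
    A[1:, 0] = 1.0
    A[1:, 1:] = K + np.eye(n) / gamma
    rhs = np.concatenate(([0.0], y))
    sol = np.linalg.solve(A, rhs)
    return sol[0], sol[1:]

def lssvm_predict(X_train, b, alpha, X_new, sigma=1.0):
    """Predict with f(x) = sum_i alpha_i * k(x, x_i) + b."""
    return rbf_kernel(X_new, X_train, sigma) @ alpha + b

# Synthetic stand-in data (hypothetical): two descriptors per residue
# and a smooth toy response. These are NOT values from the study.
rng = np.random.default_rng(0)
X = rng.uniform(-2.0, 2.0, size=(60, 2))
y = np.sin(X[:, 0]) + 0.5 * X[:, 1] ** 2

gamma, sigma = 100.0, 1.0  # illustrative hyperparameters, not tuned
b, alpha = lssvm_fit(X, y, gamma=gamma, sigma=sigma)
pred = lssvm_predict(X, b, alpha, X, sigma=sigma)
rmse = float(np.sqrt(np.mean((pred - y) ** 2)))
print(f"training RMSE on toy data: {rmse:.4f}")
```

In practice `gamma` and `sigma` would be chosen by cross-validation, and model quality would be judged on held-out residues rather than on the training fit shown here.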