Unit of Computer-Aided Molecular 'Biosilico' Discovery and Bioinformatics Research (CAMD-BIR Unit), Faculty of Chemistry-Pharmacy, Central University of Las Villas, Santa Clara, Villa Clara, Cuba.
FEBS J. 2010 Aug;277(15):3118-46. doi: 10.1111/j.1742-4658.2010.07711.x. Epub 2010 Jun 25.
Descriptors calculated from a specific representation scheme encode only part of the chemical information. For this reason, there is a need for novel graphical representations of proteins and novel protein descriptors that can provide new information about protein structure. Here, a new set of protein descriptors based on the computation of bilinear maps is presented. This novel approach to biomacromolecular design is relevant for QSPR studies on proteins. Protein bilinear indices are calculated from the kth power of nonstochastic and stochastic graph-theoretic electronic-contact matrices, M_m^k and ^sM_m^k, respectively. That is, the kth nonstochastic and stochastic protein bilinear indices are calculated using M_m^k and ^sM_m^k as matrix operators of bilinear transformations. Biochemical information is encoded by using different pair combinations of amino acid properties as weightings. Classification models based on a protein bilinear descriptor were developed to discriminate Arc mutants with stability similar to that of the wild-type form from those with inferior stability. These models correctly classified more than 90% of the mutants in both the training and test sets. To predict t_m and ΔΔG_f^o values for Arc mutants, multiple linear regression and piecewise linear regression models were developed. The multiple linear regression models accounted for 83% of the variance of the experimental t_m. Statistics calculated from internal and external validation procedures demonstrated the robustness, stability and adequate predictive power of all models. The results demonstrate that protein bilinear indices can encode biochemical information related to the structural changes that significantly influence Arc repressor stability when point mutations are introduced.
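As a rough illustration of the calculation described in the abstract, the sketch below evaluates a bilinear form x^T M^k y, where M is a residue contact matrix and x and y are vectors of two amino acid properties. Everything here is an assumption made for illustration: the function name `bilinear_index`, the toy 3-residue contact matrix, the property values, and the choice to obtain the stochastic matrix by dividing each row of M^k by its row sum. The abstract does not specify these details, so this should not be read as the authors' implementation.

```python
import numpy as np

def bilinear_index(contact, x, y, k, stochastic=False):
    """Return x^T M^k y for a contact matrix M (hypothetical helper)."""
    m_k = np.linalg.matrix_power(np.asarray(contact, dtype=float), k)
    if stochastic:
        # Assumption: the stochastic matrix is M^k with each row scaled to sum to 1;
        # rows that sum to zero are left as zeros.
        row_sums = m_k.sum(axis=1, keepdims=True)
        m_k = np.divide(m_k, row_sums, out=np.zeros_like(m_k), where=row_sums != 0)
    return float(np.asarray(x, dtype=float) @ m_k @ np.asarray(y, dtype=float))

# Toy 3-residue chain: residues 1-2 and 2-3 are in contact (made-up data).
contact = [[0, 1, 0],
           [1, 0, 1],
           [0, 1, 0]]
x = [1.0, 2.0, 3.0]  # hypothetical values of a first amino acid property
y = [0.5, 1.0, 1.5]  # hypothetical values of a second amino acid property

b1 = bilinear_index(contact, x, y, k=1)                    # nonstochastic, k = 1
sb1 = bilinear_index(contact, x, y, k=1, stochastic=True)  # stochastic, k = 1
print(b1, sb1)
```

Varying k and the pair of property vectors gives a family of descriptors for one protein, which is the role the abstract assigns to the different pair combinations of amino acid properties used as weightings.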