School of Molecular Biosciences, Washington State University, Pullman, Washington, United States of America.
PLoS One. 2011;6(12):e28507. doi: 10.1371/journal.pone.0028507. Epub 2011 Dec 2.
Temperature-sensitive (TS) mutants are powerful tools for studying gene function in vivo. These mutants exhibit wild-type activity at permissive temperatures and reduced activity at restrictive temperatures. Although random mutagenesis can be used to generate TS mutants, the procedure is laborious and infeasible in multicellular organisms. Furthermore, the molecular mechanisms underlying the TS phenotype are poorly understood. To elucidate these mechanisms, we used a machine learning method, logistic regression, to investigate a large number of sequence and structure features. We developed and tested 133 features describing properties of either the mutation site or the mutation site neighborhood. We defined three types of neighborhood, using sequence distance, Euclidean distance, and topological distance. We found that neighborhood features outperformed mutation site features in predicting TS mutations. The most predictive features suggest that TS mutations tend to occur at buried and rigid residues and are located in conserved protein domains. The environment of a buried residue often determines the overall structural stability of a protein and thus may lead to a reversible change in activity upon a temperature switch. We developed TS prediction models based on logistic regression with Lasso regularization. In ten-fold cross-validation, the model using both sequence and structure features achieved an area under the curve of 0.91. Testing on independent datasets suggested that the model predicted TS mutations with 50% precision. In summary, our study elucidated the molecular basis of TS mutants and highlighted the importance of neighborhood properties in determining TS mutations. We further developed models to predict TS mutations derived from single amino acid substitutions. TS mutants can thus be obtained efficiently by experimentally introducing the predicted mutations.
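The three neighborhood types named in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the exact definitions, so everything here is an assumption made for illustration: the toy C-alpha coordinates, the window size, the radius, the 4.0 angstrom contact cutoff, and the reading of "topological distance" as the number of edges on the shortest path in a residue contact graph. All function names are hypothetical and are not taken from the study.

```python
from collections import deque
from math import dist

# Hypothetical toy input: one C-alpha coordinate (in angstroms) per residue,
# indexed by chain position. Real coordinates would come from a structure file.
COORDS = [
    (0.0, 0.0, 0.0),
    (3.8, 0.0, 0.0),
    (7.6, 0.0, 0.0),
    (7.6, 3.8, 0.0),
    (3.8, 3.8, 0.0),
    (0.0, 3.8, 0.0),
]


def sequence_neighbors(n_residues, site, window):
    """Residues within `window` chain positions of `site`."""
    lo = max(0, site - window)
    hi = min(n_residues - 1, site + window)
    return {i for i in range(lo, hi + 1) if i != site}


def euclidean_neighbors(coords, site, radius):
    """Residues whose coordinates lie within `radius` of the mutation site."""
    return {
        i for i, c in enumerate(coords)
        if i != site and dist(c, coords[site]) <= radius
    }


def contact_graph(coords, cutoff):
    """Adjacency sets linking residues no farther apart than `cutoff`.

    The cutoff-based contact rule is an assumption, not the paper's definition.
    """
    graph = {i: set() for i in range(len(coords))}
    for i in range(len(coords)):
        for j in range(i + 1, len(coords)):
            if dist(coords[i], coords[j]) <= cutoff:
                graph[i].add(j)
                graph[j].add(i)
    return graph


def topological_neighbors(graph, site, max_steps):
    """Residues reachable from `site` in at most `max_steps` graph edges."""
    steps = {site: 0}
    queue = deque([site])
    while queue:  # breadth-first search visits nodes in order of path length
        node = queue.popleft()
        if steps[node] == max_steps:
            continue
        for nxt in graph[node]:
            if nxt not in steps:
                steps[nxt] = steps[node] + 1
                queue.append(nxt)
    return {i for i in steps if i != site}


if __name__ == "__main__":
    graph = contact_graph(COORDS, cutoff=4.0)
    print(sequence_neighbors(len(COORDS), site=0, window=1))
    print(euclidean_neighbors(COORDS, site=0, radius=4.0))
    print(topological_neighbors(graph, site=0, max_steps=2))
```

In this toy hairpin, residue 5 is far from residue 0 in sequence but close in space, so the sequence and Euclidean neighborhoods differ. A neighborhood feature could then be computed by aggregating a per-residue property (for example, burial or flexibility) over one of these sets.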