

Accurate and efficient structure-based computational mutagenesis for modeling fluorescence levels of Aequorea victoria green fluorescent protein mutants.

Affiliation

Laboratory for Structural Bioinformatics, School of Systems Biology, George Mason University, 10900 University Boulevard MS 5B3, Manassas, VA 20110, USA.

Publication information

Protein Eng Des Sel. 2020 Sep 14;33. doi: 10.1093/protein/gzaa022.

Abstract

A computational mutagenesis technique was used to characterize the structural effects associated with over 46 000 single and multiple amino acid variants of Aequorea victoria green fluorescent protein (GFP), whose functional effects (fluorescence levels) were recently measured by experimental researchers. For each GFP mutant, the approach generated a single score reflecting the overall change in sequence-structure compatibility relative to native GFP, as well as a vector of environmental perturbation (EP) scores characterizing the impact at all GFP residue positions. A significant GFP structure-function relationship (P < 0.0001) was elucidated by comparing the sequence-structure compatibility scores with the functional data. Next, the computed vectors for GFP mutants were used to train predictive models of fluorescence by implementing random forest (RF) classification and tree regression machine learning algorithms. Classification performance reached 0.93 for sensitivity, 0.91 for precision and 0.90 for balanced accuracy, and regression models led to Pearson's correlation as high as r = 0.83 between experimental and predicted GFP mutant fluorescence. An RF model trained on a subset of over 1000 experimental single residue GFP mutants with measured fluorescence was used for predicting the 3300 remaining unstudied single residue mutants, with results complementing known GFP biochemical and biophysical properties. In addition, models trained on the subset of experimental GFP mutants harboring multiple residue replacements successfully predicted fluorescence of the single residue GFP mutants. The models developed for this study were accurate and efficient, and their predictions outperformed those of several related state-of-the-art methods.
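The abstract reports classification quality as sensitivity, precision and balanced accuracy, and regression quality as Pearson's correlation. As a reading aid, the sketch below shows how these four quantities are conventionally computed from true and predicted values. It is not the authors' code: the function names `classification_metrics` and `pearson_r`, the label convention (1 = fluorescent, 0 = non-fluorescent) and the toy data are all assumptions made for illustration.

```python
import math


def classification_metrics(y_true, y_pred):
    """Sensitivity, precision and balanced accuracy for binary labels.

    Assumed convention (not from the paper): 1 = fluorescent, 0 = non-fluorescent.
    Assumes both classes occur and at least one positive is predicted,
    so none of the denominators is zero.
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # true positives
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # true negatives
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # false positives
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # false negatives

    sensitivity = tp / (tp + fn)  # fraction of true positives recovered
    specificity = tn / (tn + fp)  # fraction of true negatives recovered
    precision = tp / (tp + fp)    # fraction of positive calls that are correct
    return {
        "sensitivity": sensitivity,
        "precision": precision,
        "balanced_accuracy": (sensitivity + specificity) / 2,
    }


def pearson_r(x, y):
    """Pearson's correlation coefficient between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Toy labels and values, invented for illustration only.
    true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
    predicted_labels = [1, 1, 1, 0, 1, 0, 0, 0]
    print(classification_metrics(true_labels, predicted_labels))

    measured = [1.0, 2.0, 3.0, 4.0]
    predicted = [1.1, 1.9, 3.2, 3.8]
    print(pearson_r(measured, predicted))
```

Balanced accuracy averages sensitivity and specificity, so it stays informative when the fluorescent and non-fluorescent classes differ in size, which is presumably why it is reported alongside precision.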

