Ding Zundan, Guan Feifei, Xu Guoshun, Wang Yuchen, Yan Yaru, Zhang Wei, Wu Ningfeng, Yao Bin, Huang Huoqing, Tuller Tamir, Tian Jian
Biotechnology Research Institute, Chinese Academy of Agricultural Sciences, Beijing 100081, China.
Institute of Animal Science, Chinese Academy of Agricultural Sciences, Beijing 100193, China.
Comput Struct Biotechnol J. 2022 Mar 1;20:1142-1153. doi: 10.1016/j.csbj.2022.02.030. eCollection 2022.
The expression of proteins in Escherichia coli is often essential for their characterization, modification, and subsequent application. Gene sequence is the major factor contributing to expression. In this study, we used expression data from 6438 heterologous proteins produced under the same expression condition in E. coli to construct a deep learning classifier for screening high- and low-expression proteins. In conjunction with conserved residue analysis to minimize functional disruption, a mutation predictor for enhanced protein expression (MPEPE) was proposed to identify mutations conducive to protein expression. MPEPE identified mutation sites in laccase 13B22 and the glucose dehydrogenase FAD-AtGDH that significantly increased both the expression levels and the activity of these proteins. Additionally, a significant correlation of 0.46 was detected between the high-expression propensity predicted by the constructed models and the protein abundance of endogenous genes in E. coli. Therefore, the study provides foundational insights into the relationship between specific amino acid usage, codon usage, and protein expression, and is essential for research and industrial applications.
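The screening idea described in the abstract (score candidate mutations with an expression classifier while leaving conserved residues untouched) can be illustrated with a minimal sketch. This is not the published MPEPE implementation: `rank_point_mutations`, `toy_score`, the example sequence, and the conserved-position set are all hypothetical names and values introduced here, and `toy_score` is a stand-in placeholder for a trained deep learning classifier, not a real expression model.

```python
from typing import Callable, Iterable, List, Tuple

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def rank_point_mutations(
    sequence: str,
    score_fn: Callable[[str], float],
    conserved_positions: Iterable[int],
    top_n: int = 5,
) -> List[Tuple[str, float]]:
    """Rank single-residue substitutions by predicted gain in expression score.

    Positions listed in `conserved_positions` (0-based) are skipped, mirroring
    the idea of avoiding mutations likely to disrupt function.
    """
    conserved = set(conserved_positions)
    baseline = score_fn(sequence)
    candidates = []
    for pos, wild_type in enumerate(sequence):
        if pos in conserved:
            continue  # never mutate a conserved residue
        for residue in AMINO_ACIDS:
            if residue == wild_type:
                continue
            mutant = sequence[:pos] + residue + sequence[pos + 1:]
            gain = score_fn(mutant) - baseline
            if gain > 0:
                # Conventional mutation label, e.g. "A2K" (1-based position)
                candidates.append((f"{wild_type}{pos + 1}{residue}", gain))
    # Stable sort: ties keep their scan order
    candidates.sort(key=lambda item: item[1], reverse=True)
    return candidates[:top_n]


def toy_score(seq: str) -> float:
    """Hypothetical placeholder score: fraction of K and E residues.

    An assumption for illustration only; a real pipeline would call the
    trained classifier here.
    """
    return sum(seq.count(aa) for aa in "KE") / len(seq)


if __name__ == "__main__":
    # Position 0 (the start methionine) is treated as conserved in this toy case.
    for label, gain in rank_point_mutations("MAGW", toy_score, {0}):
        print(label, round(gain, 3))
```

Any callable that maps a sequence to a score can be passed as `score_fn`, so the ranking loop stays independent of how the classifier is built or trained.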