Ferreira Mauricio, Ventorim Rafaela, Almeida Eduardo, Silveira Sabrina, Silveira Wendel
Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil. Electronic address: https://twitter.com/@mauriciomyces.
Department of Microbiology, Universidade Federal de Viçosa, Viçosa, MG 36570-900, Brazil.
J Mol Biol. 2021 Nov 5;433(22):167267. doi: 10.1016/j.jmb.2021.167267. Epub 2021 Sep 23.
Proteins are responsible for most physiological processes, and their abundance provides crucial information for systems biology research. However, absolute protein quantification, as determined by mass spectrometry, still has limitations in capturing the protein pool. Protein abundance is impacted by translation kinetics, which rely on features of codons. In this study, we evaluated the effect of codon usage bias of genes on protein abundance. Notably, we observed differences regarding codon usage patterns between genes coding for highly abundant proteins and genes coding for less abundant proteins. Analysis of synonymous codon usage and evolutionary selection showed a clear split between the two groups. Our machine learning models predicted protein abundances from codon usage metrics with remarkable accuracy, achieving strong correlation with experimental data. Upon integration of the predicted protein abundance in enzyme-constrained genome-scale metabolic models, the simulated phenotypes closely matched experimental data, which demonstrates that our predictive models are valuable tools for systems metabolic engineering approaches.
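The abstract does not say which codon usage metrics fed the models. As an illustration only, the sketch below computes one widely used metric, relative synonymous codon usage (RSCU), for a single coding sequence: the observed count of a codon divided by the count expected if all synonymous codons for that amino acid were used equally. The function name `rscu`, the use of the standard genetic code, and the example sequence are assumptions made for this sketch, not details taken from the study.

```python
from collections import Counter
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered over T, C, A, G.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def rscu(cds: str) -> dict[str, float]:
    """Return RSCU per codon for one coding sequence (hypothetical helper).

    RSCU = observed codon count * number of synonymous codons
           / total count of all codons for that amino acid.
    Stop codons and amino acids absent from the sequence are skipped.
    """
    cds = cds.upper().replace("U", "T")
    usable = len(cds) - len(cds) % 3  # ignore a trailing partial codon
    codons = (cds[i:i + 3] for i in range(0, usable, 3))
    counts = Counter(c for c in codons if c in CODON_TABLE)

    # Group codons into synonymous families keyed by amino acid.
    families: dict[str, list[str]] = {}
    for codon, aa in CODON_TABLE.items():
        families.setdefault(aa, []).append(codon)

    values: dict[str, float] = {}
    for aa, family in families.items():
        total = sum(counts[c] for c in family)
        if aa == "*" or total == 0:
            continue
        for codon in family:
            values[codon] = counts[codon] * len(family) / total
    return values


if __name__ == "__main__":
    # Toy sequence: four alanine codons, three GCT and one GCC.
    for codon, value in sorted(rscu("GCTGCTGCTGCC").items()):
        print(codon, round(value, 2))
```

A value above 1 means a codon is used more often than expected under uniform synonymous usage, and a value below 1 means it is used less often. Per-gene vectors of such values are one plausible form of input for a regression model of protein abundance, although the study's actual feature set may differ.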