Department of Microbiology, Federal University of Viçosa, Viçosa, Minas Gerais, 36570900, Brazil.
Bioinformatics, Institute of Biochemistry and Biology, University of Potsdam, Potsdam, 14476, Germany; Systems Biology and Mathematical Modelling, Max Planck Institute of Molecular Plant Physiology, Potsdam, 14476, Germany.
Metab Eng. 2023 Nov;80:184-192. doi: 10.1016/j.ymben.2023.09.014. Epub 2023 Oct 5.
Quantifying how environmental cues affect protein allocation can provide important insights into cell physiology. Absolute protein quantification is possible with mass-spectrometry-based technologies, but these are resource-intensive; predicting protein abundances offers an alternative route to insights into protein allocation. Here we present CAMEL, a framework that couples constraint-based modelling with machine learning to predict protein abundance for any environmental condition. It does so by building machine learning models that leverage static features, derived from protein sequences, together with condition-dependent features predicted from protein-constrained metabolic models. Our findings demonstrate that CAMEL yields excellent predictions of protein allocation in E. coli (average Pearson correlation of at least 0.9) and moderate performance in S. cerevisiae (average Pearson correlation of at least 0.5). CAMEL thus outperformed contending approaches without using molecular read-outs from unseen conditions, and it provides a valuable tool for using protein allocation in biotechnological applications.
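The abstract names two ingredients, static sequence-derived features combined with condition-dependent model-derived features, and a Pearson correlation as the performance measure. The following is a minimal sketch of how such a feature vector might be assembled and how predictions might be scored. It is not CAMEL's implementation: the function names, the choice of static features (sequence length and hydrophobic fraction), and the single `predicted_usage` value standing in for an output of a protein-constrained metabolic model are all assumptions made for illustration, and no learner is shown.

```python
from math import sqrt

# Assumption: a simple hydrophobic residue set, chosen only for illustration.
HYDROPHOBIC = set("AVILMFWY")


def static_features(sequence):
    """Sequence-derived features that do not change with the condition."""
    length = len(sequence)
    hydrophobic_fraction = sum(aa in HYDROPHOBIC for aa in sequence) / length
    return [float(length), hydrophobic_fraction]


def build_feature_vector(sequence, predicted_usage):
    """Concatenate static features with one condition-dependent feature.

    `predicted_usage` is a hypothetical stand-in for a quantity predicted
    by a protein-constrained metabolic model for a given condition.
    """
    return static_features(sequence) + [predicted_usage]


def pearson(xs, ys):
    """Pearson correlation between two equal-length numeric sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / sqrt(var_x * var_y)


# Toy usage: one feature vector, and a score for made-up predictions.
features = build_feature_vector("MAVLK", 0.2)
score = pearson([1.0, 2.0, 3.0], [2.0, 4.0, 6.0])
```

In a sketch like this, the same static features would be reused across conditions while only the model-derived value changes, which is what would let a trained model be applied to a condition for which no measurements exist.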