Analyzing the correlation between protein expression and sequence-related features of mRNA and protein in Escherichia coli K-12 MG1655 model.

Affiliations

Center for Bioscience and Biotechnology, University of Science, Ho Chi Minh City, Vietnam.

Vietnam National University, Ho Chi Minh City, Vietnam.

Publication information

PLoS One. 2024 Feb 7;19(2):e0288526. doi: 10.1371/journal.pone.0288526. eCollection 2024.

Abstract

Producing recombinant proteins efficiently requires a tool that can predict protein abundance and optimize gene sequences. The Transim model published by Tuller et al. in 2018 calculates the translation rate in E. coli from features of the mRNA sequence; when tested on the dataset of operons' first genes in the E. coli K-12 MG1655 genome, it achieved a Spearman correlation of 0.36 with the amount of protein per mRNA. This correlation is not high, and the model does not fully consider the features of mRNA and protein sequences. To enhance predictive capability, our study first expanded the testing dataset by adding genes inside operons and by using a microarray mRNA expression dataset, which raised the correlation between translation rate and the amount of protein to more than 0.42. Next, we examined the applicability of six traditional machine learning models for calculating a "new translation rate" from the initiation rate and elongation rate as inputs. The SVR algorithm produced the most strongly correlated new translation rates, with the Spearman correlation improving to R = 0.6699 against protein level and to R = 0.6536 against protein level per mRNA. Finally, the study investigated how much the prediction improved when more features were combined with the new translation rates. With six features, the model's correlation with protein per mRNA reached R = 0.6660, while the correlation of the model's final translation rate with protein level reached R = 0.6729. These results demonstrate that the model can predict the protein expression of a gene, rather than being limited to predicting expression per mRNA, and show its potential for development into gene expression prediction tools.
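Every result in the abstract is reported as a Spearman correlation, which measures how well the ranking of predicted translation rates matches the ranking of measured protein levels. The following is a minimal sketch of that statistic in plain Python, not the authors' code: the function names and the toy numbers are assumptions made for illustration and are not taken from the study's data.

```python
from math import sqrt


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        # Extend j to cover the run of values equal to the one at position i.
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def spearman(x, y):
    """Spearman correlation: the Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two sequences of equal length >= 2")
    return pearson(average_ranks(x), average_ranks(y))


# Hypothetical toy values (illustrative only, not data from the study).
predicted_rate = [1.0, 2.0, 3.0, 4.0, 5.0]
protein_level = [2.0, 1.0, 4.0, 3.0, 5.0]
print(round(spearman(predicted_rate, protein_level), 4))
```

Because only ranks enter the calculation, the statistic is unchanged by any monotonic rescaling of either variable, which makes it suitable for comparing a model's translation rate with protein abundance measured on a different scale.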


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4d9e/10849221/b667dc75dbb8/pone.0288526.g001.jpg
