

Machine learning-based analysis of the impact of 5' untranslated region on protein expression.

Authors

Wang Linfeng, Liu Sujia, Huang Jia-Xin, Zhu Haifeng, Li Shuyu, Li Yannan, Chen Sen, Han Jianying, Zhu Yin, Wu Jiahao, Liao Wentao, Zhang Hongmei, Zeng Haiyan, Li Shaoting, Zhao Shuping, Wang Bingwei, Lin Jiaqi, Zeng Ji

Affiliations

School of Biomedical and Pharmaceutical Sciences, Guangdong University of Technology, No. 100 Waihuanxi Road, Guangzhou 510006, China.

MOE Key Laboratory of Bio-Intelligent Manufacturing, School of Bioengineering, Dalian University of Technology, Dalian 116024, China.

Publication information

Nucleic Acids Res. 2025 Sep 5;53(17). doi: 10.1093/nar/gkaf861.

Abstract

The 5' untranslated region (5'UTR) plays a crucial regulatory role in messenger RNA (mRNA), and modified 5'UTRs are used extensively in vaccine production, gene therapy, and related applications. Manual optimization of 5'UTRs, however, makes it difficult to balance the effects of the various cis-elements. Consequently, multiple 5'UTR libraries have been created, and machine learning models have been used to analyze and predict translation efficiency (TE) and protein expression, providing insights into critical regulatory features. These screening libraries, which are based on TE and mean ribosome load, struggle to quantify protein expression accurately, while a more precise method for quantifying 5'UTRs requires a significantly costlier library. To resolve this dilemma, we constructed a library that uses firefly luciferase as the reporter to measure protein expression accurately. In addition, we optimized the library construction method by clustering mRNA sequences to reduce redundant data and minimize the size of the dataset. This dual strategy of increasing accuracy and reducing dataset size was found to be effective in predicting the 5'UTRs from the PC3 cell line.
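
The abstract does not say which clustering algorithm was used to reduce redundancy, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes a greedy scheme in which a sequence is kept as a representative only if its k-mer Jaccard similarity to every previously kept sequence is below a threshold. The function names, the k-mer size of 4, and the threshold of 0.8 are all hypothetical choices made for illustration.

```python
from typing import List, Set


def kmers(seq: str, k: int = 4) -> Set[str]:
    """Return the set of overlapping k-mers, after normalizing DNA (T) to RNA (U)."""
    seq = seq.upper().replace("T", "U")
    # Sequences shorter than k yield an empty set.
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard similarity of two k-mer sets; two empty sets count as identical."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def greedy_representatives(
    seqs: List[str], k: int = 4, threshold: float = 0.8
) -> List[str]:
    """Keep one representative per cluster of similar sequences.

    Assumption: a sequence joins an existing cluster (and is dropped) when its
    similarity to that cluster's representative is at least `threshold`.
    """
    reps: List[str] = []
    rep_kmers: List[Set[str]] = []
    for s in seqs:
        ks = kmers(s, k)
        if all(jaccard(ks, rk) < threshold for rk in rep_kmers):
            reps.append(s)
            rep_kmers.append(ks)
    return reps


if __name__ == "__main__":
    # Toy sequences, not taken from the paper's library.
    library = ["ACGUACGUACGU", "ACGUACGUACGU", "GGGGCCCCGGGGCCCC"]
    print(greedy_representatives(library))
```

The result depends on input order, because the first member of each cluster becomes its representative. A dedicated sequence-clustering tool would typically sort by length or abundance first and use alignment-based identity rather than k-mer overlap.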


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1782/12418383/7072fad6197d/gkaf861figgra1.jpg
