Exploiting hidden information interleaved in the redundancy of the genetic code without prior knowledge.

Author information

Zur Hadas, Tuller Tamir

Affiliation

Department of Biomedical Engineering, The Engineering Faculty, Blavatnik School of Computer Science, Faculty of Exact Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel-Aviv 69978, Israel.

Publication information

Bioinformatics. 2015 Apr 15;31(8):1161-8. doi: 10.1093/bioinformatics/btu797. Epub 2014 Nov 29.

DOI:10.1093/bioinformatics/btu797

PMID:25433697

Abstract

MOTIVATION

Dozens of studies in recent years have demonstrated that codon usage encodes various aspects related to all stages of gene expression regulation. When relevant, high-quality, large-scale gene expression data are available, it is possible to statistically infer and model these signals, enabling the analysis and engineering of gene expression. However, when such data are not available, it is impossible to infer and validate such models.

RESULTS

In this study, we suggest Chimera, an unsupervised, computationally efficient approach for exploiting hidden high-dimensional information related to the way gene expression is encoded in the open reading frame (ORF), based solely on the genome of the analysed organism. One version of the approach, named Chimera Average Repetitive Substring (ChimeraARS), estimates the adaptability of an ORF to the intracellular gene expression machinery of a genome (host) by computing the ORF's tendency to include long substrings that appear in the host's coding sequences; the second version, named ChimeraMap, engineers the codons of a protein so that it includes long substrings of codons that appear in the host coding sequences, improving its adaptation to a new host's gene expression machinery. We demonstrate the applicability of the new approach for analysing and engineering heterologous genes and for analysing endogenous genes. Specifically, focusing on Escherichia coli, we show that it can exploit information that cannot be detected by conventional approaches (e.g. the Codon Adaptation Index, CAI), which only consider single-codon distributions; for example, we report correlations of up to 0.67 between the ChimeraARS measure and heterologous gene expression, where the CAI yielded no correlation.
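The abstract describes both versions only at a high level, so the following is a minimal sketch under stated assumptions, not the authors' downloadable code. It assumes that ChimeraARS means: for each codon position of the target ORF, take the length (in codons) of the longest codon-aligned substring starting there that also occurs in some host ORF, then average these lengths. It also includes a heavily simplified, greedy left-to-right stand-in for ChimeraMap that copies host codon segments whose translation matches the protein. The function names (`chimera_ars`, `chimera_map_greedy`, `split_codons`, `translate`) are hypothetical. The substring search is a brute-force scan chosen for readability; scaling to whole genomes would need an index such as a suffix array.

```python
from typing import Dict, List, Sequence, Tuple

# Standard genetic code (NCBI translation table 1), codons enumerated in TCAG order.
_BASES = "TCAG"
_AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
_CODONS = [a + b + c for a in _BASES for b in _BASES for c in _BASES]
CODON_TABLE: Dict[str, str] = dict(zip(_CODONS, _AMINO))


def split_codons(orf: str) -> List[str]:
    """Split an ORF into codons; its length must be a multiple of 3."""
    orf = orf.upper()
    if len(orf) % 3:
        raise ValueError("ORF length must be a multiple of 3")
    return [orf[i:i + 3] for i in range(0, len(orf), 3)]


def translate(orf: str) -> str:
    """Translate an ORF with the standard code ('*' marks stop codons)."""
    return "".join(CODON_TABLE[c] for c in split_codons(orf))


def _occurs(sub: List[str], ref: List[str]) -> bool:
    """True if the codon list `sub` occurs contiguously in the codon list `ref`."""
    n = len(sub)
    return any(ref[j:j + n] == sub for j in range(len(ref) - n + 1))


def chimera_ars(target: str, reference: Sequence[str]) -> float:
    """Hypothetical ARS-style score: mean over codon positions of `target` of the
    longest codon substring starting there that occurs in some reference ORF."""
    t = split_codons(target)
    if not t:
        raise ValueError("target ORF is empty")
    refs = [split_codons(r) for r in reference]
    lengths = []
    for i in range(len(t)):
        best = 0
        # Extending one codon at a time is valid: every prefix of an occurring
        # substring also occurs, so the first failure marks the maximum.
        while i + best < len(t) and any(_occurs(t[i:i + best + 1], r) for r in refs):
            best += 1
        lengths.append(best)
    return sum(lengths) / len(lengths)


def chimera_map_greedy(protein: str, host_orfs: Sequence[str]) -> str:
    """Simplified, greedy ChimeraMap-like design: at each protein position, copy
    the codons of the longest host segment whose translation matches from there."""
    protein = protein.upper()
    hosts: List[Tuple[str, List[str]]] = [(translate(o), split_codons(o)) for o in host_orfs]
    out: List[str] = []
    i = 0
    while i < len(protein):
        best_len, best_codons = 0, []
        for prot, codons in hosts:
            for j in range(len(prot)):
                k = 0
                while (i + k < len(protein) and j + k < len(prot)
                       and prot[j + k] == protein[i + k]):
                    k += 1
                if k > best_len:
                    best_len, best_codons = k, codons[j:j + k]
        if best_len == 0:
            raise ValueError(f"amino acid {protein[i]!r} not found in host ORFs")
        out.extend(best_codons)
        i += best_len
    return "".join(out)


if __name__ == "__main__":
    # Per-position lengths are [2, 1, 1], so the mean is 4/3.
    print(chimera_ars("ATGAAATTT", ["ATGAAA", "TTTGGG"]))  # 1.3333333333333333
    design = chimera_map_greedy("MKFGP", ["ATGAAATTTTAA", "GGGCCC"])
    print(design, translate(design))  # ATGAAATTTGGGCCC MKFGP
```

When scoring an endogenous gene this way, that gene should be left out of the reference set; otherwise it matches itself end to end and the score is trivially maximal.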

AVAILABILITY AND IMPLEMENTATION

For non-commercial purposes, the code of the Chimera approach can be downloaded from http://www.cs.tau.ac.il/~tamirtul/Chimera/download.htm.

CONTACT

tamirtul@post.tau.ac.il

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.

Similar articles

Exploiting hidden information interleaved in the redundancy of the genetic code without prior knowledge.

Bioinformatics. 2015 Apr 15;31(8):1161-8. doi: 10.1093/bioinformatics/btu797. Epub 2014 Nov 29.

Optimizing scaleup yield for protein production: Computationally Optimized DNA Assembly (CODA) and Translation Engineering.

Biotechnol Annu Rev. 2007;13:27-42. doi: 10.1016/S1387-2656(07)13002-7.

ChimeraUGEM: unsupervised gene expression modeling in any given organism.

Bioinformatics. 2019 Sep 15;35(18):3365-3371. doi: 10.1093/bioinformatics/btz080.

Universal evolutionary selection for high dimensional silent patterns of information hidden in the redundancy of viral genetic code.

Bioinformatics. 2018 Oct 1;34(19):3241-3248. doi: 10.1093/bioinformatics/bty351.

Rare codon clusters at 5'-end influence heterologous expression of archaeal gene in Escherichia coli.

Protein Expr Purif. 2006 Nov;50(1):49-57. doi: 10.1016/j.pep.2006.07.014. Epub 2006 Jul 29.

Translational resistivity/conductivity of coding sequences during exponential growth of Escherichia coli.

J Theor Biol. 2017 Jan 21;413:66-71. doi: 10.1016/j.jtbi.2016.11.015. Epub 2016 Nov 19.

RFMapp: ribosome flow model application.

Bioinformatics. 2012 Jun 15;28(12):1663-4. doi: 10.1093/bioinformatics/bts185. Epub 2012 Apr 11.

Exploring synonymous codon usage preferences of disulfide-bonded and non-disulfide bonded cysteines in the E. coli genome.

J Theor Biol. 2006 Jul 21;241(2):390-401. doi: 10.1016/j.jtbi.2005.12.004. Epub 2006 Jan 19.

CSN: unsupervised approach for inferring biological networks based on the genome alone.

BMC Bioinformatics. 2020 May 15;21(1):190. doi: 10.1186/s12859-020-3479-9.

Codon harmonization - going beyond the speed limit for protein expression.

FEBS Lett. 2018 May;592(9):1554-1564. doi: 10.1002/1873-3468.13046. Epub 2018 Apr 16.

Cited by

Predicting gene sequences with AI to study codon usage patterns.

Proc Natl Acad Sci U S A. 2025 Jan 7;122(1):e2410003121. doi: 10.1073/pnas.2410003121. Epub 2024 Dec 31.

Prokaryotic rRNA-mRNA interactions are involved in all translation steps and shape bacterial transcripts.

RNA Biol. 2021 Nov 12;18(sup2):684-698. doi: 10.1080/15476286.2021.1978767. Epub 2021 Sep 29.

Codon-based indices for modeling gene expression and transcript evolution.

Comput Struct Biotechnol J. 2021 Apr 22;19:2646-2663. doi: 10.1016/j.csbj.2021.04.042. eCollection 2021.

CSN: unsupervised approach for inferring biological networks based on the genome alone.

BMC Bioinformatics. 2020 May 15;21(1):190. doi: 10.1186/s12859-020-3479-9.

Optimizing the dynamics of protein expression.

Sci Rep. 2019 May 17;9(1):7511. doi: 10.1038/s41598-019-43857-5.

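Universal evolutionary selection for high dimensional silent patterns of information hidden in the redundancy of viral genetic code.

Bioinformatics. 2018 Oct 1;34(19):3241-3248. doi: 10.1093/bioinformatics/bty351.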

The Landscape of the Emergence of Life.

Life (Basel). 2017 Jun 16;7(2):27. doi: 10.3390/life7020027.

Evidence of translation efficiency adaptation of the coding regions of the bacteriophage lambda.

DNA Res. 2017 Aug 1;24(4):333-342. doi: 10.1093/dnares/dsx005.

Unsupervised detection of regulatory gene expression information in different genomic regions enables gene expression ranking.

BMC Bioinformatics. 2017 Feb 1;18(1):77. doi: 10.1186/s12859-017-1497-z.
