Exploiting hidden information interleaved in the redundancy of the genetic code without prior knowledge.

Authors

Zur Hadas, Tuller Tamir

Affiliations

Department of Biomedical Engineering, The Engineering Faculty, Blavatnik School of Computer Science, Faculty of Exact Sciences and Sagol School of Neuroscience, Tel Aviv University, Tel-Aviv 69978, Israel.

Publication

Bioinformatics. 2015 Apr 15;31(8):1161-8. doi: 10.1093/bioinformatics/btu797. Epub 2014 Nov 29.

Abstract

MOTIVATION

Dozens of studies in recent years have demonstrated that codon usage encodes various aspects related to all stages of gene expression regulation. When relevant high-quality, large-scale gene expression data are available, it is possible to statistically infer and model these signals, enabling the analysis and engineering of gene expression. However, when these data are not available, it is impossible to infer and validate such models.

RESULTS

In this study, we propose Chimera, an unsupervised, computationally efficient approach for exploiting hidden high-dimensional information related to the way gene expression is encoded in the open reading frame (ORF), based solely on the genome of the analysed organism. One version of the approach, named Chimera Average Repetitive Substring (ChimeraARS), estimates the adaptability of an ORF to the intracellular gene expression machinery of a genome (host) by computing its tendency to include long substrings that appear in the host's coding sequences; the second version, named ChimeraMap, engineers the codons of a protein so that it includes long substrings of codons that appear in the host coding sequences, improving its adaptation to a new host's gene expression machinery. We demonstrate the applicability of the new approach for analysing and engineering heterologous genes and for analysing endogenous genes. Specifically, focusing on Escherichia coli, we show that it can exploit information that cannot be detected by conventional approaches (e.g. the Codon Adaptation Index, CAI), which consider only single-codon distributions; for example, we report correlations of up to 0.67 between the ChimeraARS measure and heterologous gene expression in cases where the CAI yielded no correlation.
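
The idea behind ChimeraARS can be illustrated with a small sketch. The code below is not the authors' implementation: it is a naive Python illustration that assumes the score is the mean, over all positions of the target ORF, of the length of the longest codon substring starting at that position that also occurs in at least one host coding sequence. The function names (`codons`, `average_repetitive_substring`), the codon-level comparison, and the set-based lookup are assumptions made for illustration; a practical implementation on a full genome would likely need an index such as a suffix array.

```python
def codons(seq):
    """Split an in-frame nucleotide sequence into codons.

    Any trailing partial codon is dropped.
    """
    seq = seq.upper()
    usable = len(seq) - len(seq) % 3
    return [seq[i:i + 3] for i in range(0, usable, 3)]


def average_repetitive_substring(target_orf, host_orfs):
    """Illustrative ARS-style score (assumed definition, see lead-in).

    For every codon position of the target, find the longest run of
    codons starting there that also appears in some host ORF, then
    return the mean of those lengths (in codons).
    """
    target = codons(target_orf)
    hosts = [codons(orf) for orf in host_orfs]
    if not target:
        return 0.0

    # Cache of all host codon k-grams, built lazily per length k.
    kgram_cache = {}

    def host_kgrams(k):
        if k not in kgram_cache:
            kgram_cache[k] = {
                tuple(h[i:i + k])
                for h in hosts
                for i in range(len(h) - k + 1)
            }
        return kgram_cache[k]

    total = 0
    for start in range(len(target)):
        length = 0
        # Extending one codon at a time is valid because every prefix
        # of a substring found in the host is also found in the host.
        while (start + length < len(target)
               and tuple(target[start:start + length + 1])
               in host_kgrams(length + 1)):
            length += 1
        total += length
    return total / len(target)


if __name__ == "__main__":
    toy_host = ["ATGAAAGGG", "TTTCCC"]  # hypothetical host ORFs
    print(average_repetitive_substring("ATGAAACCC", toy_host))
```

In this toy example the target codons ATG, AAA, CCC have longest host matches of 2, 1 and 1 codons respectively. A single-codon measure such as the CAI would not distinguish between orderings of the same codons, whereas this score changes when the order changes.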

AVAILABILITY AND IMPLEMENTATION

For non-commercial purposes, the code of the Chimera approach can be downloaded from http://www.cs.tau.ac.il/~tamirtul/Chimera/download.htm.

CONTACT

tamirtul@post.tau.ac.il

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
