The combinatorics of overlapping genes.

Author information

Lèbre Sophie, Gascuel Olivier

Affiliations

IMAG, UMR 5149 - CNRS & Université de Montpellier, France; CPBS, UMR 5236 - CNRS & Université de Montpellier, France; Université Paul Valéry Montpellier 3, France; Institut de Biologie Computationnelle, LIRMM, UMR 5506 - CNRS & Université de Montpellier, France.

Institut de Biologie Computationnelle, LIRMM, UMR 5506 - CNRS & Université de Montpellier, France; Unité Bioinformatique Evolutive, C3BI, USR 3756 - Institut Pasteur & CNRS, Paris, France.

Publication information

J Theor Biol. 2017 Feb 21;415:90-101. doi: 10.1016/j.jtbi.2016.09.018. Epub 2016 Oct 11.

Abstract

Overlapping genes exist in all domains of life and are much more abundant than expected upon their first discovery in the late 1970s. Assuming that the reference gene is read in frame +0, an overlapping gene can be encoded in two reading frames in the sense strand, denoted by +1 and +2, and in three reading frames in the opposite strand, denoted by -0, -1, and -2. This motivated numerous researchers to study the constraints induced by the genetic code on the various overlapping frames, mostly based on information theory. Our focus in this paper is on the constraints induced on two overlapping genes in terms of amino acids, as well as polypeptides. We show that simple linear constraints bind the amino-acid composition of two proteins encoded by overlapping genes. Novel constraints are revealed when polypeptides are considered, and not just single amino acids. For example, in double-coding sequences with an overlapping reading frame -2, each Tyrosine (denoted as Tyr or Y) in the overlapping frame overlaps a Tyrosine in the reference frame +0 (and reciprocally), whereas specific words (e.g. YY) never occur. We thus distinguish between null constraints (YY = 0 in frame -2) and non-null constraints (Y in frame +0 ⇔ Y in frame -2). Our equivalence-based constraints are symmetrical and thus enable the characterization of the joint composition of overlapping proteins. We describe several formal frameworks and a graph algorithm to characterize and compute these constraints. As expected, the degrees of freedom left by these constraints vary drastically among the different overlapping frames. Interestingly, the biological meaning of constraints induced on two overlapping proteins (hydropathy, forbidden di-peptides, expected overlap length …) is also specific to the reading frame. 
We study the combinatorics of these constraints for overlapping polypeptides of length n, pointing out that, (i) except for frame -2, non-null constraints are deduced from the amino-acid (length = 1) constraints and (ii) null constraints are deduced from the di-peptide (length = 2) constraints. These results yield support for understanding the mechanisms and evolution of overlapping genes, and for developing novel overlapping gene detection methods.
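
The Tyrosine example in the abstract can be checked by brute force on the standard genetic code. The sketch below is not the authors' graph algorithm; it is a minimal enumeration under one stated assumption: frame -2 is modelled as the antisense codon that covers the last nucleotide of one reference codon and the first two nucleotides of the next (`SHIFT = 2`). The names `overlap_windows` and `check_tyrosine_constraints` are hypothetical helpers introduced only for this illustration.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order ('*' marks a stop).
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {"".join(c): aa
               for c, aa in zip(product(BASES, repeat=3), AMINO)}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

# Assumption: the antisense codon spans sense positions 2..4 (0-based) of a
# two-codon window, which is how frame -2 is modelled in this sketch.
SHIFT = 2


def revcomp(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def overlap_windows(shift=SHIFT):
    """Yield (aa1, aa2, anti_aa) for every stop-free pair of sense codons."""
    sense = [c for c, aa in CODON_TABLE.items() if aa != "*"]
    for c1, c2 in product(sense, repeat=2):
        anti = CODON_TABLE[revcomp((c1 + c2)[shift:shift + 3])]
        yield CODON_TABLE[c1], CODON_TABLE[c2], anti


def check_tyrosine_constraints():
    """Return (Y equivalence holds, YY can occur) over double-coding windows."""
    # Keep only windows where the antisense codon is not a stop either.
    coding = [w for w in overlap_windows() if w[2] != "*"]
    equivalence = all((aa2 == "Y") == (anti == "Y") for _, aa2, anti in coding)
    yy_occurs = any(aa1 == "Y" and aa2 == "Y" for aa1, aa2, _ in coding)
    return equivalence, yy_occurs


print(check_tyrosine_constraints())  # (True, False)
```

Under this assumption the enumeration agrees with the two statements in the abstract: among stop-free windows, a Tyr in the reference frame coincides exactly with a Tyr in the overlapping frame, and no window encodes two consecutive Tyr residues, because the antisense codon would then be TAA or TAG.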
