Department of Biochemistry, University College Cork, Cork, Ireland.
Mol Biol Evol. 2011 Nov;28(11):3195-211. doi: 10.1093/molbev/msr155. Epub 2011 Jun 14.
Bacterial genome annotations contain a number of coding sequences (CDSs) that, in spite of reading frame disruptions, encode a single continuous polypeptide. Such disruptions have different origins: sequencing errors, frameshift, or stop codon mutations, as well as instances of utilization of nontriplet decoding. We have extracted over 1,000 CDSs with annotated disruptions and found that about 75% of them can be clustered into 64 groups based on sequence similarity. Analysis of the clusters revealed deep phylogenetic conservation of open reading frame organization as well as the presence of conserved sequence patterns that indicate likely utilization of the nonstandard decoding mechanisms: programmed ribosomal frameshifting (PRF) and programmed transcriptional realignment (PTR). Further enrichment of these clusters with additional homologous nucleotide sequences revealed over 6,000 candidate genes utilizing PRF or PTR. Analysis of the patterns of conservation apparently associated with nontriplet decoding revealed the presence of both previously characterized frameshift-prone sequences and a few novel ones. Since the starting point of our analysis was a set of genes with already annotated disruptions, it is highly plausible that in this study, we have identified only a fraction of all bacterial genes that utilize PRF or PTR. In addition to the identification of a large number of recoded genes, a surprising observation is that nearly half of them are expressed via PTR, a mechanism that, in contrast to PRF, has not yet received substantial attention.