Keese P K, Gibbs A
Commonwealth Scientific and Industrial Research Organisation, Division of Plant Industry, Australian National University, Canberra.
Proc Natl Acad Sci U S A. 1992 Oct 15;89(20):9489-93. doi: 10.1073/pnas.89.20.9489.
Many protein families are common to all cellular organisms, indicating that many genes have ancient origins. Genetic variation is mostly attributed to processes such as mutation, duplication, and rearrangement of ancient modules. Thus it is widely assumed that much of present-day genetic diversity can be traced by common ancestry to a molecular "big bang." A rarely considered alternative is that proteins may arise continuously de novo. One mechanism of generating different coding sequences is by "overprinting," in which an existing nucleotide sequence is translated de novo in a different reading frame or from noncoding open reading frames. The clearest evidence for overprinting is provided when the original gene function is retained, as in overlapping genes. Analysis of their phylogenies indicates which are the original genes and which are their informationally novel partners. We report here the phylogenetic relationships of overlapping coding sequences from steroid-related receptor genes and from tymovirus, luteovirus, and lentivirus genomes. For each pair of overlapping coding sequences, one is confined to a single lineage, whereas the other is more widespread. This suggests that the phylogenetically restricted coding sequence arose only in the progenitor of that lineage by translating an out-of-frame sequence to yield the new polypeptide. The production of novel exons by alternative splicing in thyroid receptor and lentivirus genes suggests that introns can be a valuable evolutionary source for overprinting. New genes and their products may drive major evolutionary changes.
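As a rough illustration of the overprinting idea described above, the sketch below translates one nucleotide sequence in each of its three forward reading frames and shows that the resulting peptides are unrelated. It assumes the standard genetic code; the function name `translate` and the example sequence are hypothetical and are not taken from the paper.

```python
# Standard genetic code, listed in TCAG order for each codon position.
BASES = "TCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"   # codons beginning with T
    "LLLLPPPPHHQQRRRR"   # codons beginning with C
    "IIIMTTTTNNKKSSRR"   # codons beginning with A
    "VVVVAAAADDEEGGGG"   # codons beginning with G
)
CODON_TABLE = dict(
    zip(
        (a + b + c for a in BASES for b in BASES for c in BASES),
        AMINO_ACIDS,
    )
)


def translate(seq: str, frame: int = 0) -> str:
    """Translate a DNA sequence starting at offset `frame` (0, 1, or 2).

    Stop codons are written as '*'. A trailing incomplete codon is ignored.
    """
    if frame not in (0, 1, 2):
        raise ValueError("frame must be 0, 1, or 2")
    seq = seq.upper()
    codons = (seq[i:i + 3] for i in range(frame, len(seq) - 2, 3))
    return "".join(CODON_TABLE[codon] for codon in codons)


# Hypothetical example sequence, chosen only for illustration.
example = "ATGGCCAAGTTGACC"
for frame in range(3):
    print(frame, translate(example, frame))
```

The same fifteen nucleotides give a different amino acid string in each frame, which is why a coding sequence that arises by out-of-frame translation is informationally novel relative to the gene it overlaps.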