Shi Sheng-Lin, Li Dan-Tong, Liu Yan-Qun
College of Bioscience and Biotechnology, Shenyang Agricultural University, Shenyang 110866, China.
Genes (Basel). 2025 Jul 17;16(7):833. doi: 10.3390/genes16070833.
The mammalian mitochondrial genome has long been considered to encode only 13 proteins. However, a recent study identified a nested alternative open reading frame (nAltORF) within the primate mitochondrial gene, which we designate , that is reportedly translated in the cytosol using the standard genetic code. This discovery challenges conventional understanding and raises questions about the prevalence, conservation, and translational adaptation of such ORFs.
This study conducted a comprehensive bioinformatic analysis of nested genes in 289 primate and 380 rodent mitochondrial sequences.
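As an illustration of the kind of screen such an analysis involves, the following is a minimal sketch of an open reading frame scan under the standard genetic code. It is not the authors' pipeline: the function name `find_orfs`, the restriction to the forward strand, and the requirement of an ATG start codon are assumptions made for this example. The default length threshold mirrors the >150 codon criterion reported in the results.

```python
# Hypothetical helper, not taken from the study: scans the three forward
# reading frames for ATG-initiated ORFs terminated by a standard stop codon.
STOPS = {"TAA", "TAG", "TGA"}  # stop codons of the standard genetic code


def find_orfs(seq, min_codons=150):
    """Return (start, end, frame) tuples for ORFs with more than min_codons
    sense codons (start codon included, stop codon excluded).
    Coordinates are 0-based; end is exclusive and includes the stop codon."""
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None  # position of the first ATG seen since the last stop
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                n_codons = (i - start) // 3
                if n_codons > min_codons:
                    orfs.append((start, i + 3, frame))
                start = None
    return orfs
```

To look for nested ORFs specifically, the scan would be applied to the host gene's sequence and hits in the host's own reading frame discarded; that filtering step is left out here.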
Nested genes meeting the criteria (>150 codons, standard genetic code) were identified in only 10.73% of primate and 20.53% of rodent species, suggesting a patchy phylogenetic distribution. Although their encoded proteins showed homology to the previously reported protein encoded by the nested gene, overall amino acid conservation was low, and characteristic protein domains or signal peptides were generally not predicted. Crucially, the Kozak consensus sequences surrounding the putative start codons of these genes were exclusively "weak" or "adequate", with none classified as "strong" or "optimal". Codon Adaptation Index (CAI) and Relative Codon Deoptimization Index (RCDI) analyses of the nested genes revealed neither significant adaptation nor deoptimization to the codon usage of nuclear and mitochondrial genes. Furthermore, cosine similarity analysis indicated that the nested genes exhibit significantly lower codon usage similarity to both nuclear and mitochondrial gene sets than their host genes do.
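The cosine similarity comparison of codon usage can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes each gene or gene set is summarized as a 64-dimensional vector of relative codon frequencies, which the abstract does not specify (other representations, such as relative synonymous codon usage, are also possible). The function names `codon_vector` and `cosine_similarity` are hypothetical.

```python
from collections import Counter
from itertools import product
from math import sqrt

# All 64 codons in a fixed order, so that vectors are comparable.
CODONS = ["".join(c) for c in product("ACGT", repeat=3)]


def codon_vector(seq):
    """Relative frequency of each codon in an in-frame coding sequence.
    Assumption: plain frequencies, not RSCU or any other normalization."""
    seq = seq.upper()
    counts = Counter(seq[i:i + 3] for i in range(0, len(seq) - 2, 3))
    total = sum(counts[c] for c in CODONS)  # ignores codons with non-ACGT bases
    return [counts[c] / total if total else 0.0 for c in CODONS]


def cosine_similarity(u, v):
    """Cosine of the angle between two codon usage vectors (0.0 if either is empty)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = sqrt(sum(a * a for a in u))
    norm_v = sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0
```

Under this sketch, a nested gene and its host gene would each be compared against a reference vector built from concatenated nuclear or mitochondrial coding sequences, and the two resulting similarity values contrasted.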
These findings collectively suggest that while such nested genes exist in some mammals, their inconsistent presence, weak translational initiation signals, and lack of adaptation to cytosolic codon usage characterize them as dispensable genetic elements rather than core functional genes.