Artificial Intelligence Research Center, AIST, Tokyo, Japan.
Graduate School of Frontier Sciences, University of Tokyo, Chiba, Japan.
Mol Biol Evol. 2022 Apr 11;39(4). doi: 10.1093/molbev/msac068.
Genomes hold a treasure trove of protein fossils: fragments of formerly protein-coding DNA, which mainly come from transposable elements (TEs) or host genes. These fossils reveal ancient evolution of TEs and genomes, and many fossils have been exapted to perform diverse functions important for the host's fitness. However, old and highly degraded fossils are hard to identify, standard methods (e.g., BLAST) are not optimized for this task, and few Paleozoic protein fossils have been found. Here, a recently optimized method is used to find protein fossils in vertebrate genomes. It finds Paleozoic fossils, predating the amphibian/amniote divergence, from most major TE categories, including virus-related Polinton and Gypsy elements. It finds 10 fossils in the human genome (eight from TEs and two from host genes) that predate the last common ancestor of all jawed vertebrates, probably from the Ordovician period. It also finds types of transposon and retrotransposon not previously found in humans. These fossils have extreme sequence conservation, indicating exaptation: some have evidence of gene-regulatory function, and they tend to lie nearest to developmental genes. Some ancient fossils suggest "genome tectonics," where two fragments of one TE have drifted apart by up to megabases, possibly explaining gene deserts and large introns. This paints a picture of great TE diversity in our aquatic ancestors, with patchy TE inheritance by later vertebrates, producing new genes and regulatory elements on the way. Host-gene fossils too have contributed anciently conserved DNA segments. This paves the way to further studies of ancient protein fossils.