Hubei Hongshan Laboratory, College of Biomedicine and Health, Huazhong Agricultural University, Wuhan, 430070, PR China.
Hubei Key Laboratory of Agricultural Bioinformatics, College of Life Science and Technology, Huazhong Agricultural University, Wuhan, 430070, PR China.
Genome Biol. 2022 Nov 8;23(1):235. doi: 10.1186/s13059-022-02802-y.
Pseudogenes are excellent markers of genome evolution and are emerging as crucial regulators of development and disease, especially cancer. However, the systematic functional characterization and evolution of pseudogenes remain largely unexplored.
To systematically characterize pseudogenes, we date the origin of human and mouse pseudogenes across vertebrates and observe a burst of pseudogene gain in these two lineages. Based on a hybrid sequencing dataset combining full-length PacBio sequencing, sample-matched Illumina sequencing, and public time-course transcriptome data, we observe that a large number of mammalian pseudogenes can be transcribed and that they contribute to the establishment of organ identity. Our analyses reveal that developmentally dynamic pseudogenes are evolutionarily conserved and carry increasing weight during development. In addition, they are involved in complex transcriptional and post-transcriptional modulation, exhibiting signatures of functional enrichment. Coding potential evaluation suggests that 19% of human pseudogenes could be translated, potentially serving as a new route to protein innovation. Moreover, pseudogenes carry disease-associated SNPs and contribute to cancer transcriptome perturbation.
Our findings reveal an unexpectedly high abundance of mammalian pseudogenes that can be transcribed and translated, and these pseudogenes represent a novel regulatory layer. Our study also prioritizes developmentally dynamic pseudogenes with signatures of functional enrichment and provides a hybrid sequencing dataset for further unraveling their biological mechanisms in organ development and carcinogenesis.