Shaanxi Key Laboratory of Qinling Ecological Intelligent Monitoring and Protection, School of Ecology and Environment, Northwestern Polytechnical University, Xi'an 710129, China;
Research and Development Institute of Northwestern Polytechnical University in Shenzhen, Shenzhen 518063, China.
Genome Res. 2024 Nov 20;34(11):2118-2132. doi: 10.1101/gr.279166.124.
The ability to generate multiple RNA transcript isoforms from the same gene is a general phenomenon in eukaryotes. However, the complexity and diversity of alternative isoforms in natural populations remain largely unexplored. Using a newly developed full-length transcript enrichment protocol with 5' CAP selection, we sequenced full-length RNA transcripts of 48 individuals from outbred mouse populations and subspecies, with two closely related sister species as outgroups. The data set represents the most extensive full-length high-quality isoform catalog at the population level to date. In total, we reliably identified 117,728 distinct isoforms, of which only 51% were previously annotated. We show that the population-specific distribution pattern of isoforms is phylogenetically informative and reflects the segregating single nucleotide polymorphism (SNP) diversity between the populations. We find that ancient housekeeping genes are a major source of the overall isoform diversity, and that alternative first exons play a major role in generating new isoforms. Because our data allow us to distinguish population-specific isoforms from isoforms that are conserved across multiple populations, the annotation of the reference mouse genome can be refined to a set of about 40,000 isoforms that should be most relevant for comparative functional analysis across species.