Department of Molecular and Translational Medicine, Paul L. Foster School of Medicine, Texas Tech University Health Sciences Center, El Paso, TX, 79905, USA.
Graduate School, College of Science, University of Texas at El Paso, El Paso, TX, 79902, USA.
BMC Genomics. 2020 Jan 31;21(1):113. doi: 10.1186/s12864-020-6506-3.
Recent advances in genetics and genomics present unique opportunities for enhancing our understanding of mammalian biology and evolution through detailed multi-species comparative analysis of gene organization and expression. Yet, of the more than 20,000 protein-coding genes found in mammalian genomes, fewer than 10% have been examined in any detail. Here we illustrate the power of the data available in publicly accessible genomic and genetic resources by querying them to evaluate Zmat2, a minimally studied gene whose human ortholog has been implicated in spliceosome function and in keratinocyte differentiation.
We find extensive conservation in the coding regions and overall structure of Zmat2, and in the encoded proteins, across 18 mammals representing 13 orders and spanning ~165 million years of evolutionary development. We identify a tandem duplication of the Zmat2 gene and locus in opossum, but not in monotremes, other marsupials, or other mammals, indicating that this event occurred subsequent to the divergence of these species from one another. We also define a collection of Zmat2 pseudogenes in half of the mammals studied, and suggest, based on phylogenetic analysis, that each arose independently in the recent evolutionary past.
Mammalian Zmat2 genes and ZMAT2 proteins illustrate conservation of structure and sequence, along with the development and diversification of pseudogenes in a large fraction of species. Collectively, these observations also illustrate how the focused identification and interpretation of data found in public genomic and gene expression resources can be leveraged to reveal new insights of potentially high biological significance.