Department of Biological Sciences, Vanderbilt University, Nashville, TN.
Mol Biol Evol. 2017 Jan;34(1):215-229. doi: 10.1093/molbev/msw227. Epub 2016 Oct 20.
Closely spaced clusters of tandemly duplicated genes (CTDGs) contribute to the diversity of many phenotypes, including chemosensation, snake venom, and animal body plans. CTDGs have traditionally been identified subjectively as genomic neighborhoods containing several gene duplicates in close proximity; however, CTDGs are often highly variable with respect to gene number, intergenic distance, and synteny. The lack of a formal definition hampers the study of CTDG evolutionary dynamics and the discovery of novel CTDGs in the exponentially growing body of genomic data. To address this gap, we developed a novel homology-based algorithm, CTDGFinder, which formalizes and automates the identification of CTDGs by examining the physical distribution of individual members of families of duplicated genes across chromosomes. Application of CTDGFinder accurately identified CTDGs for many well-known gene clusters (e.g., Hox and beta-globin gene clusters) in the human, mouse, and 20 other mammalian genomes. Differences between previously annotated gene clusters and our inferred CTDGs were due to the exclusion of nonhomologs that have historically been considered parts of specific gene clusters, the inclusion or absence of genes in the CTDGs relative to their corresponding gene clusters, and the splitting of certain gene clusters into distinct CTDGs. Using CTDGFinder to examine human genes with tissue-specific enhancement of expression identified members of several well-known gene clusters (e.g., cytochrome P450s and olfactory receptors) and revealed that they were unequally distributed across tissues. By formalizing and automating CTDG identification, CTDGFinder will facilitate understanding of CTDG evolutionary dynamics, their functional implications, and how they are associated with phenotypic diversity.
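The idea of examining the physical distribution of gene family members along a chromosome can be illustrated with a minimal sketch. This is not the CTDGFinder algorithm: it assumes homology has already been established, uses a simple fixed gap threshold, and applies no statistical assessment. The names `Gene`, `find_clusters`, `max_gap`, and `min_members`, and all coordinates, are hypothetical and chosen only for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class Gene(NamedTuple):
    """One member of a gene family (hypothetical record layout)."""
    name: str
    chrom: str
    start: int
    end: int


def find_clusters(genes, max_gap=100_000, min_members=2):
    """Group family members into candidate clusters per chromosome.

    Assumption: two neighbouring members belong to the same cluster
    when the gap between them is at most max_gap base pairs.
    """
    by_chrom = defaultdict(list)
    for gene in genes:
        by_chrom[gene.chrom].append(gene)

    clusters = []
    for chrom in sorted(by_chrom):
        members = sorted(by_chrom[chrom], key=lambda g: g.start)
        current = [members[0]]
        cluster_end = members[0].end  # rightmost coordinate seen so far
        for gene in members[1:]:
            if gene.start - cluster_end <= max_gap:
                current.append(gene)
                cluster_end = max(cluster_end, gene.end)
            else:
                # Gap too large: close the current run and start a new one.
                if len(current) >= min_members:
                    clusters.append(current)
                current = [gene]
                cluster_end = gene.end
        if len(current) >= min_members:
            clusters.append(current)
    return clusters


# Made-up coordinates for a single hypothetical gene family.
family = [
    Gene("g1", "chr1", 1_000, 2_000),
    Gene("g2", "chr1", 5_000, 6_000),
    Gene("g3", "chr1", 900_000, 901_000),
    Gene("g4", "chr2", 100, 500),
    Gene("g5", "chr1", 8_000, 9_500),
]

for cluster in find_clusters(family, max_gap=10_000):
    print(cluster[0].chrom, [g.name for g in cluster])
```

With these made-up inputs, g1, g2, and g5 fall within 10 kb of one another on chr1 and form a single candidate cluster, while g3 and g4 remain singletons. A fixed threshold is the simplest possible choice; the variability in intergenic distance that the abstract describes is the reason such a threshold is unlikely to be adequate on real genomes.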