Genome Sequencing Center, HudsonAlpha Institute for Biotechnology, Huntsville, United States.
Joint Genome Institute, Lawrence Berkeley National Laboratory, Berkeley, United States.
Elife. 2022 Sep 9;11:e78526. doi: 10.7554/eLife.78526.
The development of multiple chromosome-scale reference genome sequences in many taxonomic groups has yielded a high-resolution view of the patterns and processes of molecular evolution. Nonetheless, leveraging information across multiple genomes remains a significant challenge in nearly all eukaryotic systems. These challenges range from studying the evolution of chromosome structure, to finding candidate genes for quantitative trait loci, to testing hypotheses about speciation and adaptation. Here, we present GENESPACE, which addresses these challenges by integrating conserved gene order and orthology to define the expected physical position of all genes across multiple genomes. We demonstrate this utility by dissecting presence-absence, copy-number, and structural variation at three levels of biological organization: spanning 300 million years of vertebrate sex chromosome evolution, across the diversity of the Poaceae (grass) plant family, and among 26 maize cultivars. The methods to build and visualize syntenic orthology in the GENESPACE R package offer a significant addition to existing gene family and synteny programs, especially in polyploid, outbred, and other complex genomes.
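As a rough illustration of how conserved gene order and orthology can together define an expected position, the toy sketch below takes a gene in genome A that has no ortholog in genome B and brackets its expected location in B using the nearest flanking orthologous anchors. This is not the GENESPACE algorithm or its R API. The function name `expected_interval`, the anchor representation (pairs of gene-order ranks), and the example ranks are all assumptions made for illustration.

```python
from bisect import bisect_left
from typing import List, Optional, Tuple

# Hypothetical representation: one anchor = (gene rank in genome A, gene rank in genome B)
# for an orthologous gene pair that lies in a single syntenic block.
Anchor = Tuple[int, int]


def expected_interval(
    query_rank: int, anchors: List[Anchor]
) -> Tuple[Optional[int], Optional[int]]:
    """Bracket the expected genome-B position of a genome-A gene.

    Returns (low, high) gene-order ranks in genome B taken from the nearest
    anchors on either side of the query. A bound is None when the query lies
    beyond the first or last anchor of the block.
    """
    anchors = sorted(anchors)
    ranks_a = [a for a, _ in anchors]
    i = bisect_left(ranks_a, query_rank)

    # The query gene is itself an anchor: its position in B is known directly.
    if i < len(anchors) and ranks_a[i] == query_rank:
        hit = anchors[i][1]
        return (hit, hit)

    left = anchors[i - 1][1] if i > 0 else None
    right = anchors[i][1] if i < len(anchors) else None

    # In an inverted block the B ranks decrease as A ranks increase, so swap.
    if left is not None and right is not None and left > right:
        left, right = right, left
    return (left, right)


if __name__ == "__main__":
    # Made-up ranks: genes 3 and 4 of genome A have no ortholog in genome B.
    block = [(1, 10), (2, 11), (5, 14), (6, 15)]
    print(expected_interval(3, block))  # (11, 14)
    print(expected_interval(5, block))  # (14, 14)
```

A gene with no ortholog in B still receives a bounded interval, which is the kind of information needed to distinguish presence-absence variation from a gene that has simply moved elsewhere. A real implementation would also have to identify the syntenic blocks themselves and handle multiple blocks per region in polyploid genomes, which this sketch leaves out.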