Kronmiller Brent A, Wise Roger P
Bioinformatics and Computational Biology, Iowa State University, Ames, Iowa 50011-1020, USA.
Plant Physiol. 2009 Oct;151(2):483-95. doi: 10.1104/pp.109.143370. Epub 2009 Aug 12.
The architecture of grass genomes varies on multiple levels. Large clusters of long terminal repeat (LTR) retrotransposons occupy significant portions of the intergenic regions, and islands of protein-encoding genes are interspersed among the repeat clusters. Hence, advanced assembly techniques are required both to obtain completely finished genomes and to investigate gene and transposable element distributions. To characterize the organization and distribution of repeat clusters and gene islands across large grass genomes, we present 961- and 594-kb sequence contigs associated with the rf1 (for restorer of fertility1) locus in the near-centromeric region of maize (Zea mays) chromosome 3. We present two methods for the computational finishing of highly repetitive bacterial artificial chromosome clones that proved successful in closing all sequence gaps caused by transposable element insertions. Sixteen repeat clusters were observed, ranging in length from 23 to 155 kb. These repeat clusters consist almost exclusively of LTR retrotransposons, and the relative timing of insertion (their paleontology) varies throughout each cluster. Gene islands contain from one to four predicted genes, giving a gene density of one gene per 16 kb within gene islands and one gene per 111 kb across the entire sequenced region. When compared with the rice (Oryza sativa) and sorghum (Sorghum bicolor) genomes, the two sequence contigs retain 50% and 71% gene collinearity, respectively, rising to 70% and 100%, respectively, when only high-confidence gene models are considered. Collinear genes on single gene islands show that, while most expansion of the maize genome has occurred in the repeat clusters, gene islands are not immune and have grown at both intragenic and intergenic locations.
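The abstract reports two derived metrics: gene density (kilobases of sequence per predicted gene) and percent gene collinearity against rice and sorghum. It does not say how collinearity was scored, so the sketch below assumes one plausible, hypothetical definition: the percentage of query genes whose orthologs fall in the longest chain that preserves gene order, found with a longest-increasing-subsequence search and checked in both orientations. The function names and the toy inputs are illustrative only; they are not the authors' method or data.

```python
from bisect import bisect_left


def gene_density_kb_per_gene(region_kb: float, n_genes: int) -> float:
    """Average spacing: kilobases of sequence per predicted gene."""
    if n_genes <= 0:
        raise ValueError("n_genes must be positive")
    return region_kb / n_genes


def longest_increasing_run(positions: list[int]) -> int:
    """Length of the longest strictly increasing subsequence (patience sorting, O(n log n))."""
    tails: list[int] = []  # tails[k] = smallest tail of any increasing run of length k + 1
    for p in positions:
        i = bisect_left(tails, p)
        if i == len(tails):
            tails.append(p)
        else:
            tails[i] = p
    return len(tails)


def percent_collinear(query_genes: list[str], ortholog_pos: dict[str, int]) -> float:
    """Hypothetical collinearity score: % of query genes in one order-preserving ortholog chain.

    query_genes lists genes in their order along the query contig.
    ortholog_pos maps a query gene to its ortholog's position in the reference
    genome; genes with no ortholog are absent and therefore count as non-collinear.
    """
    if not query_genes:
        return 0.0
    positions = [ortholog_pos[g] for g in query_genes if g in ortholog_pos]
    # Check both orientations: a contig may align to the reverse strand of the reference.
    forward = longest_increasing_run(positions)
    reverse = longest_increasing_run([-p for p in positions])
    return 100.0 * max(forward, reverse) / len(query_genes)


if __name__ == "__main__":
    # Toy values only, not data from the study.
    print(gene_density_kb_per_gene(64, 4))
    print(percent_collinear(["g1", "g2", "g3", "g4"], {"g1": 10, "g2": 30, "g3": 20, "g4": 40}))
```

In the toy call, g3's ortholog is out of order relative to g2, so three of four genes form the longest conserved chain. A real analysis would also need ortholog calling and a tolerance for local rearrangements, which this sketch deliberately leaves out.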