Dhillon Braham, Gill Navdeep, Hamelin Richard C, Goodwin Stephen B
Department of Forest and Conservation Sciences, 2424 Main Mall, Vancouver, BC, V6T 1Z4, Canada.
Department of Botany, Beaty Biodiversity Centre, 2212 Main Mall, Vancouver, BC, V6T 1Z4, Canada.
BMC Genomics. 2014 Dec 17;15(1):1132. doi: 10.1186/1471-2164-15-1132.
In addition to gene identification and annotation, repetitive sequence analysis has become an integral part of genome sequencing projects. Identifying repeats is important not only because it improves gene prediction, but also because repetitive sequences help determine the structure and evolution of genes and genomes. Several methods using different repeat-finding strategies are available for whole-genome repeat sequence analysis. Four independent approaches were used to identify and characterize the repetitive fraction of the Mycosphaerella graminicola (synonym Zymoseptoria tritici) genome. This ascomycete fungus is a wheat pathogen, and its finished genome comprises 21 chromosomes, eight of which can be lost with no obvious effect on fitness and are therefore considered dispensable.
Using a combination of four repeat-finding methods, at least 17% of the M. graminicola genome was estimated to be repetitive. Class I transposable elements, which amplify via an RNA intermediate, account for about 70% of the total repetitive content in the M. graminicola genome. The dispensable chromosomes had a higher percentage of repetitive elements than the core chromosomes. The distribution of repeats across the chromosomes also varied, with at least six chromosomes showing a non-random distribution of repetitive elements. Repeat families showed transition mutations and a CpA → TpA dinucleotide bias, indicating the presence of a repeat-induced point mutation (RIP)-like mechanism in M. graminicola. One gene family and two repeat families specific to subtelomeres were also identified in the M. graminicola genome. A total of 78 putative clusters of nested elements were found, and several genes with putative roles in pathogenicity were associated with these nested repeat clusters. This analysis of the transposable element content in the finished M. graminicola genome resulted in a thorough and highly curated database of repetitive sequences.
This comprehensive analysis will serve as a scaffold to address additional biological questions regarding the origin and fate of transposable elements in fungi. Future analyses of the distribution of repetitive sequences in M. graminicola also will be able to provide insights into the association of repeats with genes and their potential role in gene and genome evolution.
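As an illustration of the CpA → TpA dinucleotide bias reported in the results, the following is a minimal sketch of how dinucleotide-based RIP indices are commonly computed for a single sequence. It is not the pipeline used in the study. The function names `dinucleotide_counts` and `rip_indices` are hypothetical, and the two ratios, TpA/ApT and (CpA + TpG)/(ApC + GpT), are widely used RIP indices that are assumed here rather than taken from the abstract.

```python
from collections import Counter


def dinucleotide_counts(seq):
    """Count overlapping dinucleotides in a DNA sequence (case-insensitive)."""
    seq = seq.upper()
    return Counter(seq[i:i + 2] for i in range(len(seq) - 1))


def rip_indices(seq):
    """Return (product_index, substrate_index) for one sequence.

    product_index   = TpA / ApT
    substrate_index = (CpA + TpG) / (ApC + GpT)

    Assumption: these are the conventional RIP indices; the abstract
    itself only reports a CpA -> TpA bias, not the exact statistic used.
    A ratio with a zero denominator is returned as NaN.
    """
    counts = dinucleotide_counts(seq)

    def ratio(numerator, denominator):
        return numerator / denominator if denominator else float("nan")

    product_index = ratio(counts["TA"], counts["AT"])
    substrate_index = ratio(
        counts["CA"] + counts["TG"],
        counts["AC"] + counts["GT"],
    )
    return product_index, substrate_index


if __name__ == "__main__":
    # Toy sequence for demonstration only, not real M. graminicola data.
    product, substrate = rip_indices("TATACAAC")
    print(f"TpA/ApT = {product:.2f}, (CpA+TpG)/(ApC+GpT) = {substrate:.2f}")
```

Under this sketch, a repeat family affected by a RIP-like process would be expected to show a relatively high TpA/ApT ratio (the mutation product accumulates) together with a relatively low (CpA + TpG)/(ApC + GpT) ratio (the mutation target is depleted) when compared with non-repetitive sequence from the same genome.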