Department of Biology and Molecular Biology, Montclair State University, Montclair, NJ 07043;
Waksman Institute, Rutgers, the State University of New Jersey, Piscataway, NJ 08854;
Proc Natl Acad Sci U S A. 2014 Jul 15;111(28):10263-8. doi: 10.1073/pnas.1410068111. Epub 2014 Jun 30.
Transposons make up the bulk of eukaryotic genomes, but are difficult to annotate because they evolve rapidly. Most of the unannotated portion of sequenced genomes is probably made up of various divergent transposons that have yet to be categorized. Helitrons are unusual rolling-circle eukaryotic transposons that often capture gene sequences, making them of considerable evolutionary importance. Unlike other DNA transposons, Helitrons do not end in inverted repeats or create target-site duplications, so they are particularly challenging to identify. Here we present HelitronScanner, a two-layered local combinational variable (LCV) tool for generalized Helitron identification that represents a major improvement over previous identification programs based on DNA sequence or structure. HelitronScanner identified 64,654 Helitrons from a wide range of plant genomes in a highly automated way. We tested HelitronScanner's predictive ability in maize, a species with highly heterogeneous Helitron elements. LCV scores for the 5' and 3' termini of the predicted Helitrons provide a primary confidence level, and element copy number provides a secondary one. Newly identified Helitrons were validated by PCR assays or by in silico comparative analysis of insertion-site polymorphism among multiple accessions. Many new Helitrons were identified in model species, such as maize, rice, and Arabidopsis, and in a variety of organisms where, to our knowledge, Helitrons had not been reported previously, leading to a major upward reassessment of their abundance in plant genomes. HelitronScanner promises to be a valuable tool in future comparative and evolutionary studies of this major transposon superfamily.