Donald Danforth Plant Science Center, St. Louis, 63132 Missouri.
Plant Cell. 2020 Sep;32(9):2687-2698. doi: 10.1105/tpc.20.00115. Epub 2020 Jul 9.
Transcript-based annotations of genes facilitate both genome-wide analyses and detailed single-locus research. In contrast, transposable element (TE) annotations are rudimentary, consisting of information only on TE location and type. The repetitiveness and limited annotation of TEs make it difficult to distinguish between potentially functional expressed elements and degraded copies. To improve genome-wide TE bioinformatics, we performed long-read sequencing of cDNAs from Arabidopsis (Arabidopsis thaliana) lines deficient in multiple layers of TE repression. These uniquely mapping transcripts were used to identify the set of TEs able to generate polyadenylated RNAs and create a new transcript-based annotation of TEs that we have layered upon the existing high-quality community standard annotation. We used this annotation to reduce the bioinformatic complexity associated with multimapping reads from short-read RNA sequencing experiments, and we show that this improvement is expanded in a TE-rich genome such as maize (Zea mays). Our TE annotation also enables the testing of specific standing hypotheses in the TE field. We demonstrate that inaccurate TE splicing does not trigger small RNA production, and the cell more strongly targets DNA methylation to TEs that have the potential to make mRNAs. This work provides a transcript-based TE annotation for Arabidopsis and maize, which serves as a blueprint to reduce the bioinformatic complexity associated with repetitive TEs in any organism.
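The central idea of the abstract, using only uniquely mapping transcripts to decide which annotated TEs produce RNA, can be illustrated with a small sketch. This is not the authors' pipeline: the record types, the `n_hits` field (standing in for whatever multimapping count an aligner reports), the function name `expressed_tes`, and the TE identifiers are all hypothetical, and the overlap test is a simple linear scan rather than an indexed interval search.

```python
from collections import defaultdict
from typing import NamedTuple


class Alignment(NamedTuple):
    """One aligned read (hypothetical record layout)."""
    read_id: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive
    n_hits: int  # number of genomic locations this read aligns to


class TE(NamedTuple):
    """One annotated transposable element (hypothetical record layout)."""
    te_id: str
    chrom: str
    start: int   # 0-based, inclusive
    end: int     # 0-based, exclusive


def expressed_tes(alignments, tes, min_reads=1):
    """Count uniquely mapping reads per TE; keep TEs with >= min_reads."""
    # Group TEs by chromosome so each read is only compared to nearby candidates.
    by_chrom = defaultdict(list)
    for te in tes:
        by_chrom[te.chrom].append(te)

    counts = defaultdict(int)
    for aln in alignments:
        # Discard multimapping reads: their locus of origin is ambiguous.
        if aln.n_hits != 1:
            continue
        for te in by_chrom.get(aln.chrom, []):
            # Half-open intervals overlap when each starts before the other ends.
            if aln.start < te.end and te.start < aln.end:
                counts[te.te_id] += 1

    return {te_id: n for te_id, n in counts.items() if n >= min_reads}


# Toy data with made-up identifiers and coordinates.
tes = [
    TE("TE_A", "Chr1", 100, 500),
    TE("TE_B", "Chr1", 1000, 1500),
    TE("TE_C", "Chr2", 100, 500),
]
alignments = [
    Alignment("r1", "Chr1", 150, 450, 1),    # unique, inside TE_A
    Alignment("r2", "Chr1", 1100, 1400, 3),  # multimapping, ignored
    Alignment("r3", "Chr2", 480, 900, 1),    # unique, partially overlaps TE_C
    Alignment("r4", "Chr1", 500, 600, 1),    # unique, abuts TE_A without overlap
]
print(expressed_tes(alignments, tes))
```

In this toy example TE_B receives no count because its only read is multimapping, which is the ambiguity that long, uniquely mapping reads are meant to avoid.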