Department of Pathology, Dunedin School of Medicine, University of Otago, Dunedin, New Zealand.
Maurice Wilkins Centre for Molecular Biodiscovery, Auckland, New Zealand.
Curr Protoc. 2021 Aug;1(8):e206. doi: 10.1002/cpz1.206.
Transposable elements (TEs) are key regulators of both development and disease; however, their repetitive nature makes them computationally challenging to analyze. Owing to a lack of suitable computational tools and analysis frameworks, TE expression is often not quantified at the locus level. We therefore developed RepExpress, a novel pipeline that enables locus-level TE quantification and characterization. RepExpress characterizes TE expression in its genomic context and is the first tool focused on identifying tissue-specific TE-derived and TE-regulated genes. It identifies expressed TEs that overlap annotated genomic features and generates tissue-specific profiles of TE-derived genes. Expressed TEs that do not overlap any known genomic feature are characterized by their closest downstream genomic feature, enabling the identification of novel TE-gene regulatory relationships. RepExpress takes standard RNA-seq data as input and performs genomic alignment optimized for TEs. It then quantifies the expression of TEs and genes using featureCounts and StringTie, respectively. Finally, RepExpress filters the expressed repeats and characterizes their genomic context, enabling the identification of TEs that overlap genes or that may influence gene expression. Here, we describe RepExpress and provide a step-by-step protocol detailing its workflow. We also discuss other TE analysis tools and their applicability to different biological questions. © 2021 Wiley Periodicals LLC. Basic Protocol: RepExpress workflow.
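The per-TE characterization step summarized above can be illustrated with a minimal Python sketch. The sketch first filters expressed repeats, then reports any overlapping annotated features, and otherwise falls back to the closest downstream feature. This is not RepExpress's actual implementation. The `Interval` record, the `characterize` and `closest_downstream` helpers, the hypothetical `min_count` threshold of 10, and the choice to define "downstream" relative to the TE's own strand are all illustrative assumptions. The plain count dictionary stands in for featureCounts output.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Interval:
    """Hypothetical genomic interval record (0-based, half-open)."""
    name: str
    chrom: str
    start: int   # inclusive
    end: int     # exclusive
    strand: str  # "+" or "-"


def overlaps(a: Interval, b: Interval) -> bool:
    """True if two half-open intervals share at least one base."""
    return a.chrom == b.chrom and a.start < b.end and b.start < a.end


def closest_downstream(te: Interval, features: list[Interval]) -> Optional[Interval]:
    """Nearest feature lying entirely downstream of the TE.

    Assumption: 'downstream' follows the TE's own strand, i.e. higher
    coordinates for '+' TEs and lower coordinates for '-' TEs.
    """
    best, best_dist = None, None
    for f in features:
        if f.chrom != te.chrom:
            continue
        if te.strand == "+" and f.start >= te.end:
            dist = f.start - te.end
        elif te.strand == "-" and f.end <= te.start:
            dist = te.start - f.end
        else:
            continue  # feature overlaps or lies upstream
        if best_dist is None or dist < best_dist:
            best, best_dist = f, dist
    return best


def characterize(tes: list[Interval], counts: dict[str, int],
                 features: list[Interval], min_count: int = 10) -> dict:
    """Classify each expressed TE as feature-overlapping or by nearest downstream feature."""
    results = {}
    for te in tes:
        if counts.get(te.name, 0) < min_count:
            continue  # hypothetical expression filter: drop lowly counted TEs
        hits = [f.name for f in features if overlaps(te, f)]
        if hits:
            results[te.name] = ("overlap", hits)
        else:
            nearest = closest_downstream(te, features)
            results[te.name] = ("downstream", [nearest.name] if nearest else [])
    return results


if __name__ == "__main__":
    # Toy, made-up coordinates and counts for illustration only.
    genes = [Interval("GENE_A", "chr1", 1000, 2000, "+"),
             Interval("GENE_B", "chr1", 5000, 6000, "+")]
    tes = [Interval("TE_1", "chr1", 1500, 1800, "+"),
           Interval("TE_2", "chr1", 3000, 3300, "+"),
           Interval("TE_3", "chr1", 7000, 7100, "+")]
    counts = {"TE_1": 50, "TE_2": 25, "TE_3": 2}
    print(characterize(tes, counts, genes))
    # prints {'TE_1': ('overlap', ['GENE_A']), 'TE_2': ('downstream', ['GENE_B'])}
```

In the toy example, TE_1 lies within GENE_A and is reported as overlapping it. TE_2 is intergenic and is assigned to GENE_B, the nearest feature downstream on the plus strand. TE_3 falls below the illustrative count threshold and is dropped. A linear scan keeps the sketch short; on genome-scale annotations, an interval index or a dedicated tool would be the practical choice.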