Department of Pathology and Laboratory Medicine, University of Kansas Medical Center, Kansas City, KS 66160, USA.
Cells. 2024 Sep 8;13(17):1502. doi: 10.3390/cells13171502.
RNA sequencing (RNA-Seq) has become a widely adopted technique for studying gene expression. However, conventional RNA-Seq analyses rely on gene expression (GE) values that aggregate all the transcripts produced under a single gene identifier, overlooking the complexity of transcript variants arising from different transcription start sites or alternative splicing. Transcript variants may encode proteins with diverse functional domains, or noncoding RNAs. This study explored the implications of neglecting transcript variants in RNA-Seq analyses. Among the 1334 transcription factor (TF) genes expressed in mouse embryonic stem (ES) or trophoblast stem (TS) cells, 652 were differentially expressed in TS cells based on GE values (365 upregulated and 287 downregulated; absolute fold change ≥ 2, false discovery rate (FDR) p-value ≤ 0.05). The 365 upregulated genes expressed 883 transcript variants. Further transcript expression (TE)-based analyses identified only 174 (<20%) of the 883 transcripts as upregulated. The remaining 709 transcripts were either downregulated or showed no significant change. Meanwhile, the 287 downregulated genes expressed 856 transcript variants, and only 153 (<20%) of the 856 transcripts were downregulated. The other 703 transcripts were either upregulated or showed no significant change. Additionally, the 682 TF genes that were not significantly different between ES and TS cells (absolute GE fold change < 2 and/or FDR p-value > 0.05) expressed 2215 transcript variants. These included 477 (>21%) differentially expressed transcripts (276 upregulated and 201 downregulated; absolute fold change ≥ 2, FDR p-value ≤ 0.05). Hence, GE-based RNA-Seq analyses do not accurately represent expression levels, because transcripts from the same gene can be expressed divergently.
Our findings show that including transcript variants in RNA-Seq analyses gives a more precise understanding of a gene's functional and regulatory landscape, whereas ignoring the variants may result in an erroneous interpretation.
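The masking effect described above can be illustrated with a minimal Python sketch. The transcript names (`tx_A`, `tx_B`) and expression values are hypothetical and chosen only for illustration; the sketch applies the 2-fold cutoff alone and omits the FDR testing used in the study. The second part simply re-derives the proportions reported in the abstract from its own counts.

```python
import math

# Hypothetical expression values (e.g., TPM) for one gene with two transcript
# variants in ES and TS cells. These numbers are invented for illustration.
es = {"tx_A": 80.0, "tx_B": 20.0}
ts = {"tx_A": 20.0, "tx_B": 80.0}

def log2_fold_change(reference, sample):
    """log2 of sample over reference (here: TS over ES)."""
    return math.log2(sample / reference)

def classify(lfc, threshold=1.0):
    """Apply only the >= 2-fold cutoff (|log2FC| >= 1); no FDR test is run."""
    if lfc >= threshold:
        return "up"
    if lfc <= -threshold:
        return "down"
    return "ns"

# Gene-level (GE) value: the sum over all transcripts of the gene.
gene_call = classify(log2_fold_change(sum(es.values()), sum(ts.values())))

# Transcript-level (TE) values: each variant assessed separately.
tx_calls = {tx: classify(log2_fold_change(es[tx], ts[tx])) for tx in es}

print("gene:", gene_call)           # aggregate is unchanged
print("transcripts:", tx_calls)     # variants change in opposite directions

# Counts reported in the abstract: (concordant or DE transcripts, all transcripts).
reported = {
    "upregulated genes, transcripts also up": (174, 883),
    "downregulated genes, transcripts also down": (153, 856),
    "non-significant genes, transcripts DE": (276 + 201, 2215),
}
percent = {k: 100.0 * n / total for k, (n, total) in reported.items()}
for label, value in percent.items():
    print(f"{label}: {value:.1f}%")
```

In this toy case the gene total is identical in both cell types, so a GE-based analysis reports no change even though one variant falls 4-fold and the other rises 4-fold.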