Word Laura J, Willis Clinton M, Judson Richard S, Everett Logan J, Davidson-Fritz Sarah E, Haggard Derik E, Chambers Bryant A, Rogers Jesse D, Bundy Joseph L, Shah Imran, Sipes Nisha S, Harrill Joshua A
Center for Computational Toxicology and Exposure, Office of Research and Development, United States Environmental Protection Agency, Research Triangle Park, North Carolina, United States of America.
PLoS One. 2025 May 9;20(5):e0320862. doi: 10.1371/journal.pone.0320862. eCollection 2025.
Recent advances in transcriptomics technologies allow for whole transcriptome gene expression profiling using targeted sequencing techniques, which are becoming increasingly popular due to the logistical ease of data acquisition and analysis. As data from these targeted sequencing platforms accumulate, it is important to evaluate their similarity to traditional whole transcriptome RNA-seq. Thus, we evaluated the comparability of TempO-seq data from cell lysates to traditional RNA-seq data from purified RNA using baseline gene expression profiles. First, two TempO-seq data sets that were generated several months apart at different read depths were compared for six human cell lines. The average Pearson correlation was 0.93 (95% CI: 0.90-0.96), and principal component analysis (PCA) showed that these two TempO-seq data sets were highly reproducible and could be combined. Next, TempO-seq data were compared to RNA-seq data for 39 human cell lines. The log2 normalized expression data for the 19,290 genes present in both platforms were well correlated between TempO-seq and RNA-seq (Pearson correlation 0.77, 95% CI: 0.76-0.78), and the majority of genes (15,480 genes, 80%) had concordant gene expression levels. PCA showed a platform divergence, but this was readily resolved by calculating the relative log2 expression (RLE) of each gene compared to its average expression across cell lines within each platform. Gene ontology analysis revealed that ontologies associated with histone and ribosomal functions were enriched among the 20% of genes with non-concordant expression levels (3,810 genes). In contrast, gene ontologies annotated to cellular structure functions were enriched among genes with concordant expression levels between the platforms.
In conclusion, we found TempO-seq baseline expression data to be reproducible at different read depths and found TempO-seq RLE data from lysed cells to be comparable to RNA-seq RLE data from purified RNA across 39 cell lines, even though the datasets were generated by different laboratories using different cell stocks.
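The RLE step described above can be illustrated with a minimal sketch. It assumes that RLE means subtracting, for each gene, its mean log2 expression across cell lines within a single platform; the abstract does not spell out the exact procedure, so this is an interpretation rather than the authors' pipeline. The function names, matrix sizes, and simulated values are all hypothetical, and the toy data contain no measurement noise.

```python
import numpy as np


def relative_log2_expression(log2_expr):
    """Center each gene (row) on its mean across cell lines (columns).

    Assumption: this per-gene mean centering within one platform is what
    the abstract calls relative log2 expression (RLE).
    """
    log2_expr = np.asarray(log2_expr, dtype=float)
    return log2_expr - log2_expr.mean(axis=1, keepdims=True)


def pearson(a, b):
    """Pearson correlation over all gene-by-cell-line values."""
    return float(np.corrcoef(np.ravel(a), np.ravel(b))[0, 1])


# Hypothetical toy data: 200 genes by 6 cell lines, not values from the study.
rng = np.random.default_rng(0)
n_genes, n_lines = 200, 6
biology = rng.normal(8.0, 2.0, size=(n_genes, n_lines))

# Each platform adds its own per-gene offset (for example, a probe or
# capture efficiency effect) that is constant across cell lines.
platform_a = biology + rng.normal(0.0, 3.0, size=(n_genes, 1))
platform_b = biology + rng.normal(0.0, 3.0, size=(n_genes, 1))

rle_a = relative_log2_expression(platform_a)
rle_b = relative_log2_expression(platform_b)

raw_r = pearson(platform_a, platform_b)
rle_r = pearson(rle_a, rle_b)
print(f"Pearson r on log2 expression: {raw_r:.2f}")
print(f"Pearson r on RLE:             {rle_r:.2f}")
```

In this noise-free toy case the per-gene platform offsets cancel exactly after centering, so the two RLE matrices coincide. Real data would retain residual differences, which is consistent with the abstract reporting comparability rather than identity between platforms.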