School of Medicine, Stony Brook University, Stony Brook, NY 11794, USA.
BMC Bioinformatics. 2013;14 Suppl 9(Suppl 9):S1. doi: 10.1186/1471-2105-14-S9-S1. Epub 2013 Jun 28.
High-throughput parallel sequencing of RNA (RNA-Seq) has recently emerged as an appealing alternative to microarrays for identifying differentially expressed genes (DEG) between biological groups. However, considerable discrepancies remain between the two platforms in both gene expression measurements and DEG results. The objective of this study was to compare parallel paired-end RNA-Seq and microarray data generated on 5-azadeoxy-cytidine (5-Aza) treated HT-29 colon cancer cells, supplemented by a simulation study.
We first performed a general correlation analysis comparing gene expression profiles on the two platforms. An Errors-In-Variables (EIV) regression model was then applied to assess proportional and fixed biases between the two technologies. Next, several existing algorithms designed for DEG identification in RNA-Seq and microarray data were applied, and the cross-platform overlap of the resulting DEG lists was compared; the DEG results were further validated using qRT-PCR assays on selected genes. Functional analyses were subsequently conducted using Ingenuity Pathway Analysis (IPA).
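The abstract does not state which EIV estimator was used. As a minimal sketch, the block below uses Deming regression, one common EIV model, under the assumption that the ratio of error variances between the two platforms (`delta`) is known and equal to 1. The function name `deming_regression` and the toy values are illustrative and are not taken from the study. In this framing, an intercept different from 0 points to a fixed bias and a slope different from 1 points to a proportional bias.

```python
import math


def deming_regression(x, y, delta=1.0):
    """Fit y = intercept + slope * x when both x and y carry measurement error.

    delta is the ASSUMED ratio of the y-error variance to the x-error
    variance; delta = 1 corresponds to orthogonal regression.
    """
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sample variances and covariance.
    sxx = sum((xi - mean_x) ** 2 for xi in x) / (n - 1)
    syy = sum((yi - mean_y) ** 2 for yi in y) / (n - 1)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / (n - 1)
    if sxy == 0:
        raise ValueError("slope is undefined when the sample covariance is zero")

    # Closed-form Deming slope, then the intercept through the means.
    diff = syy - delta * sxx
    slope = (diff + math.sqrt(diff ** 2 + 4 * delta * sxy ** 2)) / (2 * sxy)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Toy log-scale expression values (hypothetical, for illustration only).
microarray = [1.0, 2.0, 3.0, 4.0]
rna_seq = [3.0, 5.0, 7.0, 9.0]
slope, intercept = deming_regression(microarray, rna_seq)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
```

In practice the slope and intercept would be reported with confidence intervals (for example from a bootstrap) before concluding that a bias is present; that step is omitted here.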
Pearson and Spearman correlation coefficients between the RNA-Seq and microarray data each exceeded 0.80, with a 66% to 68% overlap in the genes detected on the two platforms. The EIV regression model indicated both fixed and proportional biases between the two platforms. The DESeq and baySeq algorithms (RNA-Seq) and the SAM and eBayes algorithms (microarray) achieved the highest cross-platform overlap rates in DEG results from both the experimental and simulated datasets. On the simulated dataset, DESeq controlled the false discovery rate better than baySeq, although it was slightly less sensitive. RNA-Seq and qRT-PCR data, but not microarray data, confirmed the expected reversal of SPARC gene suppression after treating HT-29 cells with 5-Aza. Thirty-three IPA canonical pathways were identified by both microarray and RNA-Seq data, 152 pathways by RNA-Seq data only, and none by microarray data only.
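The abstract does not define how the cross-platform overlap rate was computed. The sketch below shows two plausible definitions as assumptions: the shared genes as a fraction of the shorter list, and the Jaccard index (shared genes over the union). The function names and the gene lists are hypothetical placeholders, not results from the study.

```python
def overlap_rate(genes_a, genes_b):
    """Shared genes as a fraction of the smaller list (ASSUMED definition)."""
    a, b = set(genes_a), set(genes_b)
    if not a or not b:
        return 0.0
    return len(a & b) / min(len(a), len(b))


def jaccard_index(genes_a, genes_b):
    """Shared genes as a fraction of the union of both lists."""
    a, b = set(genes_a), set(genes_b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


# Placeholder DEG lists for the two platforms (illustrative identifiers only).
rna_seq_degs = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
microarray_degs = ["GENE_B", "GENE_C", "GENE_E"]

print(f"overlap rate:  {overlap_rate(rna_seq_degs, microarray_degs):.2f}")
print(f"Jaccard index: {jaccard_index(rna_seq_degs, microarray_degs):.2f}")
```

The two measures can differ substantially when one platform reports far more DEGs than the other, so the chosen definition should be stated alongside any reported overlap.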
These results suggest that RNA-Seq has advantages over microarray in identifying DEGs, with the most consistent results obtained from the DESeq and SAM methods. The EIV regression model reveals both fixed and proportional biases between RNA-Seq and microarray data, which may partly explain why the cross-platform overlap is lower for DEG lists than for detectable genes.