Novartis Healthcare Private Limited, Hyderabad, India.
Genomics. 2019 Jul;111(4):893-898. doi: 10.1016/j.ygeno.2018.05.018. Epub 2018 May 26.
RNA-Seq technology has transformed gene expression profiling by generating read count data that measure the transcript abundance of each queried gene across multiple experimental subjects. However, technical artefacts and hidden biological profiles of the samples give rise to a wide variety of latent effects that can distort the true transcript/gene expression signals. Standard normalization techniques fail to correct for these hidden variables, which leads to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (PLS), implemented as the R package 'SVAPLSseq', to correct for such extraneous variability in RNA-Seq data. I also present a novel and thorough comparative analysis of the PLS-based method against other widely used approaches for latent variable correction in RNA-Seq. Overall, the method is found to estimate the hidden effect signatures in the RNA-Seq transcriptome expression landscape substantially better than the other available techniques.