Use of Partial Least Squares improves the efficacy of removing unwanted variability in differential expression analyses based on RNA-Seq data.

Affiliation

Novartis Healthcare Private Limited, Hyderabad, India.

Publication information

Genomics. 2019 Jul;111(4):893-898. doi: 10.1016/j.ygeno.2018.05.018. Epub 2018 May 26.

Abstract

RNA-Seq technology has revolutionized gene expression profiling by generating read count data that measure transcript abundance for each queried gene across multiple experimental subjects. However, technical artefacts and hidden biological profiles of the samples give rise to a wide variety of latent effects that can distort the true transcript/gene expression signals. Standard normalization techniques do not correct for these hidden variables, which leads to flawed downstream analyses. In this work I demonstrate the use of Partial Least Squares (PLS), implemented in the R package 'SVAPLSseq', to correct for traces of extraneous variability in RNA-Seq data. A novel and thorough comparative analysis is presented, evaluating the PLS-based method alongside several other popular approaches for latent variable correction in RNA-Seq. Overall, the method was found to estimate the hidden effect signatures in the RNA-Seq transcriptome expression landscape substantially better than the other available techniques.
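The abstract describes the correction only at a high level. The following Python sketch is a rough illustration of the general idea of estimating a hidden factor with PLS. It is not the SVAPLSseq implementation or its API.

The sketch works in three steps:

1. It simulates log-scale expression data with an unrecorded batch factor.
2. It removes the primary condition from every gene by least squares.
3. It takes the first PLS component, computed with NIPALS, between the expression matrix (predictors) and the residual matrix (responses) as a surrogate variable estimate.

All function names, the simulation settings, and the choice of residuals as PLS responses are assumptions made for this example.

```python
import math
import random

# Hypothetical helpers for illustration only; not SVAPLSseq functions.


def transpose(m):
    """Swap rows and columns of a list-of-lists matrix."""
    return [list(col) for col in zip(*m)]


def center_columns(m):
    """Subtract each column's mean (m is a list of rows)."""
    n = len(m)
    means = [sum(col) / n for col in zip(*m)]
    return [[v - mu for v, mu in zip(row, means)] for row in m]


def residuals_on_primary(expr, x):
    """Fit expr_g = b0 + b1 * x by least squares for every gene g (one row of a
    genes x samples matrix) and return the residuals, i.e. the variation that
    the primary variable x does not explain."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for g in expr:
        gbar = sum(g) / n
        b1 = sum((xi - xbar) * (gi - gbar) for xi, gi in zip(x, g)) / sxx
        out.append([gi - gbar - b1 * (xi - xbar) for xi, gi in zip(x, g)])
    return out


def first_pls_score(X, Y, iters=50):
    """First PLS2 score vector t = X w via NIPALS.

    X (predictors) and Y (responses) are samples x variables with centered
    columns. Returns one value per sample; its sign is arbitrary."""
    u = [row[0] for row in Y]  # start from the first response column
    for _ in range(iters):
        w = [sum(a * b for a, b in zip(col, u)) for col in zip(*X)]  # X^T u
        nw = math.sqrt(sum(v * v for v in w))
        w = [v / nw for v in w]
        t = [sum(a * b for a, b in zip(row, w)) for row in X]        # X w
        c = [sum(a * b for a, b in zip(col, t)) for col in zip(*Y)]  # Y^T t
        nc = math.sqrt(sum(v * v for v in c))
        c = [v / nc for v in c]
        u = [sum(a * b for a, b in zip(row, c)) for row in Y]        # Y c
    return t


def pearson(a, b):
    """Pearson correlation of two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / math.sqrt(va * vb)


# Simulated log-scale expression (assumed settings, not data from the paper).
rng = random.Random(1)
n_samples, n_genes = 12, 200
primary = [0] * 6 + [1] * 6                      # condition of interest
hidden = [s % 2 for s in range(n_samples)]       # unrecorded batch, balanced per condition
expr = []
for g in range(n_genes):
    base = rng.uniform(4, 10)
    a = 2.0 if g < 20 else 0.0                     # 20 truly differentially expressed genes
    b = rng.uniform(1, 2) if g % 2 == 0 else 0.0   # half of the genes affected by the batch
    expr.append([base + a * primary[s] + b * hidden[s] + rng.gauss(0, 0.3)
                 for s in range(n_samples)])

R = residuals_on_primary(expr, primary)   # genes x samples
X = center_columns(transpose(expr))       # samples x genes, centered predictors
Y = transpose(R)                          # samples x genes; residual columns already have mean 0
sv = first_pls_score(X, Y)                # surrogate variable estimate, one value per sample
print(f"|corr(surrogate, hidden batch)| = {abs(pearson(sv, hidden)):.3f}")
```

In a downstream differential expression model, such an estimate would typically be added as a covariate alongside the primary variable. Real read counts would first need normalization or a log transform. This sketch skips that step by simulating log-scale values directly.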

Similar articles

Surrogate variable analysis using partial least squares (SVA-PLS) in gene expression studies.

Bioinformatics. 2012 Mar 15;28(6):799-806. doi: 10.1093/bioinformatics/bts022. Epub 2012 Jan 11.

How data analysis affects power, reproducibility and biological insight of RNA-seq studies in complex datasets.

Nucleic Acids Res. 2015 Sep 18;43(16):7664-74. doi: 10.1093/nar/gkv736. Epub 2015 Jul 21.

A differential k-mer analysis pipeline for comparing RNA-Seq transcriptome and meta-transcriptome datasets without a reference.

Funct Integr Genomics. 2019 Mar;19(2):363-371. doi: 10.1007/s10142-018-0647-3. Epub 2018 Nov 27.

Empirical assessment of analysis workflows for differential expression analysis of human samples using RNA-Seq.

BMC Bioinformatics. 2017 Jan 17;18(1):38. doi: 10.1186/s12859-016-1457-z.

svapls: an R package to correct for hidden factors of variability in gene expression studies.

BMC Bioinformatics. 2013 Jul 24;14:236. doi: 10.1186/1471-2105-14-236.

SPARTA: Simple Program for Automated reference-based bacterial RNA-seq Transcriptome Analysis.

BMC Bioinformatics. 2016 Feb 4;17:66. doi: 10.1186/s12859-016-0923-y.

Comparison of stranded and non-stranded RNA-seq transcriptome profiling and investigation of gene overlap.

BMC Genomics. 2015 Sep 3;16(1):675. doi: 10.1186/s12864-015-1876-7.

Improving RNA-Seq expression estimation by modeling isoform- and exon-specific read sequencing rate.

BMC Bioinformatics. 2015 Oct 16;16:332. doi: 10.1186/s12859-015-0750-6.

Systematic comparison of RNA-Seq normalization methods using measurement error models.

Bioinformatics. 2012 Oct 15;28(20):2584-91. doi: 10.1093/bioinformatics/bts497. Epub 2012 Aug 22.

Cited by

Exploratory analysis of differences at the transcriptional interface between the maternal and fetal compartments of the sheep placenta and potential influence of fetal sex.

Mol Cell Endocrinol. 2025 Jun 1;603:112546. doi: 10.1016/j.mce.2025.112546. Epub 2025 Apr 12.

Sufficient principal component regression for pattern discovery in transcriptomic data.

Bioinform Adv. 2022 May 14;2(1):vbac033. doi: 10.1093/bioadv/vbac033. eCollection 2022.

Roles of Physicochemical and Structural Properties of RNA-Binding Proteins in Predicting the Activities of Trans-Acting Splicing Factors with Machine Learning.

Int J Mol Sci. 2022 Apr 17;23(8):4426. doi: 10.3390/ijms23084426.

A Study on microRNAs Targeting the Genes Overexpressed in Lung Cancer and their Codon Usage Patterns.

Mol Biotechnol. 2022 Oct;64(10):1095-1119. doi: 10.1007/s12033-022-00491-3. Epub 2022 Apr 18.

Combination of two analytical techniques improves wine classification by Vineyard, Region, and vintage.

Food Chem. 2021 Aug 30;354:129531. doi: 10.1016/j.foodchem.2021.129531. Epub 2021 Mar 10.

Processing and Analysis of RNA-seq Data from Public Resources.

Methods Mol Biol. 2021;2243:81-94. doi: 10.1007/978-1-0716-1103-6_4.

Batch correction evaluation framework using a-priori gene-gene associations: applied to the GTEx dataset.

BMC Bioinformatics. 2019 May 28;20(1):268. doi: 10.1186/s12859-019-2855-9.
