An empirical likelihood ratio test robust to individual heterogeneity for differential expression analysis of RNA-seq.

Publication information

Brief Bioinform. 2018 Jan 1;19(1):109-117. doi: 10.1093/bib/bbw103.

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5875907/

Abstract

Individual sample heterogeneity is one of the biggest obstacles in biomarker identification for complex diseases such as cancers. Current statistical models for identifying differentially expressed genes between disease and control groups often overlook the substantial heterogeneity among human samples. Meanwhile, traditional nonparametric tests, although distribution free and robust to heterogeneity, discard detailed information in the data and sacrifice statistical power. Here, we propose an empirical likelihood ratio test with a mean-variance relationship constraint (ELTSeq) for the differential expression analysis of RNA sequencing (RNA-seq) data. As a distribution-free nonparametric model, ELTSeq handles individual heterogeneity by estimating an empirical probability for each observation without making any assumption about the read-count distribution. It also incorporates a constraint for the read-count overdispersion that is widely observed in RNA-seq data. ELTSeq demonstrates a significant improvement over existing methods such as edgeR, DESeq, t-tests, Wilcoxon tests and the classic empirical likelihood ratio test when handling heterogeneous groups. It will significantly advance transcriptomic studies of cancers and other complex diseases.
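
To make the idea of "an empirical probability for each observation" concrete, the following is a minimal sketch of the classic two-sample empirical likelihood ratio test for equal means, which the abstract names as a comparison method. It is not ELTSeq itself: the mean-variance constraint is omitted, the function names (el_stat_one_sample, el_two_sample_test) are hypothetical, and the grid search over the common mean and the asymptotic chi-square calibration with one degree of freedom are assumptions of this sketch rather than details taken from the paper.

```python
import math


def el_stat_one_sample(x, mu, max_iter=200):
    """Return -2 log R(mu), the empirical likelihood ratio statistic for mean(x) == mu."""
    z = [xi - mu for xi in x]
    z_min, z_max = min(z), max(z)
    if not (z_min < 0.0 < z_max):
        return math.inf  # mu lies outside the data range: no valid weights exist
    # The weights are p_i = 1 / (n * (1 + lam * z_i)); all must be positive,
    # which restricts lam to the open interval (-1/z_max, -1/z_min).
    lo, hi = -1.0 / z_max, -1.0 / z_min
    for _ in range(max_iter):
        mid = 0.5 * (lo + hi)
        if mid <= lo or mid >= hi:
            break  # interval has shrunk to floating-point resolution
        # g(lam) = sum z_i / (1 + lam * z_i) is strictly decreasing in lam,
        # so bisection finds its unique root.
        g = sum(zi / (1.0 + mid * zi) for zi in z)
        if g > 0.0:
            lo = mid
        else:
            hi = mid
    lam = 0.5 * (lo + hi)
    stat = 2.0 * sum(math.log(1.0 + lam * zi) for zi in z)
    return max(stat, 0.0)  # clamp tiny negative rounding errors


def el_two_sample_test(x, y, grid=2000):
    """Test equal means of x and y; returns (statistic, asymptotic p-value)."""
    lo = max(min(x), min(y))
    hi = min(max(x), max(y))
    if not lo < hi:
        return math.inf, 0.0  # sample ranges do not overlap
    best = math.inf
    # Assumption of this sketch: profile out the common mean on a simple grid.
    for k in range(1, grid + 1):
        mu = lo + (hi - lo) * k / (grid + 1)
        best = min(best, el_stat_one_sample(x, mu) + el_stat_one_sample(y, mu))
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(best / 2.0))
    return best, p_value
```

For a single gene, a call such as el_two_sample_test(control_values, disease_values) on normalized expression values returns the statistic and an approximate p-value. With the small group sizes typical of RNA-seq studies the chi-square approximation can be rough, so the p-value should be read as indicative only.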
