Finding consistent patterns: a nonparametric approach for identifying differential expression in RNA-Seq data.

Author information

Department of Statistics, Stanford University, Stanford, CA 94305, USA.

Publication information

Stat Methods Med Res. 2013 Oct;22(5):519-36. doi: 10.1177/0962280211428386. Epub 2011 Nov 28.

DOI: 10.1177/0962280211428386

PMID: 22127579

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4605138/

Abstract

We discuss the identification of features that are associated with an outcome in RNA-Sequencing (RNA-Seq) and other sequencing-based comparative genomic experiments. RNA-Seq data takes the form of counts, so models based on the normal distribution are generally unsuitable. The problem is especially challenging because different sequencing experiments may generate quite different total numbers of reads, or 'sequencing depths'. Existing methods for this problem are based on Poisson or negative binomial models: they are useful but can be heavily influenced by 'outliers' in the data. We introduce a simple, non-parametric method with resampling to account for the different sequencing depths. The new method is more robust than parametric methods. It can be applied to data with quantitative, survival, two-class or multiple-class outcomes. We compare our proposed method to Poisson and negative binomial-based methods in simulated and real data sets, and find that our method discovers more consistent patterns than competing methods.
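The abstract does not spell out the procedure, so the following is only a minimal sketch of the general idea of resampling to equalize sequencing depth before applying a rank-based statistic. It assumes a two-class outcome, binomial thinning of each sample down to the smallest depth, and a Wilcoxon rank-sum statistic averaged over resamples; these choices, and the function names `downsample`, `rank_sum` and `resampled_rank_sums`, are illustrative assumptions rather than the authors' stated algorithm.

```python
import random


def downsample(counts, depth, target_depth, rng):
    """Binomially thin one sample's counts so its expected total is target_depth."""
    p = min(1.0, target_depth / depth)
    # Each read is kept independently with probability p.
    return [sum(rng.random() < p for _ in range(c)) for c in counts]


def rank_sum(values, labels):
    """Wilcoxon rank-sum statistic for class 1, using midranks for ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        midrank = (i + j) / 2 + 1  # mean of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = midrank
        i = j + 1
    return sum(r for r, y in zip(ranks, labels) if y == 1)


def resampled_rank_sums(count_matrix, labels, n_resamples=20, seed=0):
    """Mean class-1 rank-sum per feature over depth-equalized resamples.

    count_matrix[s][g] is the read count of feature g in sample s.
    """
    rng = random.Random(seed)
    depths = [sum(row) for row in count_matrix]
    target = min(depths)  # assumption: thin every sample to the smallest depth
    n_features = len(count_matrix[0])
    totals = [0.0] * n_features
    for _ in range(n_resamples):
        thinned = [downsample(row, d, target, rng)
                   for row, d in zip(count_matrix, depths)]
        for g in range(n_features):
            column = [sample[g] for sample in thinned]
            totals[g] += rank_sum(column, labels)
    return [t / n_resamples for t in totals]


# Toy data: six samples with unequal depths, two features.
example_counts = [
    [10, 990], [12, 1988], [8, 492],       # class 0
    [300, 700], [900, 2100], [150, 350],   # class 1
]
example_labels = [0, 0, 0, 1, 1, 1]
example_stats = resampled_rank_sums(example_counts, example_labels)
```

Because the statistic depends only on ranks within each feature, a single extreme count cannot dominate it, which is the kind of robustness to outliers the abstract attributes to the nonparametric approach. Significance would still have to be assessed separately, for example by permuting the class labels.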

Similar articles

Normalization, testing, and false discovery rate estimation for RNA-sequencing data.

Biostatistics. 2012 Jul;13(3):523-38. doi: 10.1093/biostatistics/kxr031. Epub 2011 Oct 14.

SimSeq: a nonparametric approach to simulation of RNA-sequence datasets.

Bioinformatics. 2015 Jul 1;31(13):2131-40. doi: 10.1093/bioinformatics/btv124. Epub 2015 Feb 26.

LFCseq: a nonparametric approach for differential expression analysis of RNA-seq data.

BMC Genomics. 2014;15 Suppl 10(Suppl 10):S7. doi: 10.1186/1471-2164-15-S10-S7. Epub 2014 Dec 12.

Differential expression analysis of RNA sequencing data by incorporating non-exonic mapped reads.

BMC Genomics. 2015;16 Suppl 7(Suppl 7):S14. doi: 10.1186/1471-2164-16-S7-S14. Epub 2015 Jun 11.

Modifying SAMseq to account for asymmetry in the distribution of effect sizes when identifying differentially expressed genes.

Stat Appl Genet Mol Biol. 2017 Nov 27;16(5-6):291-312. doi: 10.1515/sagmb-2016-0037.

NPEBseq: nonparametric empirical bayesian-based procedure for differential expression analysis of RNA-seq data.

BMC Bioinformatics. 2013 Aug 27;14:262. doi: 10.1186/1471-2105-14-262.

Robust identification of differentially expressed genes from RNA-seq data.

Genomics. 2020 Mar;112(2):2000-2010. doi: 10.1016/j.ygeno.2019.11.012. Epub 2019 Nov 20.

NBLDA: negative binomial linear discriminant analysis for RNA-Seq data.

BMC Bioinformatics. 2016 Sep 13;17(1):369. doi: 10.1186/s12859-016-1208-1.

A flexible count data model to fit the wide diversity of expression profiles arising from extensively replicated RNA-seq experiments.

BMC Bioinformatics. 2013 Aug 21;14:254. doi: 10.1186/1471-2105-14-254.

Cited by

Transcriptional dynamics during karyogamy in rice zygotes.

Development. 2025 Jan 15;152(2). doi: 10.1242/dev.204497. Epub 2025 Jan 27.

Investigation the role of SIRT3, SIRT7, NFATC1, and PDL-1 genes in androgenetic alopecia.

BMC Res Notes. 2024 Nov 21;17(1):343. doi: 10.1186/s13104-024-06980-9.

PTMoreR-enabled cross-species PTM mapping and comparative phosphoproteomics across mammals.

Cell Rep Methods. 2024 Sep 16;4(9):100859. doi: 10.1016/j.crmeth.2024.100859. Epub 2024 Sep 9.

A comprehensive workflow for optimizing RNA-seq data analysis.

BMC Genomics. 2024 Jun 24;25(1):631. doi: 10.1186/s12864-024-10414-y.

Comparative study on differential expression analysis methods for single-cell RNA sequencing data with small biological replicates: Based on single-cell transcriptional data of PBMCs from COVID-19 severe patients.

PLoS One. 2024 Mar 27;19(3):e0299358. doi: 10.1371/journal.pone.0299358. eCollection 2024.

Differential gene expression analysis pipelines and bioinformatic tools for the identification of specific biomarkers: A review.

Comput Struct Biotechnol J. 2024 Mar 1;23:1154-1168. doi: 10.1016/j.csbj.2024.02.018. eCollection 2024 Dec.

Detecting differential transcript usage in complex diseases with SPIT.

Cell Rep Methods. 2024 Mar 25;4(3):100736. doi: 10.1016/j.crmeth.2024.100736. Epub 2024 Mar 19.

Historical perspective and future directions: computational science in immuno-oncology.

J Immunother Cancer. 2024 Jan 8;12(1):e008306. doi: 10.1136/jitc-2023-008306.

Construction of Immune Infiltration-Related LncRNA Signatures Based on Machine Learning for the Prognosis in Colon Cancer.

Biochem Genet. 2024 Jun;62(3):1925-1952. doi: 10.1007/s10528-023-10516-4. Epub 2023 Oct 4.

Detecting differential transcript usage in complex diseases with SPIT.

bioRxiv. 2023 Jul 10:2023.07.10.548289. doi: 10.1101/2023.07.10.548289.
