SimSeq: a nonparametric approach to simulation of RNA-sequence datasets.

Author information

Benidt Sam, Nettleton Dan

Affiliation

Department of Statistics, Iowa State University, Ames, IA 50011-1210, USA.

Publication information

Bioinformatics. 2015 Jul 1;31(13):2131-40. doi: 10.1093/bioinformatics/btv124. Epub 2015 Feb 26.

Abstract

MOTIVATION

RNA sequencing analysis methods are often derived by relying on hypothetical parametric models for read counts that are not likely to be precisely satisfied in practice. Methods are often tested by analyzing data that have been simulated according to the assumed model. This testing strategy can result in an overly optimistic view of the performance of an RNA-seq analysis method.

RESULTS

We develop a data-based simulation algorithm for RNA-seq data. The vector of read counts simulated for a given experimental unit has a joint distribution that closely matches the distribution of a source RNA-seq dataset provided by the user. We conduct simulation experiments based on the negative binomial distribution and our proposed nonparametric simulation algorithm. We compare performance between the two simulation experiments over a small subset of statistical methods for RNA-seq analysis available in the literature. We use as a benchmark the ability of a method to control the false discovery rate. Not surprisingly, methods based on parametric modeling assumptions seem to perform better with respect to false discovery rate control when data are simulated from parametric models rather than using our more realistic nonparametric simulation strategy.
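
The Python sketch below illustrates the kind of comparison described above. It is a toy illustration, not SimSeq's actual algorithm: the package itself is written in R and also handles details this sketch skips, such as normalization factors.

The sketch contrasts two ways of producing a benchmark dataset:

- independent negative binomial draws, parameterized so that Var = mu + phi * mu^2, with per-gene moment estimates;
- a simplified resampling scheme that reuses whole columns of a source count matrix.

Each simulated dataset is then scored by the false discovery proportion of the Benjamini-Hochberg procedure, applied to exact permutation p-values. The toy source data, the group labels "A"/"B", the 3-fold effect and every function name here are hypothetical.

```python
import itertools
import numpy as np


def nb_counts(means, phi, rng):
    """Negative binomial draws with E[Y] = means and Var[Y] = means + phi * means**2.
    `means` is (genes, samples); `phi` holds one dispersion per gene."""
    size = 1.0 / phi[:, None]
    return rng.negative_binomial(size, size / (size + means))


def simulate_resample(source, labels, de, k, rng):
    """Simplified column-resampling sketch (NOT SimSeq's exact algorithm).
    Each simulated sample reuses whole source columns, so gene-gene dependence
    is inherited from the data. EE genes come from source group "A" in both
    simulated groups; DE genes in simulated group 2 come from group "B" columns."""
    cols_a = rng.choice(np.flatnonzero(labels == "A"), size=2 * k, replace=False)
    cols_b = rng.choice(np.flatnonzero(labels == "B"), size=k, replace=False)
    sim = source[:, cols_a]                      # fancy indexing returns a copy
    sim[de, k:] = source[de][:, cols_b]
    return sim


def perm_pvalues(counts, k):
    """Exact two-sided permutation p-values for the difference in mean log(1 + count)
    between the first k and last k of 2k columns, enumerating every balanced split."""
    y = np.log1p(counts)
    n = y.shape[1]
    combos = list(itertools.combinations(range(n), k))
    w = np.full((len(combos), n), -1.0 / k)
    for row, idx in enumerate(combos):
        w[row, list(idx)] = 1.0 / k
    stats = np.abs(w @ y.T)                      # (splits, genes)
    observed = stats[0]                          # combos[0] is (0, ..., k - 1): the real labels
    return (stats >= observed - 1e-12).mean(axis=0)


def bh_reject(p, q):
    """Benjamini-Hochberg step-up procedure at level q; returns a rejection mask."""
    m = p.size
    order = np.argsort(p)
    passed = p[order] <= q * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if passed.any():
        reject[order[: np.flatnonzero(passed).max() + 1]] = True
    return reject


def fdp(reject, de):
    """Realized false discovery proportion (0 when nothing is rejected)."""
    return float((reject & ~de).sum() / max(reject.sum(), 1))


rng = np.random.default_rng(2015)
genes, k, n_src = 1000, 6, 12
de = np.arange(genes) < 100                      # hypothetical: first 100 genes are DE
labels = np.array(["A"] * n_src + ["B"] * n_src)

# Hypothetical toy stand-in for a user-supplied source dataset: per-sample depth
# factors make the counts an overdispersed mixture, not a clean per-gene NB.
base = rng.gamma(2.0, 50.0, size=genes)
depth = rng.lognormal(0.0, 0.3, size=2 * n_src)
src_means = base[:, None] * depth[None, :]
src_means[de, n_src:] *= 3.0                     # 3-fold change in group B
source = nb_counts(src_means, np.full(genes, 0.05), rng)

# Parametric competitor: method-of-moments NB fit per gene, then independent draws.
a, b = source[:, labels == "A"], source[:, labels == "B"]
mu_a, mu_b = np.maximum(a.mean(axis=1), 0.1), np.maximum(b.mean(axis=1), 0.1)
phi_hat = np.clip((a.var(axis=1, ddof=1) - mu_a) / mu_a**2, 1e-3, 10.0)
mu = np.column_stack([mu_a, np.where(de, mu_b, mu_a)])

for name, sim in [
    ("negative binomial", nb_counts(np.repeat(mu, k, axis=1), phi_hat, rng)),
    ("column resampling", simulate_resample(source, labels, de, k, rng)),
]:
    rej = bh_reject(perm_pvalues(sim, k), q=0.05)
    print(f"{name}: {rej.sum()} rejections, FDP = {fdp(rej, de):.3f}")
```

In the resampling branch, every EE value in a simulated sample is copied from one real source column. As a result, gene-gene correlation and departures from the negative binomial shape carry over automatically. The parametric branch discards both by construction. A single run gives a noisy FDP, so a real comparison would average over many simulated datasets.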

AVAILABILITY AND IMPLEMENTATION

The nonparametric simulation algorithm developed in this article is implemented in the R package SimSeq, which is freely available under the GNU General Public License (version 2 or later) from the Comprehensive R Archive Network (http://cran.r-project.org/).

CONTACT

sgbenidt@gmail.com

SUPPLEMENTARY INFORMATION

Supplementary data are available at Bioinformatics online.
