Frazee Alyssa C, Jaffe Andrew E, Langmead Ben, Leek Jeffrey T
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Center for Computational Biology and.
Department of Biostatistics, Johns Hopkins Bloomberg School of Public Health, Lieber Institute for Brain Development, Johns Hopkins Medical Campus, Center for Computational Biology and.
Bioinformatics. 2015 Sep 1;31(17):2778-84. doi: 10.1093/bioinformatics/btv272. Epub 2015 Apr 28.
Statistical methods development for differential expression analysis of RNA sequencing (RNA-seq) data requires software tools to assess accuracy and error rate control. Because true differential expression status is often unknown in experimental datasets, artificially constructed datasets must be used instead, produced either by costly spike-in experiments or by simulating RNA-seq data.
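In a simulated dataset, the set of truly differentially expressed transcripts is known. A method's calls can therefore be scored directly against that ground truth. The Python below is an illustrative sketch and not part of Polyester. The helper name `score_calls` is hypothetical, and reporting an FDR of 0 when nothing is called is an assumed convention.

```python
from typing import Dict, Sequence


def score_calls(truth: Sequence[bool], called: Sequence[bool]) -> Dict[str, float]:
    """Compare differential expression calls with simulated ground truth (hypothetical helper).

    truth[i]  -- True if transcript i was simulated as differentially expressed
    called[i] -- True if the method under test flagged transcript i
    """
    if len(truth) != len(called):
        raise ValueError("truth and called must have the same length")
    tp = sum(1 for t, c in zip(truth, called) if t and c)
    fp = sum(1 for t, c in zip(truth, called) if not t and c)
    fn = sum(1 for t, c in zip(truth, called) if t and not c)
    n_called = tp + fp
    n_true = tp + fn
    return {
        "tp": tp,
        "fp": fp,
        "fn": fn,
        # Observed false discovery rate; assumed 0.0 when nothing is called.
        "fdr": fp / n_called if n_called else 0.0,
        # Sensitivity (true positive rate); assumed 0.0 when nothing is truly DE.
        "tpr": tp / n_true if n_true else 0.0,
    }


if __name__ == "__main__":
    truth = [True, True, False, False]
    called = [True, False, True, False]
    print(score_calls(truth, called))
    # {'tp': 1, 'fp': 1, 'fn': 1, 'fdr': 0.5, 'tpr': 0.5}
```

To check error rate control, compare the observed FDR with the nominal level a method targets (for example, 0.05).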
Polyester is an R package designed to simulate RNA-seq data, beginning with an experimental design and ending with collections of RNA-seq reads. Its main advantage is the ability to simulate reads that reflect isoform-level differential expression across biological replicates for a variety of experimental designs. Data generated by Polyester is a reasonable approximation to real RNA-seq data, and standard differential expression workflows can recover the differential expression that the user set in the simulation.
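The path from an experimental design to reads can be sketched in a few lines. The Python below is a minimal sketch under stated assumptions and is not Polyester's R interface. It makes four modelling choices:

- Per-transcript counts in each replicate are drawn from a negative binomial distribution (a common model for RNA-seq counts). The size is set to one third of the mean, which is an arbitrary assumption.
- A fold-change matrix encodes which transcripts differ between groups.
- Reads are error-free, single-end substrings taken at uniformly random positions, a strong simplification of real fragment and error models.
- All function names are hypothetical.

```python
import math
import random
from typing import List, Sequence


def poisson(lam: float, rng: random.Random) -> int:
    """Poisson draw via Knuth's method. Large means are split into chunks of
    at most 30 so exp(-chunk) never underflows; a sum of independent
    Poisson draws is itself Poisson."""
    total = 0
    while lam > 0:
        chunk = min(lam, 30.0)
        lam -= chunk
        limit = math.exp(-chunk)
        k, p = 0, 1.0
        while True:
            p *= rng.random()
            if p <= limit:
                break
            k += 1
        total += k
    return total


def negative_binomial(mean: float, size: float, rng: random.Random) -> int:
    """Negative binomial draw as a gamma-Poisson mixture:
    variance = mean + mean**2 / size."""
    if mean <= 0:
        return 0
    rate = rng.gammavariate(size, mean / size)  # gamma rate with the given mean
    return poisson(rate, rng)


def simulate_counts(base_means: Sequence[float],
                    fold_changes: Sequence[Sequence[float]],
                    reps_per_group: int, seed: int = 0) -> List[List[int]]:
    """counts[i] holds transcript i's counts, group by group, replicate by replicate."""
    rng = random.Random(seed)
    counts = []
    for mu, fcs in zip(base_means, fold_changes):
        row = []
        for fc in fcs:
            mean = mu * fc
            size = mean / 3  # assumed dispersion: size = mean / 3
            row.extend(negative_binomial(mean, size, rng) for _ in range(reps_per_group))
        counts.append(row)
    return counts


def sample_reads(transcript: str, n_reads: int, read_len: int,
                 rng: random.Random) -> List[str]:
    """Error-free single-end reads starting at uniform positions (simplification)."""
    if read_len > len(transcript):
        raise ValueError("read_len exceeds transcript length")
    last_start = len(transcript) - read_len
    starts = [rng.randint(0, last_start) for _ in range(n_reads)]
    return [transcript[s:s + read_len] for s in starts]


if __name__ == "__main__":
    # Hypothetical design: 3 transcripts, 2 groups, 3 replicates per group.
    base = [100.0, 100.0, 100.0]
    fc = [[1.0, 1.0],    # not differentially expressed
          [4.0, 1.0],    # higher in group 1
          [1.0, 0.25]]   # lower in group 2
    counts = simulate_counts(base, fc, reps_per_group=3, seed=42)
    truth = [any(f != 1.0 for f in row) for row in fc]
    print(counts)
    print(truth)  # [False, True, True]
    print(sample_reads("ACGTTGCAAGGCTTAC", n_reads=2, read_len=5, rng=random.Random(7)))
```

In a fuller pipeline, `counts[i][j]` reads would be drawn from transcript `i` for sample `j`. The `truth` vector is then the ground truth against which a differential expression workflow's calls can be checked.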
Polyester is freely available from Bioconductor (http://bioconductor.org/).
Supplementary data are available at Bioinformatics online.