Department of Biostatistics, The Johns Hopkins University Bloomberg School of Public Health, 615 North Wolfe Street, Baltimore, MD 21205, USA.
BMC Bioinformatics. 2011 Nov 16;12:449. doi: 10.1186/1471-2105-12-449.
BACKGROUND: RNA sequencing (RNA-seq) is a flexible and powerful new approach for measuring gene, exon, or isoform expression. To maximize the utility of RNA-seq data, new statistical methods are needed for clustering, differential expression, and other analyses. A major barrier to developing such methods is the lack of RNA-seq datasets that can be easily obtained and analyzed in common statistical software packages such as R. To speed up the development process, we have created a resource of analysis-ready RNA-seq datasets.

DESCRIPTION: ReCount is an online resource of RNA-seq gene count tables and auxiliary data. The tables were built from the raw RNA-seq data of 18 published studies, comprising 475 samples and over 8 billion reads. Using the Myrna package, reads were aligned, overlapped with gene models, and tabulated into gene-by-sample count tables that are ready for statistical analysis. Count tables and phenotype data were combined into Bioconductor ExpressionSet objects for ease of analysis. ReCount also contains the Myrna manifest files and R source code used to process the samples, allowing statistical and computational scientists to consider alternative parameter values.

CONCLUSIONS: By combining datasets from many studies and providing data already processed from .fastq format into ready-to-use .RData and .txt files, ReCount facilitates analysis and methods development for RNA-seq count data. We anticipate that ReCount will also be useful for investigators who wish to consider cross-study comparisons and alternative normalization strategies for RNA-seq.
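The ReCount description above summarizes the core processing step: aligned reads are overlapped with gene models, tabulated into a gene-by-sample count table, and then paired with phenotype data. The Python sketch below illustrates that idea on toy data. It is not Myrna's implementation or any ReCount API: the `Gene`, `count_table`, and `align_phenotypes` names are hypothetical, each gene model is reduced to a single genomic interval, and a read is counted once for every gene it overlaps, which is a simplifying assumption rather than the counting rule actually used to build ReCount.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical types for illustration only; not part of Myrna or ReCount.
Read = Tuple[str, int, int]  # (chromosome, start, end), 1-based inclusive


@dataclass(frozen=True)
class Gene:
    """A gene model reduced to a single genomic interval (a simplification)."""
    name: str
    chrom: str
    start: int  # 1-based, inclusive
    end: int    # 1-based, inclusive


def overlaps(gene: Gene, read: Read) -> bool:
    """True if the read shares at least one base with the gene interval."""
    chrom, start, end = read
    return gene.chrom == chrom and start <= gene.end and end >= gene.start


def count_table(
    genes: List[Gene], alignments: Dict[str, List[Read]]
) -> Tuple[List[str], List[str], List[List[int]]]:
    """Tabulate aligned reads into a gene-by-sample count matrix.

    `alignments` maps each sample name to its aligned reads. A read is
    counted once for every gene it overlaps (an assumed, simplified rule).
    """
    samples = sorted(alignments)  # fixed, reproducible column order
    matrix = [[0] * len(samples) for _ in genes]
    for j, sample in enumerate(samples):
        for read in alignments[sample]:
            for i, gene in enumerate(genes):
                if overlaps(gene, read):
                    matrix[i][j] += 1
    return [g.name for g in genes], samples, matrix


def align_phenotypes(
    samples: List[str], pheno: Dict[str, Dict[str, str]]
) -> List[Dict[str, str]]:
    """Order phenotype records to match the count-table columns.

    Mirrors the idea that phenotype rows and count columns must describe
    the same samples in the same order, as in an ExpressionSet.
    """
    missing = [s for s in samples if s not in pheno]
    if missing:
        raise KeyError(f"no phenotype data for samples: {missing}")
    return [pheno[s] for s in samples]


# Toy usage: two genes on chr1, two samples with two reads each.
genes = [Gene("geneA", "chr1", 100, 200), Gene("geneB", "chr1", 300, 400)]
alignments = {
    "s2": [("chr1", 150, 185), ("chr1", 390, 425)],
    "s1": [("chr1", 90, 125), ("chr2", 150, 185)],
}
names, samples, matrix = count_table(genes, alignments)
pheno = align_phenotypes(
    samples, {"s1": {"condition": "control"}, "s2": {"condition": "treated"}}
)
print(names, samples, matrix)  # ['geneA', 'geneB'] ['s1', 's2'] [[1, 1], [0, 1]]
```

Sorting sample names gives a reproducible column order, and the phenotype step fails loudly on a missing sample instead of silently misaligning labels, which is the kind of mismatch that an ExpressionSet-style container is meant to prevent.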