Scharpf Robert B, Ruczinski Ingo
Biostatistics, Johns Hopkins Bloomberg School of Public Health, Baltimore, MD, USA.
Methods Mol Biol. 2010;593:67-79. doi: 10.1007/978-1-60327-194-3_4.
The Bioconductor project is an "open source and open development software project for the analysis and comprehension of genomic data" (1), primarily based on the R programming language. Infrastructure packages, such as Biobase, are maintained by Bioconductor core developers and serve several key roles for the broader community of Bioconductor software developers and users. In particular, Biobase introduces an S4 class, the eSet, for high-dimensional assay data. Encapsulating the assay data together with meta-data on the samples, features, and experiment in the eSet class definition ensures that the relevant sample and feature meta-data propagate throughout an analysis. Extending the eSet class promotes code reuse through inheritance, improves interoperability with other R packages, and is less error-prone. Recently proposed class definitions for high-throughput SNP arrays extend the eSet class. This chapter highlights the advantages of adopting and extending Biobase class definitions through a working example of one implementation of classes for the analysis of high-throughput SNP arrays.
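The encapsulation idea described in the abstract can be illustrated with a small sketch. The example below is written in Python rather than R, and it is only an analogy: the class names `AssaySetSketch` and `SnpSetSketch`, the assay element names `calls` and `call_confidence`, and the `subset` method are all hypothetical and are not part of the Biobase API. It assumes a container that holds several equally sized assay matrices (features by samples) alongside feature and sample meta-data, and it shows how subsetting through the container keeps the meta-data in step with the assay data, and how a SNP-specific subclass might inherit that behavior.

```python
from dataclasses import dataclass


@dataclass
class AssaySetSketch:
    """Hypothetical stand-in for an eSet-like container (not the Biobase API)."""

    # Named matrices (features x samples), all sharing the same dimensions.
    assay_data: dict
    feature_names: list  # one entry per matrix row
    sample_data: list    # one meta-data record (dict) per matrix column

    def __post_init__(self):
        # Validity check: every matrix must agree with the meta-data.
        for name, matrix in self.assay_data.items():
            if len(matrix) != len(self.feature_names):
                raise ValueError(f"{name}: row count does not match features")
            if any(len(row) != len(self.sample_data) for row in matrix):
                raise ValueError(f"{name}: column count does not match samples")

    def subset(self, rows, cols):
        """Subset assay data and meta-data together so they stay in sync."""
        assays = {
            name: [[matrix[i][j] for j in cols] for i in rows]
            for name, matrix in self.assay_data.items()
        }
        # type(self) means a subclass gets back an instance of its own class.
        return type(self)(
            assays,
            [self.feature_names[i] for i in rows],
            [self.sample_data[j] for j in cols],
        )


class SnpSetSketch(AssaySetSketch):
    """Hypothetical SNP-array extension; inherits validation and subsetting."""

    REQUIRED = ("calls", "call_confidence")  # assumed element names

    def __post_init__(self):
        missing = [k for k in self.REQUIRED if k not in self.assay_data]
        if missing:
            raise ValueError(f"missing assay elements: {missing}")
        super().__post_init__()


# Usage: two SNPs measured on three samples.
snps = SnpSetSketch(
    assay_data={
        "calls": [[1, 2, 3], [2, 2, 1]],
        "call_confidence": [[0.99, 0.95, 0.90], [0.80, 0.97, 0.99]],
    },
    feature_names=["snp_a", "snp_b"],
    sample_data=[{"id": "s1"}, {"id": "s2"}, {"id": "s3"}],
)

# Keep the second SNP and the first and third samples.
subset = snps.subset(rows=[1], cols=[0, 2])
```

Because the subclass adds only its own validity requirement, the subsetting logic is written once in the parent class and reused. This loosely mirrors the code-reuse argument made for extending the eSet class, although the actual S4 mechanics in R differ.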