Hernandez-Ferrer Carles, Ruiz-Arenas Carlos, Beltran-Gomila Alba, González Juan R
Institut de Salut Global de Barcelona (ISGlobal) - Campus Mar, Barcelona Building: Biomedical Research Park, c/Dr. Aiguader, 88, 08003, Barcelona, Spain.
Universitat Pompeu Fabra (UPF), Barcelona, Spain.
BMC Bioinformatics. 2017 Jan 17;18(1):36. doi: 10.1186/s12859-016-1455-1.
The reduction in the cost of genomic assays has generated large amounts of biomedical data. As a result, current studies perform multiple experiments on the same subjects. While Bioconductor's methods and classes, implemented across different packages, manage individual experiments, there is no standard class to properly manage different omic data sets from the same subjects. In addition, most R/Bioconductor packages designed to integrate and visualize biological data use basic data structures that lack clear general methods, such as subsetting features or selecting samples.
To address this need, we developed MultiDataSet, a new R class based on Bioconductor standards and designed to encapsulate multiple data sets. MultiDataSet handles the usual difficulties of managing multiple and incomplete data sets while offering a simple, general way to subset features and select samples. We illustrate the use of MultiDataSet in three common situations: 1) performing integration analysis with third-party packages; 2) creating new methods and functions for omic data integration; and 3) encapsulating new, not-yet-implemented data from any biological experiment.
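The idea of one container that applies a single sample or feature selection to several data sets can be illustrated with a small sketch. This is not MultiDataSet's API, because MultiDataSet is an R/Bioconductor class. The names `OmicSet`, `MultiOmicContainer`, `select_samples`, `select_features` and `common_samples` are hypothetical and are used only for this Python illustration. The sketch assumes each data set is a features × samples matrix indexed by sample IDs. It shows how a selection made once propagates to every data set, and how incomplete data sets can be reduced to the samples shared by all of them. The values in the usage part are toy data.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class OmicSet:
    """Hypothetical single experiment: a features x samples matrix with row/column names."""
    features: List[str]
    samples: List[str]
    values: List[List[float]]  # values[i][j]: feature i measured in sample j

    def select(self, samples: Optional[Iterable[str]] = None,
               features: Optional[Iterable[str]] = None) -> "OmicSet":
        """Return a copy restricted to the given samples and/or features, preserving order."""
        s_keep = set(samples) if samples is not None else set(self.samples)
        f_keep = set(features) if features is not None else set(self.features)
        s_idx = [j for j, s in enumerate(self.samples) if s in s_keep]
        f_idx = [i for i, f in enumerate(self.features) if f in f_keep]
        return OmicSet(
            features=[self.features[i] for i in f_idx],
            samples=[self.samples[j] for j in s_idx],
            values=[[self.values[i][j] for j in s_idx] for i in f_idx],
        )


@dataclass
class MultiOmicContainer:
    """Hypothetical container for several OmicSets measured on overlapping subjects."""
    sets: Dict[str, OmicSet] = field(default_factory=dict)

    def add(self, name: str, dataset: OmicSet) -> None:
        self.sets[name] = dataset

    def select_samples(self, samples: Iterable[str]) -> "MultiOmicContainer":
        """Apply one sample selection to every data set; samples a set lacks are skipped."""
        wanted = list(samples)
        return MultiOmicContainer({n: d.select(samples=wanted) for n, d in self.sets.items()})

    def select_features(self, features: Iterable[str]) -> "MultiOmicContainer":
        """Keep only the named features in each data set (a set may end up empty)."""
        wanted = list(features)
        return MultiOmicContainer({n: d.select(features=wanted) for n, d in self.sets.items()})

    def common_samples(self) -> "MultiOmicContainer":
        """Restrict every data set to the samples present in all of them (complete cases)."""
        if not self.sets:
            return MultiOmicContainer()
        shared = set.intersection(*(set(d.samples) for d in self.sets.values()))
        return self.select_samples(shared)


# Usage with toy data: two experiments sharing only samples s2 and s3.
expr = OmicSet(["TP53", "BRCA1"], ["s1", "s2", "s3"], [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
meth = OmicSet(["cg01", "cg02"], ["s2", "s3", "s4"], [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]])
mds = MultiOmicContainer()
mds.add("expression", expr)
mds.add("methylation", meth)
complete = mds.common_samples()
print(complete.sets["expression"].samples)   # ['s2', 's3']
print(complete.sets["methylation"].values)   # [[0.1, 0.2], [0.4, 0.5]]
```

Every selection returns a new container instead of modifying the original. This keeps the full, incomplete data available for analyses that do not require complete cases.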
MultiDataSet is a suitable class for data integration within the R and Bioconductor framework.