Bemis Kylie A, Vitek Olga
College of Computer and Information Science, Northeastern University, Boston, MA USA 02115.
College of Science, Northeastern University, Boston, MA USA 02115.
Bioinformatics. 2017 Oct 1;33(19):3142-3144. doi: 10.1093/bioinformatics/btx392. Epub 2017 Jun 15.
We introduce matter, an R package for direct interaction with larger-than-memory datasets stored in an arbitrary number of files of any size. matter is primarily designed for datasets in new and rapidly evolving file formats, which may lack extensive software support. matter enables a wide variety of data exploration and manipulation steps, and is extensible to many bioinformatics applications. It supports reproducible research by minimizing the need to convert and store data in multiple formats. We illustrate the performance of matter in conjunction with the Bioconductor package Cardinal for the analysis of high-resolution, high-throughput mass spectrometry imaging experiments.
The package, vignettes, and examples of applications in several areas of bioinformatics are available open-source at www.bioconductor.org under the Artistic-2.0 license.
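The core idea in the abstract, treating data spread over several files as one object without loading it into memory, can be illustrated with a short language-agnostic sketch. The Python code below is not matter's API and makes no claim about how matter is implemented; the class `FileBackedVector`, the helpers `write_demo_files` and `streaming_mean`, and the assumed file layout (raw little-endian 64-bit floats) are all hypothetical and chosen only for illustration.

```python
import os
import struct
import tempfile

# Assumed on-disk layout (hypothetical): raw little-endian 64-bit floats.
ITEM = struct.Struct("<d")


class FileBackedVector:
    """Read-only vector of doubles whose elements live in several binary files."""

    def __init__(self, paths):
        self.paths = list(paths)
        # Element counts come from file sizes alone; no data is read here.
        self.counts = [os.path.getsize(p) // ITEM.size for p in self.paths]

    def __len__(self):
        return sum(self.counts)

    def __getitem__(self, index):
        if index < 0:
            index += len(self)
        if not 0 <= index < len(self):
            raise IndexError(index)
        # Locate the file holding this element, then seek straight to it.
        for path, count in zip(self.paths, self.counts):
            if index < count:
                with open(path, "rb") as fh:
                    fh.seek(index * ITEM.size)
                    return ITEM.unpack(fh.read(ITEM.size))[0]
            index -= count

    def chunks(self, size):
        """Yield lists of at most `size` values; chunks never span two files."""
        for path in self.paths:
            with open(path, "rb") as fh:
                while True:
                    raw = fh.read(size * ITEM.size)
                    if not raw:
                        break
                    yield [v[0] for v in ITEM.iter_unpack(raw)]


def streaming_mean(vec, chunk_size=1024):
    """Mean computed chunk by chunk, holding only one chunk in memory at a time."""
    total, n = 0.0, 0
    for chunk in vec.chunks(chunk_size):
        total += sum(chunk)
        n += len(chunk)
    return total / n


def write_demo_files(directory):
    """Write the values 0..9 across two files of unequal length."""
    parts = [[0.0, 1.0, 2.0, 3.0, 4.0, 5.0], [6.0, 7.0, 8.0, 9.0]]
    paths = []
    for i, values in enumerate(parts):
        path = os.path.join(directory, f"part{i}.bin")
        with open(path, "wb") as fh:
            fh.write(struct.pack(f"<{len(values)}d", *values))
        paths.append(path)
    return paths


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        vec = FileBackedVector(write_demo_files(d))
        print(len(vec), vec[7], streaming_mean(vec, chunk_size=3))
```

The sketch keeps only file paths and element counts in memory. Element access seeks directly into the original files, so the data does not have to be converted into another storage format first, which is the property the abstract ties to reproducibility.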