Saez-Rodriguez Julio, Goldsipe Arthur, Muhlich Jeremy, Alexopoulos Leonidas G, Millard Bjorn, Lauffenburger Douglas A, Sorger Peter K
Center for Cell Decision Processes, Department of Systems Biology, Harvard Medical School, Boston, MA 02115, USA.
Bioinformatics. 2008 Mar 15;24(6):840-7. doi: 10.1093/bioinformatics/btn018. Epub 2008 Jan 24.
Linking experimental data to mathematical models in biology is impeded by the lack of suitable software to manage and transform data. Model calibration would be facilitated and models would increase in value were it possible to preserve links to training data along with a record of all normalization, scaling, and fusion routines used to assemble the training data from primary results.
We describe the implementation of DataRail, an open source MATLAB-based toolbox that stores experimental data in flexible multi-dimensional arrays, transforms arrays so as to maximize information content, and then constructs models using internal or external tools. Data integrity is maintained via a containment hierarchy for arrays, imposition of a metadata standard based on a newly proposed MIDAS format, assignment of semantically typed universal identifiers, and implementation of a procedure for storing the history of all transformations with the array. We illustrate the utility of DataRail by processing a newly collected set of approximately 22 000 measurements of protein activities obtained from cytokine-stimulated primary and transformed human liver cells.
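The abstract does not describe DataRail's internal API. The sketch below is written in Python rather than MATLAB and only illustrates the general idea of a labelled multi-dimensional array that stores its own transformation history and a link to the array it was derived from. Everything in it is hypothetical: the class `TrackedArray`, the function `fold_change_vs_time_zero`, the dimension names, and the toy values. A random UUID stands in for DataRail's semantically typed identifiers, which this sketch does not attempt to model.

```python
from __future__ import annotations

import uuid
from dataclasses import dataclass, field
from typing import Callable, Dict, Optional, Tuple

# One label per dimension, e.g. ("primary", "TNFa", "30", "p-ERK").
Key = Tuple[str, ...]


@dataclass(frozen=True)
class TrackedArray:
    """Hypothetical labelled array that carries its own transformation history."""
    dims: Tuple[str, ...]
    data: Dict[Key, float]
    uid: str = field(default_factory=lambda: uuid.uuid4().hex)  # stand-in identifier
    parent_uid: Optional[str] = None
    history: Tuple[str, ...] = ()

    def transform(
        self,
        name: str,
        fn: Callable[[Tuple[str, ...], Dict[Key, float]], Dict[Key, float]],
    ) -> TrackedArray:
        """Apply fn and return a NEW array; the input array is left untouched."""
        return TrackedArray(
            dims=self.dims,
            data=fn(self.dims, self.data),
            parent_uid=self.uid,                # link back to the source array
            history=self.history + (name,),     # append this step to the record
        )


def fold_change_vs_time_zero(dims, data):
    """Hypothetical normalization: divide each value by its matching time-"0" value."""
    t = dims.index("time")
    out = {}
    for key, value in data.items():
        base_key = key[:t] + ("0",) + key[t + 1:]
        out[key] = value / data[base_key]
    return out


# Toy values for illustration only.
raw = TrackedArray(
    dims=("cell", "stimulus", "time", "readout"),
    data={
        ("primary", "TNFa", "0", "p-ERK"): 2.0,
        ("primary", "TNFa", "30", "p-ERK"): 8.0,
    },
)
norm = raw.transform("fold_change_vs_time_zero", fold_change_vs_time_zero)
print(norm.data[("primary", "TNFa", "30", "p-ERK")])  # 4.0
print(norm.history)                                   # ('fold_change_vs_time_zero',)
```

Returning a new array instead of modifying the input keeps the raw data intact, so any derived array can be traced back through `parent_uid` and `history` to the measurements it came from.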
DataRail is distributed under the GNU General Public License and available at http://code.google.com/p/sbpipeline/