Keator David B, Grethe J S, Marcus D, Ozyurt B, Gadde S, Murphy Sean, Pieper S, Greve D, Notestine R, Bockholt H J, Papadopoulos P
University of California, Irvine, CA 92691, USA.
IEEE Trans Inf Technol Biomed. 2008 Mar;12(2):162-72. doi: 10.1109/TITB.2008.917893.
The aggregation of imaging, clinical, and behavioral data from multiple independent institutions and researchers presents both a great opportunity for biomedical research and a formidable challenge. Many research groups have well-established data collection and analysis procedures, as well as data and metadata format requirements that are particular to that group. Moreover, the types of data and metadata collected are quite diverse, including image, physiological, and behavioral data, as well as descriptions of experimental design and of preprocessing and analysis methods. Each of these types of data relies on a variety of software tools for collection, storage, and processing. Furthermore, sites are reluctant to relinquish control over the distribution of, and access to, the data and the tools. To address these needs, the Biomedical Informatics Research Network (BIRN) has developed a federated and distributed infrastructure for the storage, retrieval, analysis, and documentation of biomedical imaging data. The infrastructure consists of distributed data collections hosted on dedicated storage and computational resources located at each participating site, a federated data management system and data integration environment, an Extensible Markup Language (XML) schema for data exchange, and analysis pipelines designed to leverage both the distributed data management environment and the available grid computing resources.