Computer Science, Vanderbilt University, Nashville, TN, USA.
Electrical and Computer Engineering, Vanderbilt University, Nashville, TN, USA.
J Digit Imaging. 2022 Dec;35(6):1576-1589. doi: 10.1007/s10278-022-00679-8. Epub 2022 Aug 3.
A robust medical image computing infrastructure must host massive multimodal archives, run extensive analysis pipelines, and execute scalable job management. An emerging data format standard, the Brain Imaging Data Structure (BIDS), introduces complexities for interfacing with XNAT archives. Moreover, workflow integration is combinatorially problematic when matching large amounts of processing to large datasets. Historically, workflow engines have focused on refining the workflows themselves rather than on actual job generation. However, such an approach is incompatible with a data-centric architecture that hosts heterogeneous medical image computing. The Distributed Automation for XNAT toolkit (DAX) provides large-scale image storage and analysis pipelines with an optimized job management tool. Herein, we describe developments for DAX that allow for integration of the XNAT and BIDS standards. We also improve DAX's efficiency in running diverse containerized workflows in a high-performance computing (HPC) environment. Briefly, we integrate YAML configuration processor scripts to abstract workflow data inputs, data outputs, commands, and job attributes. Finally, we propose an online database-driven mechanism for DAX to efficiently identify the most recently updated sessions, thereby improving job building efficiency on large projects. We refer to the overall DAX development proposed in this work as DAX-1 (DAX version 1). To validate the effectiveness of the new features, we verified (1) the efficiency of converting XNAT data to BIDS format and the correctness of the conversion, using a collection of BIDS-standard containerized neuroimaging workflows, (2) how the YAML-based processor simplified configuration setup, via a sequence of application pipelines, and (3) the productivity of DAX-1 in generating actual HPC processing jobs compared with the earlier DAX baseline method.
The empirical results show that (1) DAX-1 converts XNAT data to BIDS at a speed similar to that of accessing XNAT data alone; (2) YAML integrates with DAX-1 with a shallow learning curve for users; and (3) DAX-1 reduces job/assessor generation latency by finding recently modified sessions. Herein, we present approaches for efficiently integrating XNAT and modern image formats with a scalable workflow engine for large-scale dataset access and processing.
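The idea of building jobs only for recently modified sessions can be illustrated with a minimal sketch. This is not DAX code: the `Session` record, the `sessions_needing_build` helper, and the sample labels and dates are all hypothetical, and the abstract does not specify how DAX-1 actually queries the database. The sketch only shows the general principle of filtering on a last-modified timestamp instead of rescanning every session.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Session:
    """Hypothetical stand-in for one imaging session record in the archive."""
    label: str
    last_modified: datetime


def sessions_needing_build(sessions, last_build_time):
    """Return sessions modified after the last build pass, oldest first."""
    recent = [s for s in sessions if s.last_modified > last_build_time]
    return sorted(recent, key=lambda s: s.last_modified)


# Illustrative data only; labels and dates are made up.
sessions = [
    Session("sess_A", datetime(2022, 1, 5)),
    Session("sess_B", datetime(2022, 3, 1)),
    Session("sess_C", datetime(2022, 2, 10)),
]
last_build = datetime(2022, 2, 1)

to_build = [s.label for s in sessions_needing_build(sessions, last_build)]
print(to_build)  # ['sess_C', 'sess_B']
```

Under these assumptions, the cost of a build pass scales with the number of changed sessions rather than with the size of the whole project, which is the kind of latency reduction the abstract reports.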