Damon Stephen M, Boyd Brian D, Plassard Andrew J, Taylor Warren, Landman Bennett A
Electrical Engineering, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN USA 37235.
Psychiatry, Vanderbilt University, 2301 Vanderbilt Place, Nashville, TN USA 37235.
Proc SPIE Int Soc Opt Eng. 2017;2017. doi: 10.1117/12.2254371. Epub 2017 Mar 13.
Large-scale image processing demands standardized approaches not only to storage but also to job distribution and scheduling. The eXtensible Neuroimaging Archive Toolkit (XNAT) is one of several platforms that seek to solve the storage problem. Distributed Automation for XNAT (DAX) is a job control and distribution manager. Recent massive data projects have revealed several bottlenecks for projects with >100,000 assessors (i.e., data processing pipelines in XNAT). To address these concerns, we have developed a new API that generates assessors through a direct connection to the database rather than through REST API calls. This method, consistent with XNAT, keeps a full history for auditing purposes. Additionally, we have optimized DAX to track processing status on disk (called DISKQ) rather than on XNAT, which greatly reduces the load on XNAT by sharply cutting the number of API calls. Finally, we have integrated DAX into a Docker container with the aim of using it as a Docker controller to launch Docker containers of image processing pipelines. Using our new API, we reduced the time to create 1,000 assessors (a sub-cohort of our case project) from 65,040 seconds to 229 seconds (a decrease of over 270-fold). DISKQ, using pyXnat, allows 400 jobs to be launched in under 10 seconds, which previously took 2,000 seconds. Together, these updates position DAX to support projects with hundreds of thousands of scans and to run them in a time-efficient manner.
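The idea behind DISKQ, keeping job status on local disk so that routine status checks need no call to the archive, can be illustrated with a minimal sketch. This is not DAX's implementation: the class `DiskStatusQueue`, the status names, the job labels, and the one-JSON-file-per-job layout are all hypothetical, chosen only to show the pattern.

```python
import json
import tempfile
from pathlib import Path

# Hypothetical status names for illustration; not taken from DAX.
READY, RUNNING, COMPLETE = "READY", "RUNNING", "COMPLETE"


class DiskStatusQueue:
    """Toy on-disk job tracker: one small JSON file per job label."""

    def __init__(self, root):
        self.root = Path(root)
        self.root.mkdir(parents=True, exist_ok=True)

    def _path(self, label):
        return self.root / f"{label}.json"

    def set_status(self, label, status):
        # Write to a temporary file, then rename it into place, so a
        # reader never sees a half-written status file.
        target = self._path(label)
        tmp = target.with_suffix(".tmp")
        tmp.write_text(json.dumps({"label": label, "status": status}))
        tmp.replace(target)

    def get_status(self, label):
        return json.loads(self._path(label).read_text())["status"]

    def labels_with_status(self, status):
        # A local directory scan stands in for per-job remote queries.
        return sorted(
            p.stem
            for p in self.root.glob("*.json")
            if json.loads(p.read_text())["status"] == status
        )


def demo():
    """Queue three hypothetical jobs, start one, and report the state."""
    with tempfile.TemporaryDirectory() as tmp:
        queue = DiskStatusQueue(tmp)
        for i in range(3):
            queue.set_status(f"job{i}", READY)
        queue.set_status("job1", RUNNING)
        return queue.labels_with_status(READY), queue.get_status("job1")


if __name__ == "__main__":
    print(demo())
```

In a design like this, the archive would only need to be contacted when results are uploaded, which is one plausible way the number of API calls could fall; how DAX actually synchronizes DISKQ with XNAT is not described in the abstract.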