Department of Biomedical Informatics, UAMS, 4301 West Markham St, Little Rock, AR, 72205, USA.
Department of Radiation Oncology, Washington University School of Medicine, St. Louis, MO, USA.
J Digit Imaging. 2018 Dec;31(6):783-791. doi: 10.1007/s10278-018-0097-4.
Reusable, publicly available data is a pillar of open science and of rapid advancement in cancer imaging research. Sharing data from completed research studies not only saves the research dollars required to collect data, but also helps ensure that studies are both replicable and reproducible. The Cancer Imaging Archive (TCIA) is a global shared repository for imaging data related to cancer. Ensuring the consistency, scientific utility, and anonymity of data stored in TCIA is of utmost importance. As submissions to TCIA have increased in both volume and complexity of the DICOM objects stored, curation of collections has become a bottleneck in the acquisition of data. To increase the rate of curation of image sets, improve the quality of the curation, and better track the provenance of changes made to submitted DICOM image sets, a custom set of tools was developed, using novel methods for the analysis of DICOM data sets. These tools are written in the Perl programming language, use the open-source database PostgreSQL, make use of the Perl DICOM routines in the open-source package Posda, and incorporate DICOM diagnostic tools from other open-source packages, such as dicom3tools. These tools are referred to as the "Posda Tools." The Posda Tools are open source and available via git at https://github.com/UAMS-DBMI/PosdaTools.
In this paper, we briefly describe the Posda Tools and discuss the novel methods these tools employ to facilitate rapid analysis of DICOM data, including the following: (1) use of a database schema that is more permissive than, and normalized differently from, traditional DICOM databases; (2) automatic integrity checks performed on a bulk basis; (3) application of revisions to DICOM datasets on a bulk basis, either through a web-based interface or via command-line executable Perl scripts; (4) tracking of all such edits in a revision tracker, so that they may be rolled back; (5) a UI for inspecting the results of such edits, to verify that they are what was intended; (6) identification of DICOM Studies, Series, and SOP instances using "nicknames," which are persistent and have well-defined scope, to make reported DICOM errors easier to express and manage; and (7) rapid identification of potential duplicate DICOM datasets by pixel data, which can be used, e.g., to identify submission subjects that may relate to the same individual, without identifying the individual.
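As an illustration of method (7), the following minimal Python sketch groups DICOM instances whose pixel data are byte-for-byte identical by comparing a digest of the pixel bytes. It is not the Posda Tools implementation (which is written in Perl and backed by PostgreSQL). The function names, the record layout, the sample subject and nickname strings, and the choice of SHA-256 are all assumptions made for the example, and the pixel bytes are presumed to have been extracted from the DICOM files beforehand.

```python
import hashlib
from collections import defaultdict


def pixel_digest(pixel_bytes: bytes) -> str:
    """Return a hex digest of the raw pixel bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(pixel_bytes).hexdigest()


def find_duplicate_groups(records):
    """Group records that share identical pixel data.

    `records` is an iterable of (subject_id, sop_nickname, pixel_bytes)
    tuples. This layout is hypothetical, not the Posda schema.
    Returns a list of groups; each group lists the (subject_id, sop_nickname)
    pairs whose pixel data hash to the same digest.
    """
    by_digest = defaultdict(list)
    for subject_id, sop_nickname, pixel_bytes in records:
        by_digest[pixel_digest(pixel_bytes)].append((subject_id, sop_nickname))
    # Keep only digests seen more than once: these are potential duplicates.
    return [members for members in by_digest.values() if len(members) > 1]


# Toy input standing in for pixel data already extracted from DICOM files.
records = [
    ("subject_A", "SOP_1", bytes([0, 1, 2, 3])),
    ("subject_B", "SOP_2", bytes([0, 1, 2, 3])),  # same pixels as SOP_1
    ("subject_C", "SOP_3", bytes([9, 9, 9, 9])),
]
groups = find_duplicate_groups(records)
```

Because only digests and locally scoped labels are compared, a match between two submission subjects can be flagged for review without reference to any identifying patient attributes. A digest match of this kind detects only exact duplicates; re-encoded or resampled images would need a different comparison.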