Department of Radiation Oncology (Maastro), GROW School for Oncology, Maastricht University Medical Centre+, Maastricht, 6229 ET, The Netherlands.
Department of Radiation Oncology, Radboud University Medical Center, Nijmegen, 6525 GC, The Netherlands.
Med Phys. 2020 Nov;47(11):5931-5940. doi: 10.1002/mp.14322. Epub 2020 Jun 27.
One of the most frequently cited radiomics investigations showed that features automatically extracted from routine clinical images could be used in prognostic modeling. These images have been made publicly accessible via The Cancer Imaging Archive (TCIA). There have been numerous requests for additional explanatory metadata on the following datasets: RIDER, Interobserver, Lung1, and Head-Neck1. To support repeatability, reproducibility, generalizability, and transparency in radiomics research, we publish the subjects' clinical data, extracted radiomics features, and Digital Imaging and Communications in Medicine (DICOM) headers of these four datasets with descriptive metadata, in order to be more compliant with the findable, accessible, interoperable, and reusable (FAIR) data management principles.
Overall survival time intervals were updated using a national citizens registry after internal ethics board approval. Spatial offsets of the primary gross tumor volume (GTV) regions of interest (ROIs) associated with the Lung1 CT series were improved on the TCIA. GTV radiomics features were extracted using the open-source Ontology-Guided Radiomics Analysis Workflow (O-RAW). We reshaped the output of O-RAW to map features and extraction settings to the latest version of the Radiomics Ontology, so as to be consistent with the Image Biomarker Standardization Initiative (IBSI). DICOM metadata were extracted using a research version of Semantic DICOM (SOHARD GmbH, Fuerth, Germany). Subjects' clinical data were described with metadata using the Radiation Oncology Ontology. All of the above were published in Resource Description Framework (RDF) format, that is, as triples. Example SPARQL queries, intended to illustrate how to exploit this data submission, are shared with the reader for use on the online triples archive.
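For readers unfamiliar with RDF, the idea of publishing data "as triples" can be sketched as follows. This is a minimal illustration only: the `http://example.org/` namespace, the predicate names (`hasSurvivalDays`, `hasImageSeries`), and the patient record are hypothetical placeholders, not terms from the Radiation Oncology Ontology or the Radiomics Ontology and not values from the published datasets.

```python
from typing import Iterable, NamedTuple


class Triple(NamedTuple):
    """One RDF statement: subject, predicate, object (already serialized)."""
    subject: str
    predicate: str
    obj: str


def iri(value: str) -> str:
    """Wrap an IRI in angle brackets, as N-Triples requires."""
    return f"<{value}>"


def literal(value: str) -> str:
    """Serialize a plain string literal, escaping backslashes and quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def to_ntriples(triples: Iterable[Triple]) -> str:
    """Render triples in N-Triples syntax, one statement per line."""
    return "\n".join(f"{t.subject} {t.predicate} {t.obj} ." for t in triples)


# Hypothetical namespace and record, for illustration only.
EX = "http://example.org/"
patient = iri(EX + "patient/001")
triples = [
    Triple(patient, iri(EX + "hasSurvivalDays"), literal("730")),
    Triple(patient, iri(EX + "hasImageSeries"), iri(EX + "series/ct-001")),
]

print(to_ntriples(triples))
```

Because every fact is a subject-predicate-object statement keyed by IRIs, clinical, DICOM, and radiomics statements about the same subject can be linked without sharing a table schema.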
The accumulated RDF data are publicly accessible through a SPARQL endpoint where the triples are archived. The endpoint is remotely queried through a graph database web application at http://sparql.cancerdata.org. SPARQL queries are intrinsically federated, such that we can efficiently cross-reference clinical, DICOM, and radiomics data within a single query, while being agnostic to the original data format and coding system. The federated queries work in the same way even if the RDF data were partitioned across multiple servers and dispersed physical locations.
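The cross-referencing described above can be sketched as a single SPARQL query that joins clinical, DICOM, and radiomics statements on shared subjects. The sketch below only builds the query and an HTTP GET request following the SPARQL 1.1 Protocol; it does not contact any server. The endpoint URL, the `ex:` prefix, and all predicate names are hypothetical placeholders; the real endpoint path and ontology terms should be taken from the web application at http://sparql.cancerdata.org and the shared example queries.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint (assumption); substitute the actual SPARQL endpoint URL.
ENDPOINT = "http://example.org/sparql"

# Hypothetical predicates: one clinical, one DICOM, one radiomics statement,
# all reached from the same patient and image series.
QUERY_TEMPLATE = """\
PREFIX ex: <http://example.org/>
SELECT ?patient ?survival ?manufacturer ?featureValue
WHERE {{
  ?patient ex:hasSurvivalDays ?survival .
  ?patient ex:hasImageSeries ?series .
  ?series ex:hasManufacturer ?manufacturer .
  ?series ex:hasFeatureValue ?featureValue .
}}
LIMIT {limit}
"""


def build_query(limit: int = 10) -> str:
    """Return the illustrative query with a row limit."""
    if not isinstance(limit, int) or limit < 1:
        raise ValueError("limit must be a positive integer")
    return QUERY_TEMPLATE.format(limit=limit)


def build_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL Protocol GET request asking for JSON results."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_request(ENDPOINT, build_query(limit=5))
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute the query against a live endpoint. The query text itself would stay the same if the triples were split across several servers, apart from `SERVICE` clauses naming the remote endpoints.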
The public availability of these data resources is intended to support radiomics features replication, repeatability, and reproducibility studies by the academic community. The example SPARQL queries may be freely used and modified by readers depending on their research question. Data interoperability and reusability are supported by referencing existing public ontologies. The RDF data are readily findable and accessible through the aforementioned link. Scripts used to create the RDF are made available at a code repository linked to this submission: https://gitlab.com/UM-CDS/FAIR-compliant_clinical_radiomics_and_DICOM_metadata.