Bona Jonathan P, Nolan Tracy S, Brochhausen Mathias
Department of Biomedical Informatics, University of Arkansas for Medical Sciences, Little Rock, Arkansas, United States.
CEUR Workshop Proc. 2018 Aug;2285.
The Cancer Imaging Archive (TCIA) hosts over 11 million de-identified medical images related to cancer for research reuse. These are organized around DICOM-format radiological collections that are grouped by disease type, modality, or research focus. Many collections also include diverse non-image datasets in a variety of formats, with no common approach to representing the entities that the data are about. This paper describes work to make these diverse non-image data more accessible and usable by transforming them into integrated semantic representations using Open Biomedical Ontologies, highlights obstacles encountered in the data, and presents detailed representations of data found in select collections.