Institute of Interventional and Diagnostic Radiology and Neuroradiology, University Hospital Essen, Essen, Germany.
Institute for Artificial Intelligence in Medicine, University Hospital Essen, Essen, Germany.
Sci Data. 2024 May 10;11(1):483. doi: 10.1038/s41597-024-03337-6.
The Sparsely Annotated Region and Organ Segmentation (SAROS) dataset was created using data from The Cancer Imaging Archive (TCIA) to provide a large open-access CT dataset with high-quality annotations of body landmarks. In-house segmentation models were used to generate annotation proposals on randomly selected cases from TCIA. The dataset includes 13 semantic body region labels (abdominal/thoracic cavity, bones, brain, breast implant, mediastinum, muscle, parotid/submandibular/thyroid glands, pericardium, spinal cord, subcutaneous tissue) and six body part labels (left/right arm/leg, head, torso). Case selection was based on the DICOM series description, gender, and imaging protocol, resulting in 900 CTs from 882 patients (438 female). The proposals were manually reviewed and corrected in a continuous quality control cycle. Only every fifth axial slice was annotated, yielding 20,150 annotated slices from 28 data collections. To support reproducibility in downstream tasks, five cross-validation folds and a test set were pre-defined. The SAROS dataset serves as an open-access resource for training and evaluating novel segmentation models, covering various scanner vendors and diseases.
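Because only every fifth axial slice carries annotations, a downstream evaluation would typically need to restrict itself to the annotated slices. The following is a minimal sketch of that idea, not the authors' tooling. It assumes that unannotated slices are filled with a dedicated ignore value (here 255) and that the axial direction is the first array axis; the abstract specifies neither, and the names `IGNORE_LABEL`, `annotated_slice_indices`, and `masked_dice` are hypothetical.

```python
import numpy as np

# Assumption: voxels on unannotated slices hold this value.
# The abstract does not state how the dataset encodes them.
IGNORE_LABEL = 255


def annotated_slice_indices(labels: np.ndarray,
                            ignore_label: int = IGNORE_LABEL) -> np.ndarray:
    """Return indices along axis 0 of slices with at least one annotated voxel."""
    flat = labels.reshape(labels.shape[0], -1)
    return np.flatnonzero((flat != ignore_label).any(axis=1))


def masked_dice(pred: np.ndarray, target: np.ndarray, label: int,
                ignore_label: int = IGNORE_LABEL) -> float:
    """Dice score for one label, computed only on annotated voxels."""
    valid = target != ignore_label
    p = (pred == label) & valid
    t = (target == label) & valid
    denom = p.sum() + t.sum()
    if denom == 0:
        # Label absent from both prediction and annotation: score undefined.
        return float("nan")
    return float(2.0 * (p & t).sum() / denom)
```

Predictions on unannotated slices are excluded from the score rather than counted as errors, which is the usual way to handle sparse ground truth.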