Sun Yichun, Guerrero-López Alejandro, Arias-Londoño Julián D, Godino-Llorente Juan I
Escuela Técnica Superior de Ingenieros de Telecomunicación, Universidad Politécnica de Madrid, Av. Complutense, 30, 28040, Madrid, Spain.
Data Brief. 2025 Aug 13;62:111962. doi: 10.1016/j.dib.2025.111962. eCollection 2025 Oct.
This article introduces a dataset of computed tomography (CT) scans of the paranasal sinuses collected from 6 distinct hospitals using 4 different CT devices, which ensures diverse recording conditions. The dataset includes axial CT scans from 40 subjects. For 13 of these subjects, the osseous structures of the area surrounding the paranasal sinuses were manually annotated with a semantic segmentation; the scans of the remaining 27 subjects are unannotated. The data are organized as raw DICOM files and are also stored as uncompressed PNG images. The dataset contains an average of 212 ± 105 slices per subject, and the annotated subset contains 696 masks, each paired with its corresponding CT slice. To further enhance the dataset, a set of automatically delineated masks (i.e., pseudo-labels) is also included for the unannotated CT scans. The dataset is valuable for medical image analysis, particularly for training and evaluating deep learning semantic segmentation models that identify the osseous structures surrounding the paranasal sinuses, and for exploring domain adaptation techniques across different imaging devices. It also supports research in areas such as resolution enhancement and cross-device generalization, making it a resource for improving the robustness and generalizability of artificial-intelligence-driven medical image analysis tools.
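As a rough illustration of how the annotated subset (masks paired with their corresponding CT slices) might be consumed, the following minimal sketch matches PNG slices to PNG masks by file name. The directory layout, the assumption that a mask shares its slice's file name, and the function name `pair_slices_with_masks` are all hypothetical; the abstract does not describe the actual folder structure, so the paths would need adapting to the released files.

```python
from pathlib import Path


def pair_slices_with_masks(slice_dir, mask_dir):
    """Return (slice_path, mask_path) pairs that share a file stem.

    Assumption (not stated in the abstract): each PNG mask has the same
    file name as the CT slice it annotates, stored in a separate folder.
    """
    # Index the masks by file stem for constant-time lookup.
    masks = {p.stem: p for p in Path(mask_dir).glob("*.png")}
    pairs = []
    for slice_path in sorted(Path(slice_dir).glob("*.png")):
        mask_path = masks.get(slice_path.stem)
        if mask_path is not None:  # skip slices that have no mask
            pairs.append((slice_path, mask_path))
    return pairs
```

Slices without a matching mask are skipped rather than treated as errors, since only some of the slices in the dataset carry manual annotations.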