School of Computer Science and Technology, Shandong University of Finance and Economics, Jinan, China.
School of Information Science and Technology, Linyi University, Linyi, China.
Sci Data. 2024 Jul 27;11(1):824. doi: 10.1038/s41597-024-03658-6.
Computer-Aided Diagnosis (CAD) systems have recently become indispensable tools in clinical diagnostic workflows, significantly reducing the burden on radiologists. Despite their integration into clinical settings, however, CAD systems still have limitations. In particular, although they can achieve high performance in detecting lung nodules, they struggle to accurately predict multiple cancer types. This limitation can be attributed to the scarcity of publicly available datasets annotated with expert-level cancer type information. This research aims to bridge this gap by providing publicly accessible datasets and reliable tools for medical diagnosis, enabling finer categorization of different types of lung disease and thereby supporting precise treatment recommendations. To this end, we curated a diverse dataset of lung Computed Tomography (CT) images comprising 330 annotated nodules (labeled as bounding boxes) from 95 distinct patients. We evaluated the quality of the dataset using a variety of classical classification and detection models; the promising results demonstrate that the dataset is feasible for practical application and can further facilitate intelligent auxiliary diagnosis.