Department of Radiology, University Medical Center of Groningen, Groningen, 9713GZ, The Netherlands.
Department of Oral Surgery of the Medical Spectrum Twente (MST), Enschede, 7500KA, The Netherlands.
Eur Radiol Exp. 2024 May 20;8(1):63. doi: 10.1186/s41747-024-00459-9.
Emphysema influences the appearance of lung tissue in computed tomography (CT). We evaluated whether this affects lung nodule detection by artificial intelligence (AI) and human readers (HR).
Individuals who had undergone low-dose chest CT were selected from the "Lifelines" cohort. Nodules in individuals without emphysema were matched to similar-sized nodules in individuals with at least moderate emphysema. AI results for nodular findings of 30-100 mm³ and 101-300 mm³ were compared to those of HR; two expert radiologists blindly reviewed discrepancies. Sensitivity and false positives (FPs)/scan were compared between the emphysema and non-emphysema groups.
Thirty-nine participants with and 82 without emphysema were included (n = 121, aged 61 ± 8 years (mean ± standard deviation), 58/121 males (47.9%)). AI and HR detected 196 and 206 nodular findings, respectively, yielding 109 concordant nodules and 184 discrepancies, including 118 true nodules. For AI, sensitivity was 0.68 (95% confidence interval 0.57-0.77) in emphysema versus 0.71 (0.62-0.78) in non-emphysema, with FPs/scan of 0.51 and 0.22, respectively (p = 0.028). For HR, sensitivity was 0.76 (0.65-0.84) and 0.80 (0.72-0.86), with FPs/scan of 0.15 and 0.27 (p = 0.230). Overall sensitivity was slightly higher for HR than for AI, but this difference disappeared after the exclusion of benign lymph nodes. FPs/scan were higher for AI in emphysema than in non-emphysema (p = 0.028), while FPs/scan for HR were higher than for AI for 30-100 mm³ nodules in non-emphysema (p = 0.009).
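The reported quantities can be illustrated with a minimal sketch. It assumes a Wilson score interval for the 95% confidence interval; the abstract does not state which interval method was used, so this is an assumption, and the function names are hypothetical. The per-group true-positive and false-negative counts are not given in the abstract, so any counts passed to `sensitivity` or `wilson_interval` would be illustrative only. The discrepancy count, by contrast, follows directly from the reported totals.

```python
import math


def sensitivity(true_positives, false_negatives):
    """Fraction of true nodules that the reader (AI or HR) detected."""
    return true_positives / (true_positives + false_negatives)


def fps_per_scan(false_positives, scans):
    """Average number of false-positive findings per CT scan."""
    return false_positives / scans


def wilson_interval(successes, total, z=1.96):
    """Wilson score interval for a proportion (z=1.96 gives about 95%).

    Assumption: the abstract does not name its interval method.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = successes / total
    denom = 1 + z**2 / total
    center = (p + z**2 / (2 * total)) / denom
    half_width = z * math.sqrt(p * (1 - p) / total + z**2 / (4 * total**2)) / denom
    return center - half_width, center + half_width


def discrepancies(ai_findings, hr_findings, concordant):
    """Findings reported by only one of the two readers."""
    return (ai_findings - concordant) + (hr_findings - concordant)


# Totals taken from the abstract: 196 AI findings, 206 HR findings,
# 109 concordant nodules.
print(discrepancies(196, 206, 109))  # 184, matching the reported discrepancies
```

With the reported totals, 87 findings were flagged by AI only and 97 by HR only, which together give the 184 discrepancies reviewed by the expert radiologists.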
AI produced more FPs/scan in emphysema than in non-emphysema, a difference not observed for HR.
When creating a benchmark dataset to validate AI software for lung nodule detection, it is important to include emphysema cases because of the additional FPs they produce.
• The sensitivity of nodule detection by AI was similar in emphysema and non-emphysema.
• AI had more FPs/scan in emphysema than in non-emphysema.
• Sensitivity and FPs/scan for the human reader were comparable in emphysema and non-emphysema.
• Representation of both emphysema and non-emphysema in the benchmark dataset is important for validating AI.