Mao Yifei, Heuvelmans Marjolein A, van Tuinen Marcel, Yu Donghoon, Yi Jaeyoun, Oudkerk Matthijs, Ye Zhaoxiang, de Bock Geertruida H, Dorrius Monique D
Department of Epidemiology, University Medical Center Groningen, University of Groningen, Groningen, The Netherlands.
Institute for Diagnostic Accuracy, Groningen, The Netherlands.
Eur Radiol. 2025 Aug 29. doi: 10.1007/s00330-025-11949-8.
To assess the impact of reconstruction parameters on AI's performance in detecting and classifying risk-dominant nodules in baseline low-dose CT (LDCT) screening of a Chinese general population.
Baseline LDCT scans from 300 consecutive participants in the Netherlands and China Big-3 (NELCIN-B3) trial were included. AI analyzed each scan reconstructed with four settings: 1 mm/0.7 mm thickness/interval with medium-soft and hard kernels (D45f/1 mm, B80f/1 mm) and 2 mm/1 mm thickness/interval with soft and medium-soft kernels (B30f/2 mm, D45f/2 mm). The consensus reading by two radiologists served as the reference standard. At the scan level, inter-reader agreement between AI and the reference standard, and AI's sensitivity and specificity in determining the presence of a risk-dominant nodule, were evaluated. For reference-standard risk-dominant nodules, the nodule detection rate and the agreement in nodule type classification between AI and the reference standard were assessed.
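The scan-level measures named above can be illustrated with a minimal sketch. It assumes each scan is scored as positive or negative for a risk-dominant nodule by both AI and the reference standard, and it uses Cohen's kappa as the agreement statistic. The abstract does not name the statistic, so kappa is an assumption here, and the counts are hypothetical placeholders, not study data.

```python
def scan_level_metrics(tp, fp, fn, tn):
    """Sensitivity, specificity, and Cohen's kappa from a 2x2 table.

    tp/fp/fn/tn count scans, with the radiologists' consensus read
    taken as truth. Kappa is an assumed choice of agreement statistic.
    """
    n = tp + fp + fn + tn
    sensitivity = tp / (tp + fn)  # AI-positive among reference-positive scans
    specificity = tn / (tn + fp)  # AI-negative among reference-negative scans

    observed = (tp + tn) / n      # raw proportion of agreeing scans
    # Agreement expected by chance, from the marginal totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n**2
    kappa = (observed - expected) / (1 - expected)
    return sensitivity, specificity, kappa


# Hypothetical counts for illustration only (not taken from the study).
sens, spec, kappa = scan_level_metrics(tp=40, fp=10, fn=10, tn=40)
print(f"sensitivity={sens:.3f} specificity={spec:.3f} kappa={kappa:.3f}")
```

Kappa corrects raw agreement for the agreement expected by chance, which is why it is commonly reported alongside sensitivity and specificity when comparing two readers.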
AI-D45f/1 mm demonstrated a significantly higher sensitivity than AI-B80f/1 mm in determining the presence of a risk-dominant nodule per scan (77.5% vs. 31.5%, p < 0.0001). For reference-standard risk-dominant nodules (111/300, 37.0%), kernel variations (AI-D45f/1 mm vs. AI-B80f/1 mm) did not significantly affect AI's nodule detection rate (87.4% vs. 82.0%, p = 0.26) but substantially influenced the agreement in nodule type classification between AI and the reference standard (87.7% [50/57] vs. 17.7% [11/62], p < 0.0001). A change in thickness/interval (AI-D45f/1 mm vs. AI-D45f/2 mm) had no substantial influence on any measure of AI's performance (p > 0.05).
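As a quick arithmetic check, the percentages that are reported together with explicit counts can be recomputed. This sketch uses only the count pairs stated in the abstract; the helper name `pct` is hypothetical, and no statistical test is reproduced because the abstract does not say which test was used.

```python
def pct(numerator, denominator, digits=1):
    """Percentage rounded to the given number of decimal places."""
    return round(100 * numerator / denominator, digits)


# (numerator, denominator, percentage as stated in the abstract)
reported = {
    "scans with a risk-dominant nodule": (111, 300, 37.0),
    "type agreement, AI-D45f/1 mm": (50, 57, 87.7),
    "type agreement, AI-B80f/1 mm": (11, 62, 17.7),
}

for label, (num, den, stated) in reported.items():
    computed = pct(num, den)
    status = "matches" if computed == stated else "differs from"
    print(f"{label}: {num}/{den} = {computed}% ({status} stated {stated}%)")
```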
Variations in reconstruction kernels significantly affected AI's performance in risk-dominant nodule type classification, but not nodule detection. Ensuring consistency with radiologist-preferred kernels significantly improved agreement in nodule type classification and may help integrate AI more smoothly into clinical workflows.
Question: Patient management in lung cancer screening depends on the risk-dominant nodule, yet no prior studies have assessed the impact of reconstruction parameters on AI performance for these nodules. Findings: The difference between reconstruction kernels (AI-D45f/1 mm vs. AI-B80f/1 mm, or AI-B30f/2 mm vs. AI-D45f/2 mm) significantly affected AI's performance in risk-dominant nodule type classification, but not nodule detection. Clinical relevance: Using a kernel for AI that is consistent with the radiologist's choice is likely to improve the overall performance of AI-based CAD systems as an independent reader and to support greater clinical acceptance and integration of AI tools into routine practice.