Amini Elaheh, Klein Ran
Systems and Computer Engineering, Carleton University, Ottawa, ON, Canada.
Division of Nuclear Medicine and Molecular Imaging, Faculty of Medicine, University of Ottawa, Ottawa, ON, Canada.
Eur Radiol Exp. 2025 Sep 4;9(1):86. doi: 10.1186/s41747-025-00623-9.
Lung lobe segmentation is required to assess lobar function with nuclear imaging before surgical interventions. We evaluated the performance of open-source deep learning-based lung lobe segmentation tools, compared to a similar nnU-Net model trained on a smaller but more representative clinical dataset.
We collated and semi-automatically segmented an internal dataset of 164 computed tomography scans and classified them for task difficulty as easy, moderate, or hard. The performance of three open-source models (multi-organ objective segmentation (MOOSE), TotalSegmentator, and LungMask) was assessed using Dice similarity coefficient (DSC), robust Hausdorff distance (rHd95), and normalized surface distance (NSD). Additionally, we trained, validated, and tested an nnU-Net model using our local dataset and compared its performance with that of the other software on the test subset. All models were evaluated for generalizability using an external competition dataset (LOLA11, n = 55).
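As an illustration of the first metric, the sketch below computes a per-lobe Dice similarity coefficient, DSC = 2|A ∩ B| / (|A| + |B|), between a reference and a predicted label map. It is a minimal example and not the authors' evaluation code: the function name `dice_per_label`, the integer label values, and the choice to return NaN when a lobe is absent from both maps are assumptions made here for illustration.

```python
import numpy as np


def dice_per_label(reference, prediction, labels):
    """Return {label: DSC} for two integer label maps of equal shape.

    Hypothetical helper for illustration; the label numbering is an assumption.
    """
    reference = np.asarray(reference)
    prediction = np.asarray(prediction)
    if reference.shape != prediction.shape:
        raise ValueError("label maps must have the same shape")

    scores = {}
    for label in labels:
        ref_mask = reference == label
        pred_mask = prediction == label
        denom = int(ref_mask.sum()) + int(pred_mask.sum())
        if denom == 0:
            # Lobe absent from both maps: DSC is undefined, so report NaN
            # (an assumption; other conventions score this case as 1.0).
            scores[label] = float("nan")
        else:
            overlap = int(np.logical_and(ref_mask, pred_mask).sum())
            scores[label] = 2.0 * overlap / denom
    return scores


# Toy 2D example with two labels; real inputs would be 3D CT label volumes.
reference = [[1, 1, 2],
             [1, 2, 2]]
prediction = [[1, 1, 2],
              [1, 1, 2]]
scores = dice_per_label(reference, prediction, labels=[1, 2, 3])
```

In this toy case label 1 overlaps on 3 voxels out of 3 + 4, giving 6/7, and label 2 overlaps on 2 voxels out of 3 + 2, giving 0.8. Label 3 is absent from both maps and comes back as NaN, which keeps a missing lobe from being silently scored as a perfect or a failed match.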
TotalSegmentator outperformed MOOSE in DSC and NSD across all difficulty levels (p < 0.001), but not in rHd95 (p = 1.000). MOOSE and TotalSegmentator surpassed LungMask across metrics and difficulty classes (p < 0.001). Our model exceeded all other models on the internal test subset (n = 33) in all metrics and across all difficulty classes (p < 0.001), as well as on the external dataset. Missing lobes were correctly identified only by our model and LungMask, in 3 and 1 of 7 cases, respectively.
Open-source segmentation tools perform well in straightforward cases but struggle in unfamiliar, complex cases. Training on diverse, specialized datasets can improve generalizability, which underscores the value of representative data over sheer quantity.
Training lung lobe segmentation models on a local variety of cases improves accuracy, thus enhancing presurgical planning, ventilation-perfusion analysis, and disease localization, potentially impacting treatment decisions and patient outcomes in respiratory and thoracic care.
Deep learning models trained on non-specialized datasets struggle with complex lung anomalies, yet their real-world limitations are insufficiently assessed. Training an identical model on a smaller yet clinically diverse and representative cohort improved performance in challenging cases. Data diversity outweighs quantity in deep learning-based segmentation models. Accurate lung lobe segmentation may enhance presurgical assessment of lung lobar ventilation and perfusion function, optimizing clinical decision-making and patient outcomes.