Fan Mingdong, Wang Tonghe, Lei Yang, Patel Pretesh R, Dresser Sean, Ghavidel Beth Bradshaw, Qiu Richard L J, Zhou Jun, Luca Kirk, Kayode Oluwatosin, Bradley Jeffrey D, Yang Xiaofeng, Roper Justin
Department of Radiation Oncology and Winship Cancer Institute, Emory University, Atlanta, Georgia, USA.
Department of Medical Physics, Memorial Sloan Kettering Cancer Center, New York, New York, USA.
J Appl Clin Med Phys. 2025 Apr;26(4):e70010. doi: 10.1002/acm2.70010. Epub 2025 Feb 13.
Deep learning-based segmentation of organs-at-risk (OARs) is becoming mainstream in clinical practice because of its superior performance over atlas- and model-based autocontouring methods. Although several commercial deep learning-based autosegmentation solutions are now available, the implementation of these tools is still at such an early stage that acceptance criteria remain underdeveloped, owing to a lack of knowledge about the systems' segmentation tendencies and failure modes. As the starting point of the iterative process of clinical implementation, this study focuses on outlier analysis of four commercial autocontouring tools for abdominal OARs.
Four autosegmentation software packages (Limbus AI, MIM Contour ProtégéAI, Radformation AutoContour, and Siemens syngo.via) were used to segment 111 patient cases. Geometric segmentation accuracy was quantitatively compared with clinical contours using the Dice similarity coefficient (DSC) and the 95% Hausdorff distance (HD95). The outliers from the quantitative evaluation of each software package were analyzed for the liver, stomach, and kidneys, and the possible causes of outliers were summarized into six categories: (1) differences in contouring style or guidelines, (2) image acquisition and quality, (3) abnormal anatomy of the OAR, (4) abnormal anatomy of abutting organs/tissues, (5) external/internal devices, and (6) other causes.
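The abstract does not specify how the two metrics were computed, so the following is only a minimal sketch of their standard definitions, not the study's implementation. It assumes binary masks stored as sets of integer voxel indices, a 6-connected definition of the contour surface, and HD95 taken as the larger of the two directed 95th-percentile surface distances (a common convention; some tools pool both directions instead). All function names are hypothetical, and the brute-force nearest-neighbor search is only practical for toy volumes; clinical-size masks would normally use a distance transform.

```python
import math
from typing import Iterable, List, Set, Tuple

Voxel = Tuple[int, int, int]

def dice(a: Set[Voxel], b: Set[Voxel]) -> float:
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|)."""
    if not a and not b:
        return 1.0  # convention assumed here: two empty contours agree perfectly
    return 2.0 * len(a & b) / (len(a) + len(b))

# 6-connected neighborhood (assumption: face neighbors only)
_NEIGHBORS = [(1, 0, 0), (-1, 0, 0), (0, 1, 0), (0, -1, 0), (0, 0, 1), (0, 0, -1)]

def surface(mask: Set[Voxel]) -> Set[Voxel]:
    """Voxels of the mask that have at least one 6-connected neighbor outside it."""
    return {v for v in mask
            if any((v[0] + dx, v[1] + dy, v[2] + dz) not in mask
                   for dx, dy, dz in _NEIGHBORS)}

def percentile(values: Iterable[float], q: float) -> float:
    """q-th percentile with linear interpolation between sorted values."""
    s = sorted(values)
    pos = (len(s) - 1) * q / 100.0
    lo, hi = math.floor(pos), math.ceil(pos)
    return s[lo] + (s[hi] - s[lo]) * (pos - lo)

def directed_distances(src: Set[Voxel], dst: Set[Voxel],
                       spacing: Tuple[float, float, float]) -> List[float]:
    """For each voxel in src, the physical distance to the nearest voxel in dst."""
    dst_mm = [tuple(c * s for c, s in zip(v, spacing)) for v in dst]
    out = []
    for v in src:
        p = tuple(c * s for c, s in zip(v, spacing))
        out.append(min(math.dist(p, d) for d in dst_mm))
    return out

def hd95(a: Set[Voxel], b: Set[Voxel],
         spacing: Tuple[float, float, float] = (1.0, 1.0, 1.0)) -> float:
    """Max of the two directed 95th-percentile surface distances (in spacing units)."""
    sa, sb = surface(a), surface(b)
    if not sa or not sb:
        raise ValueError("HD95 is undefined for an empty contour")
    return max(percentile(directed_distances(sa, sb, spacing), 95),
               percentile(directed_distances(sb, sa, spacing), 95))

# Toy example: a 4x4x4 voxel cube and the same cube shifted by one voxel along x.
cube = {(x, y, z) for x in range(4) for y in range(4) for z in range(4)}
shifted = {(x + 1, y, z) for x, y, z in cube}
print(dice(cube, shifted))  # 0.75
print(hd95(cube, shifted))  # 1.0
```

In this toy case the shifted cube overlaps the original in 48 of 64 voxels, giving DSC = 96/128 = 0.75, and every surface point lies within one voxel of the other surface, so HD95 is 1 voxel. DSC rewards volumetric overlap, while HD95 is sensitive to boundary disagreements such as an included neighboring organ or a missing renal pelvis, which is why both are typically reported together.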
For liver segmentation, the most prominent cause of discrepancies for Limbus, accounting for four of its six outliers, was the presence of a biliary stent or an internal/external biliary drain and the resulting pneumobilia. Siemens included abutting organs with CT numbers similar to those of the liver in 5/8 outliers. In 12 of 13 Radformation liver outliers, the contour included the heart and/or stomach, while MIM not only included the stomach in the presence of barium in 5/11 outliers but also produced fragmented contours in 5/11 other cases. Only Limbus and Radformation provided stomach segmentation, and imaging with barium contrast directly caused incomplete stomach delineation in 10/12 Limbus outliers and 21/25 Radformation outliers. For the kidneys, Radformation and Siemens consistently followed the RTOG contouring guidelines, whereas the institutional contours excluded the renal pelvis in some cases; this difference accounted for 19/25 Radformation outliers and 18/23 Siemens outliers. By contrast, Limbus contours appeared to follow a different contouring guideline that excludes the renal pelvis. Fragmented kidney contours were found in 10/15 Limbus outliers and 25/26 MIM outliers. The fragmentation in MIM was directly linked to the use of IV contrast in imaging, but there was not enough evidence to identify the origin of Limbus's fragmented contours.
The causes of the segmentation outliers of the four commercial deep learning-based autocontouring solutions were summarized for each OAR. This work can help vendors improve their autosegmentation software and inform users of potential failure modes when using these tools.