Department of Radiation Oncology, Massachusetts General Hospital and Harvard Medical School, Boston, Massachusetts.
Department of Radiation Physics - Patient Care, The University of Texas MD Anderson Cancer Center, Houston, Texas.
Pract Radiat Oncol. 2024 Jan-Feb;14(1):e75-e85. doi: 10.1016/j.prro.2023.09.004. Epub 2023 Oct 4.
Our purpose was to identify variations in the clinical use of automatically generated contours that could be attributed to software error, off-label use, or automation bias.
For 500 head and neck patients contoured by an in-house automated contouring system, the Dice similarity coefficient and added path length were calculated between the contours generated by the automated system and the final contours after editing for clinical use. Statistical process control was applied to these metrics, with control charts generated using control limits set at 3 standard deviations. Contours that exceeded the thresholds were investigated to determine the cause. Moving mean control plots were then generated to identify dosimetrists who were editing less over time, which could be indicative of automation bias.
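The per-contour comparison and flagging logic described above can be sketched as follows. This is a minimal illustration, not the authors' implementation: the function names are hypothetical, the added path length metric is omitted, and Shewhart-style 3-standard-deviation limits are assumed for the control chart.

```python
import numpy as np

def dice_coefficient(a: np.ndarray, b: np.ndarray) -> float:
    """Dice similarity coefficient between two binary segmentation masks."""
    a = a.astype(bool)
    b = b.astype(bool)
    intersection = np.logical_and(a, b).sum()
    total = a.sum() + b.sum()
    # Two empty masks are treated as identical (Dice = 1).
    return 2.0 * intersection / total if total > 0 else 1.0

def control_limits(values, n_sd: float = 3.0):
    """Control limits at n_sd sample standard deviations around the mean."""
    mu = np.mean(values)
    sigma = np.std(values, ddof=1)
    return mu - n_sd * sigma, mu + n_sd * sigma

def flag_out_of_control(values, n_sd: float = 3.0):
    """Indices of observations outside the control limits (flagged for review)."""
    lo, hi = control_limits(values, n_sd)
    return [i for i, v in enumerate(values) if v < lo or v > hi]

def moving_mean(values, window: int = 10):
    """Rolling mean of an editing metric, for moving mean control plots."""
    v = np.asarray(values, dtype=float)
    return np.convolve(v, np.ones(window) / window, mode="valid")
```

In this sketch, `flag_out_of_control` would be run per organ at risk over the 500-patient metric series, and `moving_mean` over each dosimetrist's chronological editing metrics; a downward trend in the moving mean would be the candidate signal of automation bias.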
Major contouring edits were flagged for: 1.0% brain, 3.1% brain stem, 3.5% left cochlea, 2.9% right cochlea, 4.8% esophagus, 4.1% left eye, 4.0% right eye, 2.2% left lens, 4.9% right lens, 2.5% mandible, 11% left optic nerve, 6.1% right optic nerve, 3.8% left parotid, 5.9% right parotid, and 3.0% of spinal cord contours. Identified causes of editing included unexpected patient positioning, deviation from standard clinical practice, and disagreement between dosimetrist preference and automated contouring style. A statistically significant (P < .05) difference was identified between the contour editing practice of dosimetrists, with 1 dosimetrist editing more across all organs at risk. Eighteen percent (27/150) of moving mean control plots created for 5 dosimetrists indicated the amount of contour editing was decreasing over time, possibly corresponding to automation bias.
The developed system was used to detect statistically significant edits caused by software error, unexpected clinical use, and automation bias. The increased ability to detect systematic errors that occur when editing automatically generated contours will improve the safety of the automatic treatment planning workflow.