Automatic contouring QA method using a deep learning-based autocontouring system.
Author information
The University of Texas Graduate School of Biomedical Sciences at Houston, Houston, Texas, USA.
Department of Radiation Physics, Division of Radiation Oncology, The University of Texas MD Anderson Cancer Center, Houston, Texas, USA.
Publication information
J Appl Clin Med Phys. 2022 Aug;23(8):e13647. doi: 10.1002/acm2.13647. Epub 2022 May 17.
PURPOSE
To determine the most accurate similarity metric when using an independent system to verify automatically generated contours.
METHODS
A reference autocontouring system (the primary system, used to create clinical contours) and a verification autocontouring system (the secondary system, used to test the primary contours) were each used to contour 6 female pelvic structures (UteroCervix [uterus + cervix], CTVn [nodal clinical target volume (CTV)], PAN [para-aortic lymph nodes], bladder, rectum, and kidneys), giving a pair of autocontours per structure, on 49 CT scans from our institution and 38 from other institutions. Additionally, clinically acceptable and unacceptable contours were manually generated using the 49 internal CT scans. Eleven similarity metrics (volumetric Dice similarity coefficient [DSC], Hausdorff distance, 95% Hausdorff distance, mean surface distance, and surface DSC with tolerances from 1 to 10 mm) were calculated between the reference and the verification autocontours, and between the manually generated contours and the verification autocontours. A support vector machine (SVM) was used to determine the threshold that separates clinically acceptable from unacceptable contours for each structure. The 11 metrics were investigated individually and in certain combinations. Linear, radial basis function, sigmoid, and polynomial kernels were tested using the combinations of metrics as inputs for the SVM.
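To make two of the similarity metrics concrete, the following is a minimal sketch in pure Python, not the authors' implementation. It assumes each contour is already available as a set of voxel indices (for the volumetric DSC) and as a list of surface points in millimetres (for the surface DSC). The function names are hypothetical, and the surface DSC here is a simplified point-count version rather than an area-weighted one.

```python
import math


def volumetric_dsc(voxels_a, voxels_b):
    """Volumetric Dice: 2 * |A ∩ B| / (|A| + |B|) for sets of voxel indices."""
    a, b = set(voxels_a), set(voxels_b)
    if not a and not b:
        return 1.0  # two empty contours are treated as identical
    return 2.0 * len(a & b) / (len(a) + len(b))


def _distance_to_surface(point, surface):
    """Shortest Euclidean distance from one point to any point of a surface."""
    return min(math.dist(point, q) for q in surface)


def surface_dsc(surface_a, surface_b, tolerance_mm):
    """Simplified (point-count) surface DSC.

    Counts the surface points of each contour that lie within
    tolerance_mm of the other contour's surface, then divides by the
    total number of surface points on both contours.
    """
    if not surface_a or not surface_b:
        return 0.0
    close_a = sum(1 for p in surface_a
                  if _distance_to_surface(p, surface_b) <= tolerance_mm)
    close_b = sum(1 for q in surface_b
                  if _distance_to_surface(q, surface_a) <= tolerance_mm)
    return (close_a + close_b) / (len(surface_a) + len(surface_b))


# Illustrative toy data (made up, not from the study).
reference = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (3, 0, 0)]
verification = [(0, 0.5, 0), (1, 0.5, 0), (2, 0.5, 0), (3, 4, 0)]

dsc_1mm = surface_dsc(reference, verification, tolerance_mm=1.0)
dsc_2mm = surface_dsc(reference, verification, tolerance_mm=2.0)
vol = volumetric_dsc({(0, 0, 0), (0, 0, 1), (0, 1, 0)},
                     {(0, 0, 0), (0, 0, 1), (1, 1, 1)})
```

In this toy case one verification point lies 4 mm from the reference surface, so the surface DSC stays below 1 at both tolerances, and a tighter tolerance yields a lower score.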
RESULTS
The highest contouring error detection accuracies were 0.91 for the UteroCervix, 0.90 for the CTVn, 0.89 for the PAN, 0.92 for the bladder, 0.95 for the rectum, and 0.97 for the kidneys; these were achieved using surface DSCs with a tolerance of 1, 2, or 3 mm. The linear kernel was the most accurate and consistent when a combination of metrics was used as the input for the SVM. However, the best model accuracy from the combinations of metrics was no better than the best model accuracy from a single surface DSC as the input.
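With a single metric as input, a linear classifier reduces to one cutoff on that metric. The sketch below illustrates how such a cutoff translates into a detection accuracy. It is a stand-in, not the SVM used in the study: it picks the cutoff by brute-force accuracy maximization, whereas an SVM chooses its boundary by margin maximization. The function name, scores, and labels are made up for illustration.

```python
def best_cutoff(scores, acceptable):
    """Return (cutoff, accuracy) for the rule: score >= cutoff -> acceptable.

    Candidate cutoffs are the midpoints between adjacent distinct scores,
    plus one value below the minimum and one above the maximum.
    Ties are resolved in favour of the lowest cutoff.
    """
    values = sorted(set(scores))
    candidates = (
        [values[0] - 1.0]
        + [(lo + hi) / 2 for lo, hi in zip(values, values[1:])]
        + [values[-1] + 1.0]
    )
    best = None
    for cutoff in candidates:
        correct = sum((s >= cutoff) == ok for s, ok in zip(scores, acceptable))
        accuracy = correct / len(scores)
        if best is None or accuracy > best[1]:
            best = (cutoff, accuracy)
    return best


# Hypothetical surface DSC values for six contours of one structure,
# with hypothetical clinical acceptability labels.
scores = [0.95, 0.90, 0.80, 0.70, 0.60, 0.85]
acceptable = [True, True, True, False, False, False]

cutoff, accuracy = best_cutoff(scores, acceptable)
```

Because the two classes overlap in this toy data (an unacceptable contour scores 0.85 while an acceptable one scores 0.80), no cutoff classifies every contour correctly, which is why the reported accuracies are below 1.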
CONCLUSIONS
We distinguished clinically acceptable contours from clinically unacceptable contours with an accuracy higher than 0.9 for the targets and critical structures in patients with cervical cancer; the most accurate similarity metric was surface DSC with a tolerance of 1, 2, or 3 mm.