Department of Radiation Oncology, Peking University Third Hospital, Beijing, China.
Brainnetome Center & National Laboratory of Pattern Recognition, Institute of Automation, Chinese Academy of Sciences, Beijing, China; CAS Center for Excellence in Brain Science and Intelligence Technology, Institute of Automation, Chinese Academy of Sciences, Beijing, China.
Int J Radiat Oncol Biol Phys. 2019 Nov 15;105(4):893-902. doi: 10.1016/j.ijrobp.2019.07.049. Epub 2019 Aug 1.
To assess the accuracy of machine learning in predicting and classifying quality assurance (QA) results for volumetric modulated arc therapy (VMAT) plans.
A total of 303 VMAT plans, including 176 gynecologic cancer and 127 head and neck cancer plans, were selected for this study. Fifty-four complexity metrics were extracted from the QA plans and used as inputs. Patient-specific QA was performed, and gamma passing rates (GPRs) were used as outputs. One Poisson lasso (PL) regression model was developed to predict individual GPRs, and one random forest (RF) classification model was developed to classify QA results as "pass" or "fail." Both technical validation (TV) and clinical validation (CV) were used to evaluate model reliability. The GPR prediction accuracy of PL and the classification performance of PL and RF were evaluated.
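As a rough illustration of what an L1-penalized Poisson regression involves, the following is a minimal sketch in plain Python. It is not the authors' implementation: the abstract does not say how GPR is mapped to a Poisson target or which solver was used, so the sketch takes a generic non-negative target `y` and fits it by proximal gradient descent. The names `soft_threshold`, `fit_poisson_lasso`, `lam`, and `step`, and the two-feature toy data, are all hypothetical.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink `value` toward zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def fit_poisson_lasso(X, y, lam, step=0.1, n_iter=500):
    """Fit log E[y | x] = b + x . w by proximal gradient descent.

    Minimizes mean(exp(eta_i) - y_i * eta_i) + lam * sum(|w_j|),
    where eta_i = b + x_i . w. The intercept b is not penalized.
    A fixed step size is used, so features should be scaled to a small
    range; this sketch has no line search or convergence check.
    """
    n, p = len(X), len(X[0])
    w, b = [0.0] * p, 0.0
    for _ in range(n_iter):
        # Predicted Poisson means under the current coefficients.
        mu = [math.exp(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]
        resid = [m - t for m, t in zip(mu, y)]
        # Gradient of the mean negative log-likelihood.
        grad_b = sum(resid) / n
        grad_w = [sum(r * row[j] for r, row in zip(resid, X)) / n for j in range(p)]
        # Plain gradient step for b; gradient step plus shrinkage for w.
        b -= step * grad_b
        w = [soft_threshold(wj - step * gj, step * lam) for wj, gj in zip(w, grad_w)]
    return w, b


def predict(X, w, b):
    """Predicted Poisson mean for each row of X."""
    return [math.exp(b + sum(wj * xj for wj, xj in zip(w, row))) for row in X]


# Synthetic toy data: two made-up "complexity metrics" per plan and a
# made-up non-negative target (NOT data from the study).
X_toy = [[0.1, 0.9], [0.4, 0.2], [0.8, 0.5], [0.3, 0.7]]
y_toy = [2, 3, 1, 4]

w_fit, b_fit = fit_poisson_lasso(X_toy, y_toy, lam=0.05)
print(w_fit, b_fit, predict(X_toy, w_fit, b_fit))
```

The L1 penalty sets the coefficients of uninformative metrics exactly to zero, which is one plausible reason to use a lasso when there are 54 candidate inputs; with a sufficiently large `lam`, every coefficient is zero and the model predicts the mean of `y` for all plans.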
In TV, the mean prediction error of PL was 1.81%, 2.39%, and 4.18% at the 3%/3 mm, 3%/2 mm, and 2%/2 mm criteria, respectively. No significant differences in prediction errors between TV and CV were observed. In QA results classification, PL had a higher specificity (accurately identifying plans that can pass QA), whereas RF had a higher sensitivity (accurately identifying plans that may fail QA). Using 90% as the action limit at the 3%/2 mm criterion, the specificity of PL and RF was 97.5% and 87.7% in TV and 100% and 71.4% in CV, respectively. The sensitivity of PL and RF was 31.6% and 100% in TV and 33.3% and 100% in CV, respectively. With 100% sensitivity, RF could reduce the QA workload for 81.2% of plans in TV and 62.5% of plans in CV.
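The sensitivity and specificity definitions used here treat "fail" as the positive class. A minimal sketch of that calculation follows, assuming a plan fails when its GPR is strictly below the action limit (the abstract does not state whether the boundary is inclusive). The function names and the toy GPR values are hypothetical and are not data from the study.

```python
def classify_by_action_limit(gprs, action_limit=90.0):
    """Label each plan 'fail' if its GPR (%) is below the action limit."""
    return ["fail" if g < action_limit else "pass" for g in gprs]


def sensitivity_specificity(measured, predicted):
    """Return (sensitivity, specificity) with 'fail' as the positive class.

    Sensitivity: fraction of truly failing plans flagged as 'fail'.
    Specificity: fraction of truly passing plans labeled 'pass'.
    """
    pairs = list(zip(measured, predicted))
    tp = sum(m == "fail" and p == "fail" for m, p in pairs)
    fn = sum(m == "fail" and p == "pass" for m, p in pairs)
    tn = sum(m == "pass" and p == "pass" for m, p in pairs)
    fp = sum(m == "pass" and p == "fail" for m, p in pairs)
    # Guard against a class being absent from the measured labels.
    sensitivity = tp / (tp + fn) if (tp + fn) else float("nan")
    specificity = tn / (tn + fp) if (tn + fp) else float("nan")
    return sensitivity, specificity


# Synthetic toy values (NOT data from the study).
measured_gpr = [95.2, 88.1, 97.0, 89.5, 92.3]
predicted_gpr = [94.0, 91.0, 96.5, 88.0, 89.0]

measured_labels = classify_by_action_limit(measured_gpr)
predicted_labels = classify_by_action_limit(predicted_gpr)
sens, spec = sensitivity_specificity(measured_labels, predicted_labels)
print(f"sensitivity={sens:.3f}, specificity={spec:.3f}")
```

Under this convention, a classifier with 100% sensitivity never labels a failing plan as "pass", which is why the abstract favors it even at the cost of lower specificity.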
The PL model could accurately predict GPR for most VMAT plans. The RF model with 100% sensitivity was preferred for QA results classification. Machine learning can be a useful tool to assist VMAT QA and reduce QA workload.