1 Department of Otolaryngology-Head and Neck Surgery, School of Medicine, University of Kansas, Kansas City, Kansas, USA.
2 School of Medicine, University of Kansas, Kansas City, Kansas, USA.
Otolaryngol Head Neck Surg. 2019 Jun;160(6):1058-1064. doi: 10.1177/0194599818823200. Epub 2019 Jan 29.
To apply a novel machine learning (ML) methodology to a large national cancer registry to help identify patients who are at high risk for delayed adjuvant radiation.
Observational cohort study.
National Cancer Database (NCDB).
A total of 76,573 patients were identified from the NCDB who had invasive head and neck cancer and underwent surgery, followed by radiation. The model was constructed from 80% of the patient data and subsequently evaluated and scored with the remaining 20%. Permutation feature importance analysis was used to understand the weighted model construction.
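The 80/20 split and permutation feature importance described above can be sketched as follows. This is a minimal illustration using only the Python standard library, not the authors' pipeline: the data are synthetic, the feature names `signal` and `noise` are hypothetical, and `rule_model` is a stand-in for whatever classifier was actually trained. Importance is taken here as the drop in held-out accuracy after one feature column is shuffled.

```python
import random

def train_test_split(rows, test_fraction=0.2, seed=0):
    """Shuffle rows and hold out `test_fraction` of them for evaluation."""
    rng = random.Random(seed)
    shuffled = rows[:]
    rng.shuffle(shuffled)
    n_test = round(len(shuffled) * test_fraction)
    return shuffled[n_test:], shuffled[:n_test]

def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    return sum(model(x) == y for x, y in rows) / len(rows)

def permutation_importance(model, rows, n_features, seed=0):
    """Accuracy drop when each feature column is shuffled in turn."""
    rng = random.Random(seed)
    baseline = accuracy(model, rows)
    drops = []
    for j in range(n_features):
        column = [x[j] for x, _ in rows]
        rng.shuffle(column)  # break the link between feature j and the label
        permuted = [(x[:j] + (v,) + x[j + 1:], y)
                    for (x, y), v in zip(rows, column)]
        drops.append(baseline - accuracy(model, permuted))
    return drops

# Synthetic data (assumption): feature 0 ("signal") determines the label,
# feature 1 ("noise") is unrelated to it.
rows = [((i % 2, (i // 2) % 2), i % 2) for i in range(1000)]

def rule_model(x):
    """Hypothetical stand-in classifier: predicts the label from feature 0."""
    return x[0]

train, test = train_test_split(rows, test_fraction=0.2)
drops = permutation_importance(rule_model, test, n_features=2)
```

In this toy setup, shuffling `noise` leaves accuracy unchanged, while shuffling `signal` lowers it, which is the pattern permutation importance is designed to expose.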
A total of 76,573 patients met inclusion and exclusion criteria. Our ML model was able to predict whether patients would start adjuvant therapy beyond 50 days after surgery with an overall accuracy of 64.41% and a precision of 58.5%. The 2 most important variables used to build the model were treating facility and urban versus rural demographics.
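For readers less familiar with the two reported metrics, the sketch below shows how accuracy and precision are computed from confusion-matrix counts. The counts are hypothetical and chosen only for easy arithmetic; they are not taken from the study.

```python
def accuracy_and_precision(tp, fp, tn, fn):
    """Return (accuracy, precision) from confusion-matrix counts.

    accuracy  = correct predictions / all predictions
    precision = true positives / all predicted positives
    """
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    return accuracy, precision

# Hypothetical counts (assumption), where "positive" means
# "predicted to start adjuvant therapy more than 50 days after surgery".
acc, prec = accuracy_and_precision(tp=30, fp=20, tn=40, fn=10)
```

Precision answers a different question from accuracy: of the patients flagged as likely to be delayed, what fraction actually were.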
Statistics can provide inferences within an overall system, while ML is a novel methodology that can make predictions. We can identify patients who are at "high risk" for delayed radiation using information from >75,000 patient experiences, which has the potential for a direct impact on clinical care. Our inability to achieve greater accuracy is due to limitations of the data captured by the NCDB, and we need to continue to identify new variables that are correlated with delayed radiation therapy. ML will prove to be a valuable clinical tool in years to come, but its utility is limited by the available data.