Earnest Arul, Tesema Getayeneh Antehunegn, Stirling Robert G
School of Public Health and Preventive Medicine, Monash University, Melbourne, VIC 3004, Australia.
Department of Respiratory Medicine, Alfred Health, Melbourne, VIC 3004, Australia.
Healthcare (Basel). 2023 Oct 18;11(20):2756. doi: 10.3390/healthcare11202756.
Delays in the assessment, management, and treatment of lung cancer patients may adversely impact prognosis and survival. This study is the first to use machine learning techniques to predict the quality and timeliness of care among lung cancer patients, using data from the Victorian Lung Cancer Registry (VLCR) in Victoria, Australia, collected between 2011 and 2022. Predictor variables included demographic, clinical, and hospital characteristics and geographical socio-economic indices. Random forests, k-nearest neighbour, neural networks, and support vector machines were implemented and evaluated on a 20% out-of-sample validation set using the area under the curve (AUC). Optimal model parameters were selected by 10-fold cross validation. The analysis included 11,602 patients. The two primary quality indicators were the overall proportion achieving "time from referral date to diagnosis date ≤ 28 days" and the proportion achieving "time from diagnosis date to first treatment date (any intent) ≤ 14 days". For the first indicator, support vector machines performed best, with an out-of-sample AUC of 0.89 (in-sample = 0.99), followed by k-nearest neighbour with an out-of-sample AUC of 0.85 (in-sample = 0.99). These models can be implemented in registry databases to help healthcare workers prospectively identify patients who may not meet these indicators and to enable timely interventions.
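The evaluation design described in the abstract (a 20% out-of-sample set, parameter selection by 10-fold cross validation, and comparison of in-sample and out-of-sample AUC) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration: the data are synthetic rather than VLCR records, the two features and the candidate values of `k` are arbitrary, and the helper names `auc`, `knn_scores`, and `cv_auc` are hypothetical, not the authors' code. The k-nearest neighbour classifier is written by hand so the example needs only the standard library.

```python
import random

def auc(scores, labels):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def knn_scores(train_X, train_y, test_X, k):
    """Score each test row by the share of positives among its k nearest training rows."""
    scores = []
    for x in test_X:
        dists = sorted(
            (sum((a - b) ** 2 for a, b in zip(x, t)), y)
            for t, y in zip(train_X, train_y)
        )
        scores.append(sum(y for _, y in dists[:k]) / k)
    return scores

def cv_auc(X, y, k, folds=10):
    """Mean AUC of k-NN across `folds` cross-validation folds."""
    total = 0.0
    for f in range(folds):
        test_idx = list(range(f, len(X), folds))  # every folds-th row, offset f
        held_out = set(test_idx)
        train_idx = [i for i in range(len(X)) if i not in held_out]
        s = knn_scores([X[i] for i in train_idx], [y[i] for i in train_idx],
                       [X[i] for i in test_idx], k)
        total += auc(s, [y[i] for i in test_idx])
    return total / folds

# Synthetic stand-in data (assumption): label 1 = "indicator met", two numeric features.
rng = random.Random(0)
y_all = [int(rng.random() < 0.5) for _ in range(400)]
X_all = [[rng.gauss(1.5 * label, 1.0), rng.gauss(1.5 * label, 1.0)] for label in y_all]

# Hold out 20% of rows for out-of-sample evaluation; rows are already in random order.
split = int(0.8 * len(X_all))
X_train, y_train = X_all[:split], y_all[:split]
X_test, y_test = X_all[split:], y_all[split:]

# Choose k by 10-fold cross validation on the training rows only.
candidates = [1, 5, 15, 31]
best_k = max(candidates, key=lambda k: cv_auc(X_train, y_train, k))

in_sample_auc = auc(knn_scores(X_train, y_train, X_train, best_k), y_train)
out_of_sample_auc = auc(knn_scores(X_train, y_train, X_test, best_k), y_test)
print(f"k={best_k} in-sample AUC={in_sample_auc:.2f} out-of-sample AUC={out_of_sample_auc:.2f}")
```

Because the in-sample score is computed on rows the model has already seen, it tends to be optimistic, which is why the out-of-sample figure is the one used to compare methods.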