Gupta Sunil, Tran Truyen, Luo Wei, Phung Dinh, Kennedy Richard Lee, Broad Adam, Campbell David, Kipp David, Singh Madhu, Khasraw Mustafa, Matheson Leigh, Ashley David M, Venkatesh Svetha
Centre for Pattern Recognition and Data Analytics, Deakin University, Geelong, Victoria, Australia.
BMJ Open. 2014 Mar 17;4(3):e004007. doi: 10.1136/bmjopen-2013-004007.
Using the prediction of cancer outcome as a model, we tested the hypothesis that analysing routinely collected digital data in an electronic administrative record (EAR) with machine-learning techniques could enhance conventional methods of predicting clinical outcomes.
A regional cancer centre in Australia.
Disease-specific data on 869 patients from a purpose-built cancer registry (Evaluation of Cancer Outcomes (ECO)) were used to predict survival at 6, 12 and 24 months. The model was validated with data from a further 94 patients, and the results were compared with the assessments of five specialist oncologists. Machine-learning prediction using ECO data was compared with prediction using EAR data and with a model combining ECO and EAR data.
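Predicting survival at fixed horizons (6, 12 and 24 months) turns a time-to-event outcome into one binary label per horizon. The abstract does not say how patients lost to follow-up were handled, so the sketch below simply assumes that a patient censored before a horizon is excluded from that horizon's labels. The `Patient` record and `label_at_horizon` helper are hypothetical illustrations, not the authors' code.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

HORIZONS = (6, 12, 24)  # prediction horizons in months, as in the study


@dataclass
class Patient:
    # Hypothetical record: months from baseline to death or last follow-up.
    months_followed: float
    died: bool  # True if death was observed, False if censored (still alive)


def label_at_horizon(p: Patient, horizon: float) -> Optional[int]:
    """Return 1 if the patient died on or before the horizon, 0 if known to be
    alive at the horizon, or None if censored before it (outcome unknown).
    Excluding censored patients is an assumption, not the paper's stated method."""
    if p.died and p.months_followed <= horizon:
        return 1
    if p.months_followed >= horizon:
        return 0
    return None


def labels_for(patients: List[Patient], horizon: float) -> List[Tuple[int, int]]:
    """(patient index, label) pairs for patients whose status at the horizon is known."""
    out = []
    for i, p in enumerate(patients):
        y = label_at_horizon(p, horizon)
        if y is not None:
            out.append((i, y))
    return out


if __name__ == "__main__":
    cohort = [Patient(4.0, True), Patient(18.0, True), Patient(10.0, False)]
    for h in HORIZONS:
        print(h, labels_for(cohort, h))
```

With this convention, each horizon yields its own labelled cohort, so a separate classifier (or one classifier per horizon) can be trained and scored.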
Survival prediction accuracy, measured as the area under the receiver operating characteristic curve (AUC).
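The AUC equals the probability that a randomly chosen patient with the event receives a higher risk score than a randomly chosen patient without it (the Mann-Whitney formulation). The minimal sketch below computes it directly from that definition; the function name `auc` and the label convention (1 = event, such as death by the horizon) are illustrative assumptions, not taken from the paper.

```python
from typing import Sequence


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Area under the ROC curve via the Mann-Whitney statistic.

    labels: 1 = event (e.g. death by the horizon), 0 = no event.
    A tie between an event score and a non-event score counts as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one event and one non-event")
    wins = 0.0
    for p in pos:  # compare every event/non-event pair
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Event scores 0.9 and 0.4; non-event scores 0.8 and 0.3.
    # Three of the four pairs are correctly ordered, so the AUC is 0.75.
    print(auc([0.9, 0.8, 0.4, 0.3], [1, 0, 1, 0]))
```

An AUC of 0.5 corresponds to chance ordering and 1.0 to perfect separation, which is the scale on which the values reported below (for example 0.87 at 6 months) should be read. The pairwise loop is quadratic; a rank-based version is preferable for large cohorts.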
The ECO model yielded AUCs of 0.87 (95% CI 0.848 to 0.890) at 6 months, 0.796 (95% CI 0.774 to 0.823) at 12 months and 0.764 (95% CI 0.737 to 0.789) at 24 months. Each was slightly better than the clinician panel's performance. The model performed consistently across a range of cancers, including rare cancers. Combining ECO and EAR data yielded better prediction than the ECO-based model, with AUCs ranging from 0.757 to 0.997 at 6 months, from 0.689 to 0.988 at 12 months and from 0.713 to 0.973 at 24 months. The best prediction was for genitourinary, head and neck, lung, skin and upper gastrointestinal tumours.
Machine learning applied to information from a disease-specific (cancer) database and the EAR can be used to predict clinical outcomes. Importantly, the approach described made use of digital data that is already routinely collected but underexploited by clinical health systems.