Centre for Advanced Modelling and Geospatial Information Systems (CAMGIS), Faculty of Engineering and IT, University of Technology Sydney, Sydney, NSW 2007, Australia.
IBM Australia Ltd., Sydney, NSW 2000, Australia.
Sensors (Basel). 2021 Oct 7;21(19):6655. doi: 10.3390/s21196655.
Lung cancer is the leading cause of cancer death and morbidity worldwide. Many studies have shown machine learning models to be effective in detecting lung nodules on chest X-ray images. However, these techniques have yet to be embraced by the medical community because of practical, ethical, and regulatory constraints stemming from the "black-box" nature of deep learning models. Additionally, most lung nodules visible on chest X-rays are benign; therefore, the narrow task of computer vision-based lung nodule detection cannot be equated with automated lung cancer detection. Addressing both concerns, this study introduces a novel hybrid computer vision model that combines deep learning with decision trees and presents lung cancer malignancy predictions as interpretable decision trees. The deep learning component is trained on a large, publicly available dataset of pathological biomarkers associated with lung cancer. These models are then used to infer biomarker scores for chest X-ray images from two independent datasets for which malignancy metadata is available. Next, multivariate predictive models are mined by fitting shallow decision trees to the malignancy-stratified datasets and evaluating a range of metrics to select the best model. The best decision tree model achieved a sensitivity of 86.7%, a specificity of 80.0%, and a positive predictive value of 92.9%. Decision trees mined using this method may serve as a starting point for refinement into clinically useful multivariate lung cancer malignancy models, implemented as a workflow augmentation tool to improve the efficiency of human radiologists.
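The abstract reports the best tree by sensitivity, specificity and positive predictive value (PPV). The sketch below, written under stated assumptions, shows how these three metrics follow from a binary confusion matrix and how a very shallow tree could be selected on one biomarker score. It assumes binary labels (1 = malignant), a single score per image, a depth-1 tree (a "decision stump") and Youden's J (sensitivity + specificity - 1) as the selection criterion. These choices, the helper names `binary_metrics` and `fit_stump`, and the example data are all hypothetical; the study's actual tree depth, feature set and selection metrics are not given here.

```python
from dataclasses import dataclass
from typing import Sequence, Tuple


@dataclass
class Metrics:
    sensitivity: float  # TP / (TP + FN)
    specificity: float  # TN / (TN + FP)
    ppv: float          # TP / (TP + FP), positive predictive value


def binary_metrics(y_true: Sequence[int], y_pred: Sequence[int]) -> Metrics:
    """Compute sensitivity, specificity and PPV, treating label 1 as malignant."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    # Return 0.0 when a denominator is empty rather than dividing by zero.
    sens = tp / (tp + fn) if tp + fn else 0.0
    spec = tn / (tn + fp) if tn + fp else 0.0
    ppv = tp / (tp + fp) if tp + fp else 0.0
    return Metrics(sens, spec, ppv)


def fit_stump(scores: Sequence[float], y_true: Sequence[int]) -> Tuple[float, Metrics]:
    """Hypothetical depth-1 decision tree on a single biomarker score.

    Tries every observed score as a threshold (predict malignant when
    score >= threshold) and keeps the first threshold with the highest
    Youden's J. The criterion is an assumption, not the study's method.
    """
    best = None
    for thr in sorted(set(scores)):
        y_pred = [1 if s >= thr else 0 for s in scores]
        m = binary_metrics(y_true, y_pred)
        j = m.sensitivity + m.specificity - 1.0
        if best is None or j > best[0]:
            best = (j, thr, m)
    return best[1], best[2]


if __name__ == "__main__":
    # Hypothetical biomarker scores and malignancy labels, for illustration only.
    scores = [0.9, 0.8, 0.75, 0.6, 0.4, 0.3, 0.2, 0.1]
    labels = [1, 1, 1, 0, 1, 0, 0, 0]
    thr, m = fit_stump(scores, labels)
    print(thr, m)
```

On the toy data, the thresholds 0.4 and 0.75 tie on Youden's J; the sketch keeps the first one (0.4), which gives sensitivity 1.0, specificity 0.75 and PPV 0.8. A real multivariate search would fit deeper trees over several biomarker scores and compare several metrics, as the abstract describes.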