Department of Biostatistics and Informatics, Colorado School of Public Health, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America.
Department of Pulmonary Sciences and Critical Care Medicine, School of Medicine, University of Colorado Anschutz Medical Campus, Aurora, CO, United States of America.
PLoS One. 2020 Apr 14;15(4):e0231468. doi: 10.1371/journal.pone.0231468. eCollection 2020.
We present a case study of implementing a machine learning algorithm within an incremental value framework in the domain of lung cancer research. Machine learning methods have often been shown to be competitive with existing prediction models in some domains; however, implementation of these methods is still in early development. These methods are often compared only head-to-head with existing methods; here we present a framework that instead assesses the value of a machine learning model by the incremental value it adds to them. We developed a machine learning model to identify and classify lung nodules and assessed the incremental value it added to existing risk prediction models. Multiple external datasets were used for validation. We found that our image model, trained on a dataset from The Cancer Imaging Archive (TCIA), improves upon existing models that are restricted to patient characteristics, but the results were inconclusive as to whether it improves on models that consider nodule features. Another notable finding is that performance varied across datasets, suggesting that generalizing machine learning models across populations may be more challenging than is often assumed.
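As a rough illustration of what assessing incremental value can look like in practice, the sketch below compares the discrimination of a baseline risk model with that of the same model augmented by an image-derived score. The abstract does not state which metric or resampling scheme was used, so the choice of AUC difference with a percentile bootstrap interval, the function names `auc` and `incremental_auc`, and the toy data are all assumptions made for illustration only.

```python
import random

def auc(labels, scores):
    """Area under the ROC curve by pairwise comparison (ties count as 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def incremental_auc(labels, baseline_scores, augmented_scores,
                    n_boot=1000, seed=0):
    """Return (delta, (lo, hi)): the AUC gain of the augmented model over
    the baseline, with a percentile bootstrap 95% interval (hypothetical
    helper, not the authors' published procedure)."""
    delta = auc(labels, augmented_scores) - auc(labels, baseline_scores)
    rng = random.Random(seed)
    n = len(labels)
    deltas = []
    while len(deltas) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # resample is missing a class, so AUC is undefined
        b = [baseline_scores[i] for i in idx]
        a = [augmented_scores[i] for i in idx]
        deltas.append(auc(y, a) - auc(y, b))
    deltas.sort()
    lo = deltas[int(0.025 * n_boot)]
    hi = deltas[int(0.975 * n_boot) - 1]
    return delta, (lo, hi)

# Toy data (invented): 1 = malignant nodule, 0 = benign.
labels = [0, 0, 0, 0, 1, 1, 1, 1]
baseline = [0.1, 0.4, 0.35, 0.8, 0.3, 0.6, 0.7, 0.9]   # patient characteristics only
augmented = [0.1, 0.3, 0.2, 0.6, 0.5, 0.7, 0.8, 0.9]   # baseline plus image score

delta, (lo, hi) = incremental_auc(labels, baseline, augmented)
print(f"AUC gain: {delta:.3f}, 95% bootstrap interval: [{lo:.3f}, {hi:.3f}]")
```

On this toy data the baseline AUC is 0.6875 and the augmented AUC is 0.9375, a gain of 0.25. Under this sketch, an interval that spans zero would correspond to an inconclusive result of the kind the abstract reports for models that already consider nodule features. Repeating the comparison separately on each external dataset would expose the dataset-to-dataset variability the abstract describes.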