Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.
Department of Statistics, Harvard University, Cambridge, MA, United States.
J Med Internet Res. 2020 Aug 5;22(8):e16709. doi: 10.2196/16709.
Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced.
The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl.
We obtained the source code of all award-winning solutions of the Kaggle Data Science Bowl Challenge, in which participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated with the log-loss function, and the Spearman correlation coefficient between performance in the public and final test sets was computed.
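As an illustration of the two evaluation metrics named above, the following is a minimal sketch in plain Python, not the code used in the study. The function names (`log_loss`, `average_ranks`, `spearman`) and the example scores are hypothetical and chosen only for demonstration. The sketch assumes binary labels (1 = cancer, 0 = no cancer) and predicted probabilities.

```python
import math

def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; probabilities are clipped to avoid log(0)."""
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1 - eps)
        total += y * math.log(p) + (1 - y) * math.log(1 - p)
    return -total / len(y_true)

def average_ranks(values):
    """Rank values from 1..n; tied values receive the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks

def spearman(x, y):
    """Spearman coefficient, computed as the Pearson correlation of ranks."""
    rx, ry = average_ranks(x), average_ranks(y)
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    return cov / (sx * sy)

# Hypothetical log-loss scores for four teams (not data from the study).
public_scores = [0.40, 0.42, 0.45, 0.47]
final_scores = [0.43, 0.41, 0.46, 0.44]
rho = spearman(public_scores, final_scores)
```

A lower log-loss indicates better-calibrated predictions. A Spearman coefficient well below 1 means that a team's ranking on the public test set is a weak guide to its ranking on the final test set.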
Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual networks were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations between the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions.
We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.