
Reproducible Machine Learning Methods for Lung Cancer Detection Using Computed Tomography Images: Algorithm Development and Validation.

Affiliations

Department of Biomedical Informatics, Harvard Medical School, Boston, MA, United States.

Department of Statistics, Harvard University, Cambridge, MA, United States.

Publication information

J Med Internet Res. 2020 Aug 5;22(8):e16709. doi: 10.2196/16709.

DOI:10.2196/16709
PMID:32755895
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC7439139/
Abstract

BACKGROUND

Chest computed tomography (CT) is crucial for the detection of lung cancer, and many automated CT evaluation methods have been proposed. Due to the divergent software dependencies of the reported approaches, the developed methods are rarely compared or reproduced.

OBJECTIVE

The goal of the research was to generate reproducible machine learning modules for lung cancer detection and compare the approaches and performances of the award-winning algorithms developed in the Kaggle Data Science Bowl.

METHODS

We obtained the source codes of all award-winning solutions of the Kaggle Data Science Bowl Challenge, where participants developed automated CT evaluation methods to detect lung cancer (training set n=1397, public test set n=198, final test set n=506). The performance of the algorithms was evaluated by the log-loss function, and the Spearman correlation coefficient of the performance in the public and final test sets was computed.
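
As a rough illustration of the evaluation metric named above, the following sketch computes binary log loss (mean cross-entropy) in plain Python. The function name `log_loss`, the clipping constant `eps`, and the toy labels and probabilities are assumptions made for this example; they are not taken from the study or from the competition's scoring code.

```python
import math


def log_loss(y_true, y_prob, eps=1e-15):
    """Mean binary cross-entropy; lower values indicate better predictions.

    y_true: iterable of 0/1 labels (1 = cancer, 0 = no cancer).
    y_prob: predicted probabilities of the positive class.
    eps:    clipping constant (an assumption of this sketch) that keeps
            log() away from 0 when a prediction is exactly 0 or 1.
    """
    y_true = list(y_true)
    y_prob = list(y_prob)
    if not y_true or len(y_true) != len(y_prob):
        raise ValueError("inputs must be non-empty and of equal length")
    total = 0.0
    for y, p in zip(y_true, y_prob):
        p = min(max(p, eps), 1.0 - eps)  # clip into (0, 1)
        total += y * math.log(p) + (1 - y) * math.log(1.0 - p)
    return -total / len(y_true)


# Toy data, invented for illustration only.
labels = [1, 0, 1, 0]
confident = [0.9, 0.1, 0.8, 0.2]      # mostly right and fairly sure
uninformative = [0.5, 0.5, 0.5, 0.5]  # always predicts 50%

print(log_loss(labels, confident))
print(log_loss(labels, uninformative))
```

A model that always outputs 0.5 scores ln 2 (about 0.693), so any useful classifier should land below that value; confident wrong predictions are penalized heavily.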

RESULTS

Most solutions implemented distinct image preprocessing, segmentation, and classification modules. Variants of U-Net, VGGNet, and residual net were commonly used in nodule segmentation, and transfer learning was used in most of the classification algorithms. Substantial performance variations in the public and final test sets were observed (Spearman correlation coefficient = .39 among the top 10 teams). To ensure the reproducibility of results, we generated a Docker container for each of the top solutions.
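
The rank comparison reported above can be sketched as follows: a minimal Spearman correlation in plain Python, computed as the Pearson correlation of average ranks. The helper names `average_ranks` and `spearman` and the five score pairs are hypothetical and invented for this example; they are not the teams' actual leaderboard values.

```python
import math


def average_ranks(values):
    """Return 1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the rank vectors."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = math.sqrt(sum((a - mx) ** 2 for a in rx))
    sy = math.sqrt(sum((b - my) ** 2 for b in ry))
    if sx == 0 or sy == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (sx * sy)


# Hypothetical log-loss scores for five teams (illustration only).
public_scores = [0.40, 0.42, 0.45, 0.47, 0.50]
final_scores = [0.44, 0.41, 0.52, 0.46, 0.49]

print(spearman(public_scores, final_scores))
```

A coefficient near 1 would mean the public ranking predicts the final ranking well; a low value, such as the .39 reported for the top 10 teams, indicates that ranks shifted substantially between the two test sets.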

CONCLUSIONS

We compared the award-winning algorithms for lung cancer detection and generated reproducible Docker images for the top solutions. Although convolutional neural networks achieved decent accuracy, there is plenty of room for improvement regarding model generalizability.


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a0a/7439139/6729502217b8/jmir_v22i8e16709_fig1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a0a/7439139/a68ce551dd17/jmir_v22i8e16709_fig2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a0a/7439139/71f12092e94e/jmir_v22i8e16709_fig3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/6a0a/7439139/ca9a8a6751e6/jmir_v22i8e16709_fig4.jpg
