

Machine Learning and Feature Selection Methods for Disease Classification With Application to Lung Cancer Screening Image Data.

Author information

Delzell Darcie A P, Magnuson Sara, Peter Tabitha, Smith Michelle, Smith Brian J

Affiliations

Department of Mathematics and Computer Science, Wheaton College, Wheaton, IL, United States.

Department of Biostatistics, University of Iowa, Iowa City, IA, United States.

Publication information

Front Oncol. 2019 Dec 11;9:1393. doi: 10.3389/fonc.2019.01393. eCollection 2019.

Abstract

As awareness of the habits and risks associated with lung cancer has increased, so has the interest in promoting and improving lung cancer screening procedures. Recent research demonstrates the benefits of lung cancer screening; the National Lung Screening Trial (NLST) found as its primary result that preventative screening significantly decreases the death rate for patients battling lung cancer. However, it was also noted that the false positive rate was very high (>94%). In this work, we investigated the ability of various machine learning classifiers to accurately predict lung cancer nodule status while also considering the associated false positive rate. We utilized 416 quantitative imaging biomarkers taken from CT scans of lung nodules from 200 patients, where the nodules had been verified as cancerous or benign. These imaging biomarkers were created from both nodule and parenchymal tissue. A variety of linear, nonlinear, and ensemble predictive classifying models, along with several feature selection methods, were used to classify the binary outcome of malignant or benign status. Elastic net and support vector machine, combined with either a linear combination or correlation feature selection method, were some of the best-performing classifiers (average cross-validation AUC near 0.72 for these models), while random forest and bagged trees were the worst-performing classifiers (AUC near 0.60). For the best-performing models, the false positive rate was near 30%, notably lower than that reported in the NLST. The use of radiomic biomarkers with machine learning methods is a promising diagnostic tool for tumor classification. These methods have the potential to provide good classification and simultaneously reduce the false positive rate.
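The abstract reports two metrics, AUC and false positive rate. The sketch below shows one way they could be computed for a binary malignant/benign classifier. It is a minimal illustration, not the authors' code. It assumes the conventional definition FPR = FP / (FP + TN), which the abstract does not spell out. The function names, the 0.5 decision threshold, and the toy data are all hypothetical and are not taken from the study.

```python
from typing import Sequence


def false_positive_rate(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """FPR = FP / (FP + TN), coding 1 = malignant and 0 = benign (assumed coding)."""
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    if fp + tn == 0:
        raise ValueError("no benign cases; FPR is undefined")
    return fp / (fp + tn)


def auc(y_true: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the fraction of (malignant, benign) pairs in which the
    malignant case receives the higher score; ties count as one half."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case of each class")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy, made-up data for illustration only (not from the study).
labels = [1, 1, 1, 0, 0, 0, 0]                  # 3 malignant, 4 benign
scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]    # hypothetical classifier outputs
preds = [1 if s >= 0.5 else 0 for s in scores]  # hypothetical 0.5 threshold

print(f"AUC = {auc(labels, scores):.3f}")
print(f"FPR = {false_positive_rate(labels, preds):.3f}")
```

AUC does not depend on a threshold, whereas the false positive rate is tied to the chosen cutoff, so in practice both would be averaged over cross-validation folds, as the abstract's "average cross-validation AUC" suggests.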


