Cherezov Dmitry, Hawkins Samuel, Goldgof Dmitry, Hall Lawrence, Balagurunathan Yoganand, Gillies Robert J, Schabath Matthew B
Department of Computer Sciences and Engineering, University of South Florida, Tampa, Florida.
Departments of Cancer Imaging and Metabolism, H. Lee Moffitt Cancer Center and Research Institute, Tampa, Florida.
Conf Proc IEEE Int Conf Syst Man Cybern. 2016 Oct;2016:001939-1944. doi: 10.1109/SMC.2016.7844523. Epub 2017 Feb 9.
Computed tomography (CT) is widely used in the diagnosis and treatment of Non-Small Cell Lung Cancer (NSCLC). Current computer-aided diagnosis (CAD) models that classify nodules as malignant or benign base their decisions on image features chosen by feature selectors. In this paper, we investigate automated selection of different image features for different nodule size ranges in order to increase overall classification accuracy. The NLST dataset is one of the largest available datasets on CT screening for NSCLC. We used 261 cases as a training dataset and 237 cases as a test dataset. Nodule size, which may indicate biological variability, can vary substantially. For example, nodule diameters in the training set range from a couple of millimeters to a couple of dozen millimeters. The premise is that benign and malignant nodules have different radiomic quantitative descriptors related to size. After splitting the training and test datasets into three subsets based on the longest nodule diameter (LD), accuracy improved from 74.68% to 81.01% and the AUC improved from 0.69 to 0.79. We show that if AUC is the main criterion for choosing parameters, then accuracy improved from 72.57% to 77.5% and AUC improved from 0.78 to 0.82. Additionally, we show the impact of an oversampling technique for the minority cancer class: in some particular cases, the AUC improved from 0.82 to 0.87.
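The size-based split and the minority-class oversampling described above can be sketched roughly as follows. This is only an illustration under stated assumptions, not the authors' implementation: the cut points in `LD_CUTS_MM`, the `ld_mm` and `malignant` field names, and all function names are hypothetical, and plain random oversampling stands in for the oversampling technique, which the abstract does not name. Feature selection and classifier training would then be run separately on each size subset.

```python
import random
from collections import defaultdict

# Hypothetical cut points in millimeters; the abstract does not state
# the actual size ranges used for the three subsets.
LD_CUTS_MM = (8.0, 16.0)


def size_bin(ld_mm, cuts=LD_CUTS_MM):
    """Return 0, 1, or 2 for a small, medium, or large nodule."""
    return sum(ld_mm >= c for c in cuts)


def split_by_size(cases, cuts=LD_CUTS_MM):
    """Group cases (dicts with an 'ld_mm' key) by their size bin."""
    groups = defaultdict(list)
    for case in cases:
        groups[size_bin(case["ld_mm"], cuts)].append(case)
    return dict(groups)


def oversample_minority(cases, label_key="malignant", seed=0):
    """Randomly duplicate cases of the smaller class until classes balance.

    Assumption: simple random oversampling with replacement; the paper
    may use a different technique.
    """
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for case in cases:
        by_label[case[label_key]].append(case)
    if len(by_label) < 2:
        return list(cases)  # nothing to balance against
    target = max(len(members) for members in by_label.values())
    balanced = []
    for members in by_label.values():
        balanced.extend(members)
        # Draw the missing number of cases with replacement (k may be 0).
        balanced.extend(rng.choices(members, k=target - len(members)))
    return balanced
```

In a sketch like this, oversampling would be applied only to each training subset, so that the test subsets keep their original class balance.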