Department of Radiology, Stony Brook University, Stony Brook, NY, 11794 USA.
Northeastern University, Shenyang, Liaoning, 110819 PR China.
Comput Med Imaging Graph. 2019 Oct;77:101645. doi: 10.1016/j.compmedimag.2019.101645. Epub 2019 Aug 11.
Cancer remains one of the most serious threats to human health. Much effort has been devoted to advancing radiology and transformative tools (e.g. non-invasive computed tomography, or CT, imaging) to detect cancer at an early stage. One of the major goals is to differentiate malignant from benign lesions. In recent years, deep learning (DL), e.g. the convolutional neural network (CNN), has shown encouraging classification performance on medical images. However, DL algorithms typically need large datasets with ground truth, and in medical imaging, especially cancer imaging, it is difficult to collect such a large volume of images with pathological information. Strategies are therefore needed to learn effectively from small datasets with CNN models. Toward that goal, this paper explores two CNN models, focusing on expanding the training samples drawn from two small, pathologically proven datasets (a colorectal polyp dataset and a lung nodule dataset) and then differentiating malignant from benign lesions. Experimental outcomes indicate that even with very small datasets of fewer than 70 subjects, malignant lesions can be successfully differentiated from benign ones by the proposed CNN models: the average AUCs (area under the receiver operating characteristic curve) for differentiating colorectal polyps and pulmonary nodules are 0.86 and 0.71, respectively. Our experiments further demonstrate that, for these two small datasets, feeding additional image features, such as the local binary pattern of the lesions, into the CNN models alongside the original raw CT images can significantly improve classification performance. In addition, we find that the voxel-level CNN model we explored performs better on small and unbalanced datasets.
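To make the idea of feeding a local binary pattern (LBP) alongside the raw image more concrete, here is a minimal sketch of one way it could be done. It is not the paper's implementation: the abstract does not specify the LBP variant, the dimensionality, or how the features enter the network, so the basic 8-neighbour 2D LBP, the edge padding, the scaling to [0, 1], and the function names `local_binary_pattern_8` and `stack_raw_and_lbp` are all assumptions made for illustration.

```python
import numpy as np


def local_binary_pattern_8(image: np.ndarray) -> np.ndarray:
    """Basic 8-neighbour LBP code (0..255) for every pixel of a 2D image.

    Assumption: borders are handled by edge padding; the paper's choice is unknown.
    """
    height, width = image.shape
    padded = np.pad(image, 1, mode="edge")
    center = padded[1:-1, 1:-1]
    # Neighbour offsets (dy, dx), clockwise, starting at the top-left pixel.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = np.zeros((height, width), dtype=np.uint8)
    for bit, (dy, dx) in enumerate(offsets):
        neighbour = padded[1 + dy:1 + dy + height, 1 + dx:1 + dx + width]
        # Set this bit wherever the neighbour is at least as bright as the center.
        codes |= (neighbour >= center).astype(np.uint8) * np.uint8(1 << bit)
    return codes


def stack_raw_and_lbp(image: np.ndarray) -> np.ndarray:
    """Return a (2, H, W) array: channel 0 is the raw image, channel 1 its LBP map.

    Assumption: the LBP map is scaled to [0, 1] and used as an extra input channel.
    """
    raw = image.astype(np.float32)
    lbp = local_binary_pattern_8(image).astype(np.float32) / 255.0
    return np.stack([raw, lbp], axis=0)
```

The resulting two-channel array could be passed to any CNN whose first convolution accepts two input channels. A volumetric dataset would need a 3D LBP definition or slice-wise processing, which this sketch does not cover.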