Ragab Dina A, Sharkas Maha, Marshall Stephen, Ren Jinchang
Electronics and Communications Engineering Department, Arab Academy for Science, Technology, and Maritime Transport (AASTMT), Alexandria, Egypt.
Electronic & Electrical Engineering Department, University of Strathclyde, Glasgow, United Kingdom.
PeerJ. 2019 Jan 28;7:e6201. doi: 10.7717/peerj.6201. eCollection 2019.
It is important to detect breast cancer as early as possible. In this manuscript, a new methodology for classifying breast cancer using deep learning and segmentation techniques is introduced. A new computer-aided detection (CAD) system is proposed for classifying benign and malignant mass tumors in breast mammography images. The CAD system uses two segmentation approaches: the first determines the region of interest (ROI) manually, while the second uses threshold-based and region-based techniques. A deep convolutional neural network (DCNN) is used for feature extraction. A well-known DCNN architecture, AlexNet, is fine-tuned to classify two classes instead of 1,000. The last fully connected (fc) layer is connected to a support vector machine (SVM) classifier to obtain better accuracy. Results are obtained on two publicly available datasets: (1) the Digital Database for Screening Mammography (DDSM) and (2) the Curated Breast Imaging Subset of DDSM (CBIS-DDSM). Training on large amounts of data yields high accuracy; however, biomedical datasets contain relatively few samples because of limited patient volume. Data augmentation addresses this by generating new samples from the original input data, thereby increasing the size of the training set. Of the many forms of data augmentation, the one used here is rotation. The accuracy of the newly trained DCNN architecture is 71.01% when the ROI is cropped manually from the mammogram. The highest area under the curve (AUC) achieved was 0.88 (88%) for the samples obtained from both segmentation techniques. Moreover, when samples from the CBIS-DDSM are used, the accuracy of the DCNN increases to 73.6%, and the SVM accuracy reaches 87.2% with an AUC of 0.94 (94%). This is the highest AUC value compared with previous work under the same conditions.
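The abstract does not give the parameters of the threshold- and region-based segmentation. As a rough illustration only, the sketch below assumes a grayscale mammogram stored as a NumPy array, applies a global intensity threshold `thresh`, and then grows the 4-connected region that contains a `seed` pixel (for example, a point inside the suspected mass). The function names `segment_roi` and `crop_to_mask` and both parameters are hypothetical; this is not the authors' implementation.

```python
from collections import deque

import numpy as np


def segment_roi(image: np.ndarray, thresh: float, seed: tuple) -> np.ndarray:
    """Return a boolean mask of the thresholded region connected to `seed`.

    Hypothetical sketch: global threshold followed by 4-connected region growing.
    """
    # Step 1 (threshold-based): mark pixels brighter than the threshold.
    candidate = image > thresh
    mask = np.zeros(image.shape, dtype=bool)
    if not candidate[seed]:
        return mask  # seed lies in the background, so no region is grown

    # Step 2 (region-based): breadth-first flood fill from the seed pixel.
    rows, cols = image.shape
    queue = deque([seed])
    mask[seed] = True
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and candidate[nr, nc] and not mask[nr, nc]:
                mask[nr, nc] = True
                queue.append((nr, nc))
    return mask


def crop_to_mask(image: np.ndarray, mask: np.ndarray) -> np.ndarray:
    """Crop the image to the bounding box of a non-empty mask (the ROI fed to the DCNN)."""
    rs, cs = np.nonzero(mask)
    return image[rs.min(): rs.max() + 1, cs.min(): cs.max() + 1]


if __name__ == "__main__":
    img = np.zeros((8, 8))
    img[2:5, 2:5] = 200  # bright 3x3 "mass"
    img[6, 6] = 200      # isolated bright speck, not connected to the mass
    roi_mask = segment_roi(img, thresh=100, seed=(3, 3))
    print(roi_mask.sum(), crop_to_mask(img, roi_mask).shape)
```

Region growing from a seed keeps the isolated bright speck out of the ROI, which a threshold alone would not do.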
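Rotation is the only augmentation named in the abstract. A minimal NumPy sketch follows. It assumes rotations by multiples of 90 degrees; these angles are an assumption for illustration, not taken from the text. Each cropped ROI and its benign/malignant label are expanded into four samples. The helper name `augment_by_rotation` is hypothetical.

```python
import numpy as np


def augment_by_rotation(rois, labels, quarter_turns=(0, 1, 2, 3)):
    """Return rotated copies of each ROI, with each label repeated to match.

    `quarter_turns` counts 90-degree counter-clockwise rotations; the default
    (0, 90, 180, 270 degrees) is an assumed choice for illustration.
    """
    out_rois, out_labels = [], []
    for roi, label in zip(rois, labels):
        for k in quarter_turns:
            out_rois.append(np.rot90(roi, k))  # exact rotation, no interpolation
            out_labels.append(label)
    return out_rois, out_labels
```

Multiples of 90 degrees avoid interpolation artifacts. Non-square ROIs change shape under odd quarter turns, so they would still need resizing to the network's fixed input size.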
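The abstract reports both accuracy and AUC. Accuracy depends on a decision threshold. AUC instead summarizes how well the classifier's scores rank malignant above benign samples. The sketch below computes AUC with the pairwise (Mann-Whitney) formulation:

$$\mathrm{AUC} = \frac{1}{n_{+} n_{-}} \sum_{i \in +} \sum_{j \in -} \left[ \mathbb{1}(s_i > s_j) + \tfrac{1}{2}\,\mathbb{1}(s_i = s_j) \right]$$

Here $s$ are classifier scores, label 1 is taken as malignant, and label 0 as benign. The function name and the example values are illustrative only, not data from the study.

```python
def auc(labels, scores):
    """Pairwise (Mann-Whitney) AUC: positives have label 1, negatives label 0."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # a tie counts as half a correctly ordered pair
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy scores: three of the four (malignant, benign) pairs are ordered correctly.
    print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

The double loop is O(n_+ · n_-), which is fine for illustration. A sort-based rank computation gives the same value more efficiently on large test sets.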