Wahab Noorul, Khan Asifullah, Lee Yeon Soo
Department of Computer and Information Sciences, Pakistan Institute of Engineering and Applied Sciences Islamabad, Pakistan.
Comput Biol Med. 2017 Jun 1;85:86-97. doi: 10.1016/j.compbiomed.2017.04.012. Epub 2017 Apr 18.
Different types of breast cancer affect the lives of women across the world. Common types include ductal carcinoma in situ (DCIS), invasive ductal carcinoma (IDC), tubular carcinoma, medullary carcinoma, and invasive lobular carcinoma (ILC). When detecting cancer, one important factor is the mitotic count, which indicates how rapidly the cells are dividing. However, mitotic nuclei are few compared with the overwhelming number of non-mitotic nuclei, and the resulting class imbalance degrades the performance of classification models. This work presents a two-phase model that mitigates class bias when classifying mitotic and non-mitotic nuclei in breast cancer histopathology images with a deep convolutional neural network (CNN). First, nuclei are segmented using the blue ratio and global binary thresholding. In Phase-1, a CNN is trained on the segmented 80×80-pixel patches from a standard dataset. Hard non-mitotic examples are identified and augmented; mitotic examples are oversampled by rotation and flipping; and non-mitotic examples are undersampled by k-means clustering of their blue-ratio histograms. Based on this information from Phase-1, the dataset is modified for Phase-2 to reduce the effects of class imbalance. The proposed CNN architecture and data-balancing technique yielded an F-measure of 0.79 and outperformed all methods relying on specific handcrafted features, as well as those using a combination of handcrafted and CNN-generated features.
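The abstract names the segmentation step (blue ratio followed by global binary thresholding) but not its formula or threshold. The following is a minimal sketch that assumes the commonly used blue-ratio definition, BR = (100·B / (1 + R + G)) · (256 / (1 + R + G + B)), and a hypothetical threshold of mean + 1 standard deviation of the blue-ratio image. The helper `nucleus_centroids` is also hypothetical: it locates each connected blob so that a patch can later be cropped around it.

```python
import numpy as np
from typing import Optional


def blue_ratio(rgb: np.ndarray) -> np.ndarray:
    """Blue-ratio transform of an H x W x 3 RGB image with values in 0..255.

    Pixels that are blue relative to red and green, and dark overall
    (as hematoxylin-stained nuclei tend to be), receive high values.
    """
    rgb = rgb.astype(np.float64)
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (100.0 * b / (1.0 + r + g)) * (256.0 / (1.0 + r + g + b))


def segment_nuclei(rgb: np.ndarray, threshold: Optional[float] = None) -> np.ndarray:
    """Binary nucleus mask from one global threshold on the blue-ratio image.

    The default threshold (mean + 1 standard deviation of the blue-ratio
    image) is a hypothetical choice, not the value used in the paper.
    """
    br = blue_ratio(rgb)
    if threshold is None:
        threshold = float(br.mean() + br.std())
    return br > threshold


def nucleus_centroids(mask: np.ndarray) -> list[tuple[int, int]]:
    """Centroid (row, col) of each 4-connected foreground component."""
    h, w = mask.shape
    seen = np.zeros((h, w), dtype=bool)
    centroids = []
    for y0, x0 in zip(*np.nonzero(mask)):
        if seen[y0, x0]:
            continue
        # Iterative flood fill collects every pixel of this component.
        seen[y0, x0] = True
        stack, pixels = [(y0, x0)], []
        while stack:
            y, x = stack.pop()
            pixels.append((y, x))
            for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                ny, nx = y + dy, x + dx
                if 0 <= ny < h and 0 <= nx < w and mask[ny, nx] and not seen[ny, nx]:
                    seen[ny, nx] = True
                    stack.append((ny, nx))
        ys, xs = zip(*pixels)
        centroids.append((int(round(float(np.mean(ys)))), int(round(float(np.mean(xs))))))
    return centroids
```

Because the threshold is global, one value is applied to the whole image. A stain-normalisation step or an adaptive threshold would be a different design, and the abstract does not describe either.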
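The abstract states that the CNN sees 80×80-pixel patches and that mitotic examples are oversampled by rotation and flipping, without listing the exact transforms. A minimal sketch, assuming patches centred on each nucleus, reflect padding at image borders, and the eight rotations/flips of a square (0°, 90°, 180°, 270°, each with and without a horizontal flip). `extract_patch`, `dihedral_augment`, and `oversample_mitotic` are hypothetical names.

```python
import numpy as np

PATCH = 80  # patch side length given in the abstract


def extract_patch(rgb: np.ndarray, cy: int, cx: int, size: int = PATCH) -> np.ndarray:
    """Crop a size x size patch centred on (cy, cx).

    Reflect padding near the image border is an assumption.
    """
    half = size // 2
    padded = np.pad(rgb, ((half, half), (half, half), (0, 0)), mode="reflect")
    # Pixel (cy, cx) of the original sits at (cy + half, cx + half) in `padded`.
    return padded[cy:cy + size, cx:cx + size]


def dihedral_augment(patch: np.ndarray) -> list[np.ndarray]:
    """The 8 rotations/flips of a square patch; the channel axis is untouched."""
    variants = []
    for k in range(4):
        rotated = np.rot90(patch, k, axes=(0, 1))
        variants.append(rotated)
        variants.append(np.fliplr(rotated))
    return variants


def oversample_mitotic(patches: list[np.ndarray]) -> list[np.ndarray]:
    """Replace each mitotic patch with its 8 variants (8x oversampling)."""
    return [v for p in patches for v in dihedral_augment(p)]
```

Rotations and flips suit mitotic figures because a nucleus has no preferred orientation on the slide, so each variant is still a plausible mitotic example.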
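For the undersampling step, the abstract only says that non-mitotic examples are undersampled by blue-ratio-histogram-based k-means clustering. One plausible reading is to cluster non-mitotic patches by their blue-ratio histograms and then keep a subset spread across the clusters, so that the retained negatives stay diverse. The sketch below follows that reading. The bin count, clipping value, number of clusters, round-robin selection, and all function names are assumptions, and a plain Lloyd's k-means in NumPy stands in for whatever implementation the authors used.

```python
import numpy as np


def br_histogram(br_patch: np.ndarray, bins: int = 32, max_br: float = 256.0) -> np.ndarray:
    """Normalised histogram of a patch's blue-ratio values.

    `bins` and `max_br` (larger values are clipped) are hypothetical settings.
    """
    clipped = np.clip(br_patch, 0.0, max_br)
    hist, _ = np.histogram(clipped, bins=bins, range=(0.0, max_br))
    return hist / max(hist.sum(), 1)


def _nearest(x: np.ndarray, centers: np.ndarray) -> np.ndarray:
    """Index of the closest center (squared Euclidean distance) for each row."""
    return ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2).argmin(axis=1)


def kmeans_labels(x: np.ndarray, k: int, iters: int = 100, seed: int = 0) -> np.ndarray:
    """Plain Lloyd's k-means; returns a cluster label for each row of x."""
    rng = np.random.default_rng(seed)
    centers = x[rng.choice(len(x), size=k, replace=False)].astype(np.float64)
    for _ in range(iters):
        labels = _nearest(x, centers)
        new_centers = np.array([
            x[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return _nearest(x, centers)


def undersample_nonmitotic(features: np.ndarray, n_keep: int, k: int = 4, seed: int = 0) -> np.ndarray:
    """Indices of up to n_keep examples, taken round-robin across k-means clusters
    so that rare kinds of non-mitotic nuclei are not crowded out by common ones."""
    labels = kmeans_labels(features, k, seed=seed)
    rng = np.random.default_rng(seed)
    pools = [list(rng.permutation(np.flatnonzero(labels == j))) for j in range(k)]
    keep: list[int] = []
    while len(keep) < n_keep and any(pools):
        for pool in pools:
            if pool and len(keep) < n_keep:
                keep.append(int(pool.pop()))
    return np.array(sorted(keep), dtype=int)
```

Drawing evenly from each cluster, instead of sampling uniformly at random, is what keeps a large, homogeneous group of background-like nuclei from dominating the retained negatives.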
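The abstract says hard non-mitotic examples are identified after Phase-1 but does not give the criterion. A common interpretation, assumed here, is that they are the non-mitotic patches the Phase-1 CNN scores as mitotic (its false positives), ranked from most to least confident. `hard_negatives` and its `threshold` parameter are hypothetical.

```python
import numpy as np


def hard_negatives(probs: np.ndarray, labels: np.ndarray, threshold: float = 0.5) -> np.ndarray:
    """Indices of non-mitotic examples (label 0) whose Phase-1 mitosis
    probability is at least `threshold`, hardest (highest score) first."""
    idx = np.flatnonzero((labels == 0) & (probs >= threshold))
    return idx[np.argsort(-probs[idx], kind="stable")]
```

These indices would then mark the patches to augment before Phase-2 training, so that the second model sees more examples of the confusions the first one made.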
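The reported F-measure of 0.79 is, assuming the usual β = 1 definition, the harmonic mean of precision and recall over detected mitoses. This is equivalent to 2·TP / (2·TP + FP + FN). How a detection is matched to a ground-truth mitosis is not stated in the abstract, so the sketch below starts from counts that are already computed.

```python
def f_measure(tp: int, fp: int, fn: int) -> float:
    """F1 score from true-positive, false-positive and false-negative counts."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)
```

For example, 7 correct detections, 3 false alarms and 1 missed mitosis give precision 0.7, recall 0.875 and F = 14/18 ≈ 0.78. Because true negatives never enter the formula, the large pool of non-mitotic nuclei cannot inflate the score, which is why the F-measure suits this imbalanced task.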