Institute of Computer Science, Tartu University, Tartu, Estonia.
Sci Rep. 2022 Oct 6;12(1):16734. doi: 10.1038/s41598-022-20268-7.
Developing effective invasive ductal carcinoma (IDC) detection methods remains a challenging problem in breast cancer diagnosis. Deep neural networks have recently achieved notable success in many application domains; however, they are well known to require large amounts of labelled training data to achieve high accuracy. Producing such amounts of manually labelled data is time-consuming and expensive, especially when domain expertise is required. To this end, we present a novel semi-supervised learning framework for IDC detection that uses a small number of labelled training examples and takes advantage of cheaply available unlabelled data. To build trust in the framework's predictions, we explain them globally. The proposed framework consists of five main stages: data augmentation, feature selection, dividing co-training data labelling, deep neural network modelling, and interpretation of the neural network's predictions. The data cohort used in this study contains digitized breast cancer (BCa) histopathology slides from 162 women with IDC at the Hospital of the University of Pennsylvania and the Cancer Institute of New Jersey. To evaluate the effectiveness of the deep neural network model used by the proposed approach, we compare it to two state-of-the-art network architectures, AlexNet and a shallow VGG network, trained only on the labelled data. The results show that the deep neural network used in our proposed approach outperforms these state-of-the-art techniques, achieving a balanced accuracy of 0.73 and an F-measure of 0.843. In addition, we compare the performance of the proposed semi-supervised approach to a state-of-the-art semi-supervised DCGAN technique and a self-learning technique. The experimental evaluation shows that our framework outperforms both semi-supervised techniques and detects IDC with an accuracy of 85.75%, a balanced accuracy of 0.865, and an F-measure of 0.773 using only 10% labelled instances from the training dataset, while the rest of the training dataset is treated as unlabelled.
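The three metrics reported above measure different things, which is why accuracy and balanced accuracy can diverge when the classes are imbalanced. As an illustrative aid, the following minimal sketch computes them from confusion-matrix counts using the standard textbook definitions. The `ConfusionCounts` type, the function names, and the example counts are hypothetical and are not taken from the study's data or code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ConfusionCounts:
    """Binary confusion-matrix counts (hypothetical helper type)."""
    tp: int  # IDC patches predicted as IDC
    fp: int  # non-IDC patches predicted as IDC
    tn: int  # non-IDC patches predicted as non-IDC
    fn: int  # IDC patches predicted as non-IDC


def accuracy(c: ConfusionCounts) -> float:
    """Fraction of all instances classified correctly."""
    return (c.tp + c.tn) / (c.tp + c.fp + c.tn + c.fn)


def balanced_accuracy(c: ConfusionCounts) -> float:
    """Mean of sensitivity and specificity (assumes both classes are present)."""
    sensitivity = c.tp / (c.tp + c.fn)
    specificity = c.tn / (c.tn + c.fp)
    return (sensitivity + specificity) / 2


def f_measure(c: ConfusionCounts) -> float:
    """Harmonic mean of precision and recall (assumes at least one true positive)."""
    precision = c.tp / (c.tp + c.fp)
    recall = c.tp / (c.tp + c.fn)
    return 2 * precision * recall / (precision + recall)


# Made-up counts for illustration only; not results from the paper.
example = ConfusionCounts(tp=40, fp=10, tn=140, fn=10)
print(accuracy(example), balanced_accuracy(example), f_measure(example))
```

With these made-up counts the negative class dominates, so plain accuracy is higher than balanced accuracy, which weights the two classes equally.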