Phan Nam Nhut, Hsu Chih-Yi, Huang Chi-Cheng, Tseng Ling-Ming, Chuang Eric Y
Bioinformatics Program, Taiwan International Graduate Program, Institute of Information Science, Academia Sinica, Taipei, Taiwan.
Graduate Institute of Biomedical Electronics and Bioinformatics, National Taiwan University, Taipei, Taiwan.
Front Oncol. 2021 Oct 21;11:734015. doi: 10.3389/fonc.2021.734015. eCollection 2021.
The present study aimed to assign a risk score for breast cancer recurrence based on pathological whole slide images (WSIs) using a deep learning model.
A total of 233 WSIs from 138 breast cancer patients were each assigned a low-risk or high-risk label based on a 70-gene signature. The images were tiled into 512×512-pixel patches with the PyHIST tool and color-normalized using the Macenko method. Out-of-focus and pixelated patches were then removed using the Laplacian algorithm. The remaining patches (n=294,562) were split into training (50%), validation (7%), and testing (43%) sets. Six pretrained models were used for transfer learning, and their performance was evaluated using accuracy, precision, recall, F1 score, the confusion matrix, and AUC. The held-out testing set was then used to evaluate the robustness and generalization capacity of the final model. Finally, the Grad-CAM algorithm was used to visualize the model's predictions.
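The patch-quality filter can be illustrated with the variance-of-Laplacian sharpness score, a common way to flag blurry images: sharp tissue has strong edges and therefore a high-variance Laplacian response, while blurred patches do not. This is a minimal NumPy sketch under stated assumptions. The abstract does not give the exact kernel, grayscale conversion, or cut-off, so the 4-neighbour kernel, the BT.601 luma weights, and the `threshold=100.0` default below are hypothetical choices. Tiling (PyHIST) and Macenko normalization are not shown.

```python
import numpy as np


def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an (H, W, 3+) RGB patch to float grayscale using BT.601 luma weights (assumed choice)."""
    return rgb[..., :3].astype(np.float64) @ np.array([0.299, 0.587, 0.114])


def laplacian_variance(gray: np.ndarray) -> float:
    """Variance of the 4-neighbour discrete Laplacian over the patch interior.

    Equivalent to filtering with [[0, 1, 0], [1, -4, 1], [0, 1, 0]] in 'valid' mode.
    """
    center = gray[1:-1, 1:-1]
    lap = (
        gray[:-2, 1:-1]    # pixel above
        + gray[2:, 1:-1]   # pixel below
        + gray[1:-1, :-2]  # pixel to the left
        + gray[1:-1, 2:]   # pixel to the right
        - 4.0 * center
    )
    return float(lap.var())


def is_sharp(patch_rgb: np.ndarray, threshold: float = 100.0) -> bool:
    """Keep a patch only if its Laplacian variance reaches the (hypothetical) threshold."""
    return laplacian_variance(to_grayscale(patch_rgb)) >= threshold


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    textured = rng.integers(0, 256, size=(512, 512, 3), dtype=np.uint8)
    blank = np.full((512, 512, 3), 128, dtype=np.uint8)
    print(is_sharp(textured), is_sharp(blank))  # True False
```

In practice the threshold would be tuned on the slide collection at hand, since stain intensity and scanner settings shift the typical Laplacian variance.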
Six models, namely VGG16, ResNet50, ResNet101, Inception_ResNet, EfficientB5, and Xception, achieved high performance on the validation set, with overall accuracies of 0.84, 0.85, 0.83, 0.84, 0.87, and 0.91, respectively. Xception was selected for assessment on the testing set, where it achieved an overall accuracy of 0.87 with a patch-wise approach and, with a patient-wise approach, accuracies of 0.90 and 1.00 for the high-risk and low-risk groups, respectively.
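The two evaluation levels can be contrasted in a short sketch. Patch-wise accuracy scores every patch independently. For the patient-wise figures, the abstract does not say how patch predictions were combined, so the majority vote below (ties go to the label encountered first) is an assumed aggregation rule, and the function names and toy data are hypothetical. Per-group accuracy here is the fraction of patients in each true risk group whose aggregated prediction matches their label, which is one plausible reading of the per-group figures.

```python
from collections import Counter, defaultdict


def patch_accuracy(y_true, y_pred):
    """Fraction of patches whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def patient_predictions(patient_ids, patch_preds):
    """Aggregate patch predictions per patient by majority vote (assumed rule)."""
    votes = defaultdict(Counter)
    for pid, pred in zip(patient_ids, patch_preds):
        votes[pid][pred] += 1
    # most_common orders equal counts by first occurrence, so ties go to the first label seen.
    return {pid: counts.most_common(1)[0][0] for pid, counts in votes.items()}


def per_group_accuracy(patient_labels, patient_preds):
    """Accuracy within each true risk group, computed over patients."""
    correct, total = Counter(), Counter()
    for pid, label in patient_labels.items():
        total[label] += 1
        correct[label] += int(patient_preds[pid] == label)
    return {label: correct[label] / total[label] for label in total}


if __name__ == "__main__":
    # Hypothetical toy data: three patients with several patches each.
    ids = ["p1", "p1", "p1", "p2", "p2", "p3", "p3", "p3"]
    true_patch = ["high", "high", "high", "low", "low", "high", "high", "high"]
    pred_patch = ["high", "high", "low", "low", "low", "low", "low", "high"]
    labels = {"p1": "high", "p2": "low", "p3": "high"}

    preds = patient_predictions(ids, pred_patch)
    print(patch_accuracy(true_patch, pred_patch))  # 0.625
    print(per_group_accuracy(labels, preds))       # {'high': 0.5, 'low': 1.0}
```

Aggregating to the patient level can raise or lower accuracy relative to the patch level, because a patient is scored once regardless of how many patches the slide contributes.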
Our study demonstrated the feasibility and high performance of artificial intelligence models trained without region-of-interest labeling for predicting cancer recurrence based on a 70-gene signature risk score.