Wang Xiaoxiao, Zou Chong, Zhang Yi, Li Xiuqing, Wang Chenxi, Ke Fei, Chen Jie, Wang Wei, Wang Dian, Xu Xinyu, Xie Ling, Zhang Yifen
Department of GCP Research Center, Jiangsu Province Hospital of Chinese Medicine, The Affiliated Hospital of Nanjing University of Chinese Medicine, Nanjing, China.
Department of Pathology, Jiangsu Cancer Hospital, Nanjing, China.
Front Genet. 2021 Jul 20;12:661109. doi: 10.3389/fgene.2021.661109. eCollection 2021.
Breast cancer is one of the most common cancers and the leading cause of cancer death among women worldwide. Genetic predisposition to breast cancer may be associated with mutations in particular genes such as BRCA1/2. Patients who carry a germline pathogenic BRCA1/2 (gBRCA) mutation have a significantly increased risk of developing breast cancer and might benefit from targeted therapy. However, genetic testing is time-consuming and costly. This study aims to predict the risk of gBRCA mutation from whole-slide pathology features of H&E-stained breast cancer sections, using the patients' gBRCA mutation status as the label.
In this study, we trained a ResNet deep convolutional neural network (CNN) on whole-slide images (WSIs) to predict gBRCA mutation status in breast cancer. Because WSIs are too large for slide-based training, we divided each WSI into smaller tiles at the original resolution. The tile-based classifications were then combined, by adding up the positive tile classifications, to generate the slide-based result. Models were trained on the tumor locations annotated by a designated breast cancer pathologist, labeled with each patient's gBRCA mutation status. Four models were trained on tiles cropped at 5×, 10×, 20×, and 40× magnification, on the assumption that low and high magnifications may provide different levels of information for classification.
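The tiling and tile-to-slide aggregation described above can be illustrated with a minimal sketch. This is not the authors' code: the function names `tile_origins` and `slide_scores`, the non-overlapping grid, the 0.5 tile threshold, and the use of the fraction of positive tiles as the slide score are all assumptions made for illustration. The abstract states only that positive tile classifications were added up per slide.

```python
from collections import defaultdict


def tile_origins(width, height, tile_size):
    """Return top-left (x, y) corners of non-overlapping tiles that fit
    entirely inside a width x height image (hypothetical tiling scheme)."""
    return [
        (x, y)
        for y in range(0, height - tile_size + 1, tile_size)
        for x in range(0, width - tile_size + 1, tile_size)
    ]


def slide_scores(tile_predictions, tile_threshold=0.5):
    """Aggregate tile-level probabilities into one score per slide.

    tile_predictions: iterable of (slide_id, probability) pairs.
    The slide score is the fraction of tiles whose probability reaches
    tile_threshold (an assumed aggregation rule, not taken from the paper).
    """
    positives = defaultdict(int)
    totals = defaultdict(int)
    for slide_id, prob in tile_predictions:
        totals[slide_id] += 1
        if prob >= tile_threshold:
            positives[slide_id] += 1
    return {slide_id: positives[slide_id] / totals[slide_id] for slide_id in totals}
```

For example, `slide_scores([("s1", 0.9), ("s1", 0.2), ("s2", 0.1)])` gives a score of 0.5 for `s1` and 0.0 for `s2`. Normalizing by the tile count keeps slides of different sizes comparable, which a raw count of positive tiles would not.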
The trained models were validated on an external dataset containing 17 mutation carriers and 47 wild-type cases. In the external validation dataset, the tile-level AUCs (95% CI) of the DL models using 40×, 20×, 10×, and 5× magnification tiles among all cases were 0.766 (0.763-0.769), 0.763 (0.758-0.769), 0.750 (0.738-0.761), and 0.551 (0.526-0.575), respectively, while the slide-level AUCs at the corresponding magnifications among all cases were 0.774 (0.642-0.905), 0.804 (0.676-0.931), 0.828 (0.691-0.966), and 0.635 (0.471-0.798), respectively. The study also identified the influence of histological grade on the accuracy of the prediction.
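As an illustration of how an AUC with a 95% confidence interval can be obtained from labels and scores, the sketch below computes the AUC as the Mann-Whitney pairwise-ranking statistic and a percentile bootstrap interval. The abstract does not state how the reported intervals were computed, so the bootstrap, the function names `auc` and `bootstrap_ci`, and the defaults (`n_boot=2000`, `seed=0`) are assumptions. Intervals tend to be much wider with only a few dozen slides than with a large number of tiles.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative, counting ties as one half (Mann-Whitney statistic)."""
    pos = [s for l, s in zip(labels, scores) if l == 1]
    neg = [s for l, s in zip(labels, scores) if l == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_ci(labels, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the AUC (assumed method, for illustration).
    Resamples that contain only one class are redrawn."""
    if not 0 < sum(labels) < len(labels):
        raise ValueError("need at least one positive and one negative case")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        resampled_labels = [labels[i] for i in idx]
        if 0 < sum(resampled_labels) < n:
            stats.append(auc(resampled_labels, [scores[i] for i in idx]))
    stats.sort()
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper
```

Here `labels` would hold 1 for mutation carriers and 0 for wild-type cases, and `scores` the per-slide (or per-tile) model outputs.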
In this paper, pathology and molecular omics were combined to establish a gBRCA mutation risk prediction model, revealing a correlation between whole-slide histopathological images and gBRCA mutation risk. The results indicated that prediction accuracy is likely to improve as the training data expand. The findings demonstrated that deep CNNs could be used to assist pathologists in detecting gene mutations in breast cancer.