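An interpretable deep learning model for detecting pathogenic variants in breast cancer from hematoxylin and eosin-stained pathological images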

BACKGROUND: Determining the mutation status of breast cancer susceptibility genes is crucial for guiding breast cancer treatment. However, the need for genetic testing among breast cancer patients remains unmet because of high costs and limited resources. This study aimed to develop a Bi-directional Self-Attention Multiple Instance Learning (BiAMIL) algorithm to detect this gene status from hematoxylin and eosin (H&E)-stained pathological images.

METHODS: A total of 319 histopathological slides from 254 breast cancer patients were included, comprising two independent cohorts. After image pre-processing, 633,484 tumor tiles from the training dataset were used to train the self-developed deep learning model. Network performance was evaluated on internal and external test sets.

RESULTS: BiAMIL achieved an AUC of 0.819 (95% CI [0.673-0.965]) on the internal test set and 0.817 (95% CI [0.712-0.923]) on the external test set. To explore the relationship between gene status and interpretable morphological features in pathological images, we used the Class Activation Mapping (CAM) technique and cluster analysis to investigate associations between gene mutation status and tissue and cell features. Notably, tumor-infiltrating lymphocytes and the morphological characteristics of tumor cells appeared to be potential features associated with gene status.

CONCLUSIONS: We developed an interpretable, attention-based deep neural network model to predict the mutation status of breast cancer susceptibility genes in breast cancer.

Keywords: Breast cancer, deep learning, self-attention, interpretability.
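Multiple instance learning treats each slide as a bag of tiles that carries a single label, here the gene status. The abstract does not describe BiAMIL's internal architecture, so the sketch below substitutes a generic pipeline and is not the authors' bi-directional design: single-head self-attention across the tiles of one slide, followed by attention-based MIL pooling (Ilse et al., 2018) and a logistic output. The tile feature vectors, their dimension, and the randomly initialized weights are all hypothetical; in practice the features would come from a CNN encoder and the weights from training.

```python
import numpy as np

def softmax(x, axis=-1):
    # Subtract the max for numerical stability before exponentiating.
    z = x - x.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(h, wq, wk, wv):
    """Single-head scaled dot-product self-attention across the tiles of one slide.

    h: (n_tiles, d) tile embeddings -> (n_tiles, d) embeddings mixed across tiles.
    """
    q, k, v = h @ wq, h @ wk, h @ wv
    scores = q @ k.T / np.sqrt(k.shape[1])  # (n_tiles, n_tiles) tile-to-tile affinities
    return softmax(scores, axis=1) @ v

def attention_mil_pool(h, v_att, w_att):
    """Attention pooling: one non-negative weight per tile, weights sum to 1."""
    a = np.tanh(h @ v_att) @ w_att          # (n_tiles,) unnormalized tile scores
    weights = softmax(a, axis=0)
    return weights @ h, weights             # slide embedding (d,), tile weights (n_tiles,)

def predict_slide(tiles, params):
    """Slide-level probability of the positive class (e.g. 'pathogenic variant present')."""
    h = self_attention(tiles, params["wq"], params["wk"], params["wv"])
    z, weights = attention_mil_pool(h, params["v_att"], params["w_att"])
    logit = z @ params["w_cls"] + params["b_cls"]
    return 1.0 / (1.0 + np.exp(-logit)), weights

# Hypothetical: random weights stand in for trained parameters.
rng = np.random.default_rng(0)
d, d_att = 16, 8
params = {
    "wq": rng.normal(size=(d, d)) / np.sqrt(d),
    "wk": rng.normal(size=(d, d)) / np.sqrt(d),
    "wv": rng.normal(size=(d, d)) / np.sqrt(d),
    "v_att": rng.normal(size=(d, d_att)) / np.sqrt(d),
    "w_att": rng.normal(size=d_att),
    "w_cls": rng.normal(size=d),
    "b_cls": 0.0,
}
# Hypothetical: 500 precomputed tile feature vectors from a single slide.
tiles = rng.normal(size=(500, d))
prob, tile_weights = predict_slide(tiles, params)
print(f"slide probability: {prob:.3f}; highest-weighted tile: {tile_weights.argmax()}")
```

Because neither step uses tile positions, shuffling the tiles leaves the slide probability unchanged, which is the permutation invariance a bag-level MIL model needs. The pooling weights give one score per tile and could be drawn back onto the slide as a heatmap; the abstract reports using CAM for interpretability, which is a different technique.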

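The RESULTS paragraph reports each AUC with a 95% confidence interval. The abstract does not state how these intervals were computed. Both reported intervals are symmetric around their point estimates, which is consistent with a normal-approximation interval, so this sketch assumes the Hanley and McNeil (1982) standard error; DeLong's method or bootstrap resampling are common alternatives and can give different widths. The AUC itself is computed as the Mann-Whitney probability that a randomly chosen positive case scores higher than a randomly chosen negative one. The toy scores and labels are hypothetical.

```python
import math

def auc_mann_whitney(scores, labels):
    """AUC = P(score of a random positive > score of a random negative); ties count 1/2."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative case")
    wins = 0.0
    for p in pos:
        for n in neg:
            wins += 1.0 if p > n else 0.5 if p == n else 0.0
    return wins / (len(pos) * len(neg)), len(pos), len(neg)

def hanley_mcneil_ci(auc, n_pos, n_neg, z=1.959964):
    """Normal-approximation CI for an AUC using the Hanley & McNeil (1982) standard error."""
    q1 = auc / (2.0 - auc)               # P(two random positives both outrank one negative)
    q2 = 2.0 * auc * auc / (1.0 + auc)   # P(one positive outranks two random negatives)
    var = (auc * (1.0 - auc)
           + (n_pos - 1) * (q1 - auc * auc)
           + (n_neg - 1) * (q2 - auc * auc)) / (n_pos * n_neg)
    se = math.sqrt(var)
    # Clip to [0, 1]; the interval is symmetric only when no clipping occurs.
    return max(0.0, auc - z * se), min(1.0, auc + z * se)

# Hypothetical slide-level scores and gene-status labels (1 = variant present).
scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]
labels = [1, 1, 0, 1, 0, 0, 1, 0]
auc, n_pos, n_neg = auc_mann_whitney(scores, labels)
lo, hi = hanley_mcneil_ci(auc, n_pos, n_neg)
print(f"AUC {auc:.3f} (95% CI [{lo:.3f}-{hi:.3f}])")
```

The standard error shrinks as the numbers of positive and negative slides grow, so with only a few cases per class the interval is wide and may hit the upper bound of 1, as it does in this toy example.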