ICSDA: a multi-modal deep learning model to predict breast cancer recurrence and metastasis risk by integrating pathological, clinical and gene expression data.
Affiliations
School of Mathematics and Statistics, Hainan Normal University, Haikou 570100, China.
Key Laboratory of Data Science and Intelligence Education, Ministry of Education, Hainan Normal University, Haikou, China.
Publication information
Brief Bioinform. 2022 Nov 19;23(6). doi: 10.1093/bib/bbac448.
Breast cancer patients often experience recurrence and metastasis after surgery, so predicting each patient's risk of recurrence and metastasis is essential for developing precision treatment. In this study, we proposed a novel multi-modal deep learning prediction model that integrates hematoxylin and eosin (H&E)-stained histopathological images, clinical information and gene expression data. Specifically, we segmented the tumor regions of each H&E-stained image into 256 × 256-pixel image blocks and used a deep neural network to encode each block as a 1D feature vector. An attention module then scored each region of the H&E-stained image, and the resulting image features were combined with the clinical and gene expression data to predict each patient's risk of recurrence and metastasis. To evaluate the model, we downloaded all 196 breast cancer samples from The Cancer Genome Atlas (TCGA) for which clinical, gene expression and H&E data were all available. The samples were divided into training and testing sets at a 7:3 ratio, using stratified sampling to keep the sample distribution consistent between the two sets. On the testing set, the multi-modal model achieved an area under the curve (AUC) of 0.75, outperforming the models based solely on H&E images, sequencing data or clinical data. This study may have clinical significance for identifying high-risk breast cancer patients who could benefit from postoperative adjuvant treatment.
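The attention-and-fusion step described above can be illustrated with a small pure-Python sketch. It is not the authors' implementation and rests on several assumptions: the patch embeddings are taken as already produced by the deep neural network encoder (not shown), the attention module is the tanh scoring form commonly used in attention-based multiple-instance learning, the modalities are fused by simple concatenation, and a logistic layer maps the fused vector to a risk probability. All dimensions, the parameter names (`V`, `w`, `coef`, `bias`) and the helper functions are hypothetical, and the weights are random rather than trained.

```python
import math
import random

# Hypothetical dimensions; the abstract does not state the feature sizes.
PATCH_DIM = 8    # length of each patch's 1D feature vector
HIDDEN_DIM = 4   # hidden size of the (assumed) attention scorer
CLIN_DIM = 3     # number of clinical features
GENE_DIM = 5     # number of gene-expression features


def matvec(m, x):
    """Multiply matrix m (a list of rows) by vector x."""
    return [sum(mi * xi for mi, xi in zip(row, x)) for row in m]


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_pool(patches, V, w):
    """Score each patch as w . tanh(V p), normalise the scores with softmax,
    and return the weights plus the attention-weighted average embedding."""
    scores = [sum(wi * math.tanh(h) for wi, h in zip(w, matvec(V, p))) for p in patches]
    weights = softmax(scores)
    pooled = [sum(a * p[j] for a, p in zip(weights, patches))
              for j in range(len(patches[0]))]
    return weights, pooled


def predict_risk(patches, clinical, genes, params):
    """Assumed fusion: concatenate pooled image, clinical and gene features,
    then apply a logistic output layer to get a risk probability."""
    weights, image_vec = attention_pool(patches, params["V"], params["w"])
    fused = image_vec + clinical + genes  # list concatenation
    logit = sum(c * f for c, f in zip(params["coef"], fused)) + params["bias"]
    return weights, 1.0 / (1.0 + math.exp(-logit))


def random_params(rng):
    """Small random (untrained) weights for illustration only."""
    return {
        "V": [[rng.gauss(0, 0.1) for _ in range(PATCH_DIM)] for _ in range(HIDDEN_DIM)],
        "w": [rng.gauss(0, 0.1) for _ in range(HIDDEN_DIM)],
        "coef": [rng.gauss(0, 0.1) for _ in range(PATCH_DIM + CLIN_DIM + GENE_DIM)],
        "bias": 0.0,
    }


rng = random.Random(0)
params = random_params(rng)
# One synthetic patient: 6 encoded patches plus clinical and gene vectors.
patches = [[rng.random() for _ in range(PATCH_DIM)] for _ in range(6)]
clinical = [rng.random() for _ in range(CLIN_DIM)]
genes = [rng.random() for _ in range(GENE_DIM)]
weights, risk = predict_risk(patches, clinical, genes, params)
print([round(a, 3) for a in weights], round(risk, 3))
```

The attention weights sum to 1, so they can be read as the relative contribution of each tumor region to the pooled image feature, which is what makes this kind of module useful for highlighting informative areas of a slide.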
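The evaluation protocol (a 7:3 stratified split, then AUC on the held-out set) can be sketched in pure Python as below. This is a hypothetical illustration: it assumes stratification by the binary outcome label, which the abstract does not specify, and the class counts in the usage lines are made up; only the total of 196 samples comes from the study. The AUC is computed as the fraction of positive-negative pairs that the scores rank correctly (ties count one half), which equals the area under the ROC curve.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test sets so that each class keeps
    roughly the same proportion in both sets."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    train, test = [], []
    for idxs in by_class.values():
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


def roc_auc(labels, scores):
    """AUC as the probability that a random positive outscores a random
    negative (ties count 0.5). Assumes both classes are present."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical labels: 1 = recurrence/metastasis, 0 = none (not the paper's counts).
labels = [1] * 50 + [0] * 146
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))  # 137 59
print(roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))  # 0.75
```

Splitting each class separately keeps the outcome ratio of the test set close to that of the full cohort, which matters when only 196 samples are available and a purely random split could leave the test set with very few positive cases.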