Department of Biochemistry, Schulich School of Medicine & Dentistry, Western University, London, ON, Canada.
Department of Computer Science, Western University, London, ON, Canada.
J Transl Med. 2024 Mar 2;22(1):226. doi: 10.1186/s12967-024-05018-9.
Breast cancer (BC) is a highly heterogeneous and complex disease. Personalized treatment options require the integration of multi-omic data and consideration of phenotypic variability. Radiogenomics aims to merge medical images with genomic measurements but is hindered by unpaired data, where imaging, genomic, and clinical outcome data are not all available for the same patients. In this study, we propose using a well-trained conditional generative adversarial network (cGAN) to address the unpaired data issue in radiogenomic analysis of BC. The generated images are then used to predict the mutation status of key driver genes and BC subtypes.
We integrated the paired MRI and multi-omic (mRNA gene expression, DNA methylation, and copy number variation) profiles of 61 BC patients from The Cancer Imaging Archive (TCIA) and The Cancer Genome Atlas (TCGA). To facilitate this integration, we employed a Bayesian Tensor Factorization approach to factorize the multi-omic data into 17 latent features. Subsequently, a cGAN model was trained on the matched side-view patient MRIs and their corresponding latent features to generate MRIs for BC patients who lack them. Model performance was evaluated by calculating the distance between real and generated images using the Fréchet Inception Distance (FID) metric. BC subtype and mutation status of driver genes were obtained from the cBioPortal platform, where 3 genes were selected based on the number of mutated patients. A convolutional neural network (CNN) was constructed and trained using the generated MRIs for mutation status prediction. The area under the receiver operating characteristic curve (ROC-AUC) and the area under the precision-recall curve (PR-AUC) were used to evaluate the performance of the CNN models for mutation status prediction. Precision, recall, and F1 score were used to evaluate the performance of the CNN model in subtype classification.
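As an illustration of the two mutation-prediction metrics named above, the following is a minimal sketch in plain Python, not the authors' evaluation code. The function names `roc_auc` and `average_precision` and the toy labels and scores are assumptions made for this example. PR-AUC is approximated here by average precision (the step-wise area under the precision-recall curve), which may differ from the estimator used in the study.

```python
def roc_auc(labels, scores):
    """ROC-AUC as the probability that a randomly chosen positive
    scores higher than a randomly chosen negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def average_precision(labels, scores):
    """Average precision: mean of the precision values measured at the
    rank of each positive, with samples sorted by descending score.
    Assumes distinct scores; ties are broken by input order."""
    n_pos = sum(labels)
    if n_pos == 0:
        raise ValueError("need at least one positive")
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    total = 0.0
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            total += hits / rank  # precision at this positive's rank
    return total / n_pos


# Hypothetical toy data: 1 = mutated, 0 = wild type; scores are
# made-up classifier outputs, not values from the study.
toy_labels = [1, 0, 1, 1, 0, 0]
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.4, 0.2]

print(round(roc_auc(toy_labels, toy_scores), 4))
print(round(average_precision(toy_labels, toy_scores), 4))
```

ROC-AUC is insensitive to class balance, whereas average precision depends on the fraction of positives, which is one reason to report both when a gene is mutated in only a minority of patients.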
On the test set, images generated by the well-trained cGAN model achieved an FID of 1.31. The CNNs for TP53, PIK3CA, and CDH1 mutation prediction yielded ROC-AUC values of 0.9508, 0.7515, and 0.8136 and PR-AUC values of 0.9009, 0.7184, and 0.5007, respectively. Multi-class subtype prediction achieved precision, recall, and F1 scores of 0.8444, 0.8435, and 0.8336, respectively. The source code and related data used to implement the algorithms are available in the project GitHub repository at https://github.com/mattthuang/BC_RadiogenomicGAN .
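Multi-class precision, recall, and F1 are single numbers only after averaging over classes. The abstract does not state which averaging scheme was used, so the sketch below assumes macro averaging (an unweighted mean over subtypes). The function name `macro_precision_recall_f1` and the subtype labels are hypothetical and chosen only for illustration.

```python
def macro_precision_recall_f1(y_true, y_pred):
    """Per-class precision, recall, and F1, averaged with equal weight
    per class (macro averaging). Undefined ratios are treated as 0."""
    classes = sorted(set(y_true) | set(y_pred))
    precisions, recalls, f1s = [], [], []
    for c in classes:
        pairs = list(zip(y_true, y_pred))
        tp = sum(t == c and p == c for t, p in pairs)
        fp = sum(t != c and p == c for t, p in pairs)
        fn = sum(t == c and p != c for t, p in pairs)
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        precisions.append(prec)
        recalls.append(rec)
        f1s.append(f1)
    n = len(classes)
    return sum(precisions) / n, sum(recalls) / n, sum(f1s) / n


# Hypothetical subtype labels and predictions, not data from the study.
toy_true = ["LumA", "LumA", "LumA", "LumB", "LumB", "Basal"]
toy_pred = ["LumA", "LumA", "LumB", "LumB", "LumB", "Basal"]

precision, recall, f1 = macro_precision_recall_f1(toy_true, toy_pred)
print(round(precision, 4), round(recall, 4), round(f1, 4))
```

With imbalanced subtypes, a weighted average (each class weighted by its number of true samples) can give noticeably different values, so the averaging scheme should be checked against the project repository before comparing numbers.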
Our study establishes cGAN as a viable tool for generating synthetic BC MRIs for mutation status prediction and subtype classification to better characterize the heterogeneity of BC in patients. The synthetic images also have the potential to significantly augment existing MRI data and circumvent issues surrounding data sharing and patient privacy for future BC machine learning studies.