Cui Can, Asad Zuhayr, Dean William F, Smith Isabelle T, Madden Christopher, Bao Shunxing, Landman Bennett A, Roland Joseph T, Coburn Lori A, Wilson Keith T, Zwerner Jeffrey P, Zhao Shilin, Wheless Lee E, Huo Yuankai
Department of Computer Science, Vanderbilt University, Nashville, TN 37235, USA.
College of Arts and Science, Vanderbilt University, Nashville, TN 37235, USA.
Proc SPIE Int Soc Opt Eng. 2022 Feb-Mar;12033. doi: 10.1117/12.2612318. Epub 2022 Apr 4.
Multi-modal learning (e.g., integrating pathological images with genomic features) tends to improve the accuracy of cancer diagnosis and prognosis compared with learning from a single modality. However, missing data is a common problem in clinical practice: not every patient has all modalities available. Most previous work simply discarded samples with missing modalities, which can lose the information those samples carry and increase the likelihood of overfitting. In this work, we extend multi-modal learning for cancer diagnosis to handle missing data, using histological images and genomic data. Our integrated model can use all available data from patients with either complete or partial modalities. Experiments on the public TCGA-GBM and TCGA-LGG datasets show that data with missing modalities can contribute to multi-modal learning and improve model performance in glioma grade classification.
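As a rough illustration of the contrast drawn above between discarding incomplete samples and using every patient, the following minimal Python sketch fuses whichever modality embeddings are present by averaging them. This is not the authors' model: the abstract does not specify the fusion mechanism, and the function name `fuse_available`, the averaging rule, and the toy cohort values are all assumptions made for illustration only.

```python
from typing import List, Optional, Sequence

Vector = Sequence[float]


def fuse_available(modalities: Sequence[Optional[Vector]]) -> List[float]:
    """Average the embeddings of the modalities that are present (not None).

    Hypothetical fusion rule for illustration; not taken from the paper.
    """
    present = [m for m in modalities if m is not None]
    if not present:
        raise ValueError("at least one modality is required")
    dim = len(present[0])
    if any(len(m) != dim for m in present):
        raise ValueError("all embeddings must share one dimension")
    return [sum(m[i] for m in present) / len(present) for i in range(dim)]


# Toy cohort (made-up numbers): (image embedding, genomic embedding).
# None marks a missing modality for that patient.
cohort = [
    ([1.0, 3.0], [3.0, 5.0]),  # both modalities available
    ([2.0, 4.0], None),        # genomic data missing
    (None, [0.5, 1.5]),        # histology image missing
]

# Complete-case filtering keeps only patients with every modality.
complete_cases = [p for p in cohort if all(m is not None for m in p)]

# Fusing over available modalities keeps every patient.
fused = [fuse_available(p) for p in cohort]
```

In this toy cohort, complete-case filtering retains one of three patients, while fusing over available modalities yields one feature vector per patient that a downstream grade classifier could consume.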