Olatunji Isaac, Cui Feng
Thomas H. Gosnell School of Life Science, Rochester Institute of Technology, Rochester, NY, United States.
Front Bioinform. 2023 May 9;3:1131021. doi: 10.3389/fbinf.2023.1131021. eCollection 2023.
Cancer metastasis is directly related to death in almost all cases, yet much about this process remains to be understood. Despite advances in radiological investigation techniques, not all cases of Distant Metastasis (DM) are diagnosed at initial clinical presentation, and there are currently no standard biomarkers of metastasis. Early, accurate diagnosis of DM is nevertheless crucial for clinical decision making and for planning appropriate management strategies. Previous work has had little success in predicting DM from clinical, genomic, radiology, or histopathology data. In this work, we take a multimodal approach to predicting the presence of DM in cancer patients by combining gene expression data, clinical data, and histopathology images. We tested a novel combination of the Random Forest (RF) algorithm with an optimization technique for gene selection, and investigated whether gene expression patterns in the primary tissues of three cancer types with DM (Bladder Carcinoma, Pancreatic Adenocarcinoma, and Head and Neck Squamous Carcinoma) are similar or different. Gene expression biomarkers of DM identified by our proposed method outperformed Differentially Expressed Genes (DEGs) identified by the DESeq2 software package in predicting the presence or absence of DM. Genes involved in DM tend to be cancer type specific rather than shared across all cancers. Our results also indicate that multimodal data is more predictive of metastasis than any of the three unimodal data types tested, and that genomic data provides the highest contribution by a wide margin. The results re-emphasize the importance of having sufficient image data when a weakly supervised training technique is used. Code is available at: https://github.com/rit-cui-lab/Multimodal-AI-for-Prediction-of-Distant-Metastasis-in-Carcinoma-Patients.