Institute of Health Informatics, University College London, London, UK.
AstraZeneca, Oncology Data Science, Waltham, MA, USA.
Sci Rep. 2023 Sep 22;13(1):15761. doi: 10.1038/s41598-023-42365-x.
The ability to accurately predict non-small cell lung cancer (NSCLC) patient survival is crucial for informing physician decision-making, and the increasing availability of multi-omics data offers the promise of enhancing prognosis predictions. We present a multimodal integration approach that leverages microRNA, mRNA, DNA methylation, long non-coding RNA (lncRNA) and clinical data to predict NSCLC survival and identify patient subtypes, using denoising autoencoders for data compression and integration. Survival prediction performance for patients with lung adenocarcinoma (LUAD) and lung squamous cell carcinoma (LUSC) was compared across modality combinations and data integration methods. Using The Cancer Genome Atlas data, our results demonstrate that survival prediction models combining multiple modalities outperform single modality models. The highest performance was achieved with a combination of only two modalities, lncRNA and clinical, at concordance indices (C-indices) of 0.69 ± 0.03 for LUAD and 0.62 ± 0.03 for LUSC. Models utilizing all five modalities achieved mean C-indices of 0.67 ± 0.04 and 0.63 ± 0.02 for LUAD and LUSC, respectively, while the best individual modality performance reached C-indices of 0.64 ± 0.03 for LUAD and 0.59 ± 0.03 for LUSC. Analysis of biological differences revealed two distinct survival subtypes with over 900 differentially expressed transcripts.
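For readers unfamiliar with the concordance index reported above, the following is a minimal sketch of Harrell's C-index for right-censored survival data. The abstract does not state which implementation the authors used, so the function name `concordance_index`, its argument names, and the toy data are assumptions made purely for illustration. A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking.

```python
from itertools import combinations


def concordance_index(times, events, risk_scores):
    """Illustrative Harrell's C-index (hypothetical helper, not the authors' code).

    times       -- observed follow-up time per patient
    events      -- 1/True if death was observed, 0/False if censored
    risk_scores -- predicted risk; higher means shorter expected survival
    """
    concordant = 0.0
    comparable = 0
    for i, j in combinations(range(len(times)), 2):
        # Order the pair so that patient `a` has the shorter observed time.
        a, b = (i, j) if times[i] < times[j] else (j, i)
        if times[a] == times[b]:
            continue  # simplification: pairs with tied times are skipped
        if not events[a]:
            continue  # shorter time is censored, so the pair is not comparable
        comparable += 1
        if risk_scores[a] > risk_scores[b]:
            concordant += 1.0  # higher risk for the patient who died earlier
        elif risk_scores[a] == risk_scores[b]:
            concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Toy data (made up for illustration only).
times = [5, 10, 15, 20]
events = [1, 1, 0, 1]
perfect = concordance_index(times, events, [0.9, 0.7, 0.4, 0.1])
reversed_ranking = concordance_index(times, events, [0.1, 0.4, 0.7, 0.9])
uninformative = concordance_index(times, events, [0.5, 0.5, 0.5, 0.5])
```

On this scale, the reported values of 0.62 to 0.69 indicate that the models rank roughly two out of three comparable patient pairs correctly.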