Azher Zarif L, Suvarna Anish, Chen Ji-Qing, Zhang Ze, Christensen Brock C, Salas Lucas A, Vaickus Louis J, Levy Joshua J
Thomas Jefferson High School for Science and Technology, Alexandria, VA, USA.
Cancer Biology Graduate Program, Dartmouth College Geisel School of Medicine, Hanover, NH, USA.
BioData Min. 2023 Jul 22;16(1):23. doi: 10.1186/s13040-023-00338-w.
Deep learning models can infer cancer patient prognosis from molecular and anatomic pathology information. Recent studies that leveraged complementary multimodal data improved prognostication, further illustrating the potential utility of such methods. However, current approaches 1) do not comprehensively leverage biological and histomorphological relationships and 2) do not make use of emerging strategies to "pretrain" models (i.e., train models on a slightly orthogonal dataset or modeling objective), which may aid prognostication by reducing the amount of information required to achieve optimal performance. In addition, model interpretation is crucial for the clinical adoption of deep learning methods because it fosters practitioner understanding of, and trust in, the technology.
Here, we develop an interpretable multimodal modeling framework that combines DNA methylation, gene expression, and histopathology (i.e., tissue slide) data, and we compare the performance of crossmodal pretraining, contrastive learning, and transfer learning against the standard procedure.
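The abstract names contrastive learning as one of the pretraining strategies but does not specify its objective. As a generic illustration only, the following is a minimal sketch of a symmetric InfoNCE (CLIP-style) loss that pulls together embeddings of two modalities from the same patient (for example, a DNA methylation embedding and a histopathology embedding) and pushes apart embeddings from different patients. The function names, the temperature value, and the use of NumPy are assumptions for illustration, not the authors' implementation.

```python
import numpy as np


def log_softmax(x, axis):
    # Numerically stable log-softmax along the given axis.
    shifted = x - x.max(axis=axis, keepdims=True)
    return shifted - np.log(np.exp(shifted).sum(axis=axis, keepdims=True))


def symmetric_info_nce(z_a, z_b, temperature=0.1):
    """Hypothetical CLIP-style contrastive loss between two modalities.

    z_a, z_b: arrays of shape (n_patients, dim); row i of each array
    belongs to the same patient, so matching pairs lie on the diagonal.
    """
    # L2-normalize so that the dot product is the cosine similarity.
    z_a = z_a / np.linalg.norm(z_a, axis=1, keepdims=True)
    z_b = z_b / np.linalg.norm(z_b, axis=1, keepdims=True)
    logits = z_a @ z_b.T / temperature  # (n, n) scaled similarity matrix
    idx = np.arange(len(z_a))
    # Cross-entropy with the matching patient as the "correct class",
    # computed in both directions (modality A -> B and B -> A).
    loss_a_to_b = -log_softmax(logits, axis=1)[idx, idx].mean()
    loss_b_to_a = -log_softmax(logits, axis=0)[idx, idx].mean()
    return 0.5 * (loss_a_to_b + loss_b_to_a)


rng = np.random.default_rng(0)
methylation_emb = rng.normal(size=(8, 16))  # hypothetical embeddings
aligned_histo_emb = methylation_emb + 0.01 * rng.normal(size=(8, 16))
unrelated_histo_emb = rng.normal(size=(8, 16))
print(symmetric_info_nce(methylation_emb, aligned_histo_emb))
print(symmetric_info_nce(methylation_emb, unrelated_histo_emb))
```

Well-aligned cross-modal embeddings give a much lower loss than unrelated ones; minimizing this loss during pretraining is what encourages the encoders of different modalities to agree on a shared patient representation.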
Our models outperform both the existing state-of-the-art method (average C-index increase of 11.54%) and baseline clinically driven models (average C-index increase of 11.7%). Model interpretations indicate that the models take biologically meaningful factors into account when making prognostic predictions.
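Performance here is reported as the concordance index (C-index), the fraction of comparable patient pairs whose predicted risks are ordered consistently with their observed survival. The abstract does not state which C-index variant was used, so the following is only a minimal sketch of Harrell's estimator under simplifying assumptions: an O(n²) loop, ties in observed time are ignored, and ties in predicted risk count as half-concordant. The function name is hypothetical.

```python
def concordance_index(times, risk_scores, events):
    """Hypothetical sketch of Harrell's C-index.

    times: observed survival or censoring times
    risk_scores: predicted risk (higher means worse prognosis)
    events: 1 if the event (e.g., death) was observed, 0 if censored
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        # A pair is comparable only if the earlier time is an observed event.
        if not events[i]:
            continue
        for j in range(n):
            if times[i] < times[j]:  # tied times are skipped in this sketch
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0  # earlier event got the higher risk
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable


# Earlier deaths assigned higher risk: perfectly concordant.
print(concordance_index([2, 4, 6, 8], [0.9, 0.7, 0.4, 0.1], [1, 1, 1, 1]))
```

A C-index of 0.5 corresponds to random ordering and 1.0 to perfect ranking; censored patients contribute only as the later member of a pair, because their true event time is unknown.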
Our results demonstrate that the choice of pretraining strategy is crucial for obtaining highly accurate prognostication models, even more so than devising an innovative model architecture, and further emphasize the all-important role of the tumor microenvironment in disease progression.