Monta Vista High School, Cupertino, CA, USA.
Department of Medicine and Biomedical Data Science, Stanford Center for Biomedical Informatics Research, Stanford University, Stanford, CA, USA.
Bioinformatics. 2019 Jul 15;35(14):i446-i454. doi: 10.1093/bioinformatics/btz342.
Estimating the future course of patients with cancer lesions is invaluable to physicians; however, current clinical methods fail to make effective use of the vast amount of multimodal data available for cancer patients. To tackle this problem, we constructed a multimodal neural network-based model to predict the survival of patients for 20 different cancer types using clinical data, mRNA expression data, microRNA expression data and histopathology whole slide images (WSIs). We developed an unsupervised encoder to compress these four data modalities into a single feature vector for each patient, handling missing data through a resilient, multimodal dropout method. Encoding methods were tailored to each data type: deep highway networks extract features from clinical and genomic data, and convolutional neural networks extract features from WSIs.
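The multimodal dropout idea can be illustrated with a small sketch. The abstract does not give the exact procedure, so everything below is an assumption made for illustration: the function name `multimodal_dropout`, the `drop_prob` parameter, the rule of always keeping at least one modality, and fusing by a simple mean over the kept modality vectors. A missing modality is represented as `None` and is treated the same way as a dropped one.

```python
import random

def multimodal_dropout(features, drop_prob=0.25, rng=None):
    """Fuse per-modality feature vectors, randomly dropping whole modalities.

    features: dict mapping modality name -> list of floats, or None if the
              modality is missing for this patient (hypothetical layout).
    Returns (fused_vector, kept_modalities).
    """
    rng = rng or random.Random()
    # Modalities that are actually available for this patient.
    present = [name for name, vec in features.items() if vec is not None]
    if not present:
        raise ValueError("patient has no available modality")
    # Drop each available modality independently with probability drop_prob.
    kept = [name for name in present if rng.random() >= drop_prob]
    # Assumption: never drop everything; fall back to one random modality.
    if not kept:
        kept = [rng.choice(present)]
    dim = len(features[kept[0]])
    # Assumption: fuse by averaging the kept vectors element-wise.
    fused = [sum(features[name][k] for name in kept) / len(kept)
             for k in range(dim)]
    return fused, kept

# Example patient with no whole slide image available.
patient = {
    "clinical": [1.0, 2.0],
    "mrna": [3.0, 4.0],
    "mirna": [5.0, 6.0],
    "wsi": None,
}
fused, kept = multimodal_dropout(patient, drop_prob=0.25, rng=random.Random(0))
```

Because training would see many random subsets of modalities under this scheme, a patient who genuinely lacks a modality at prediction time looks like an ordinary training case rather than an exception.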
We used pancancer data to train these feature encodings and to predict single-cancer and pancancer overall survival, achieving a C-index of 0.78 overall. This work shows that it is possible to build a pancancer model for prognosis that also predicts prognosis in single cancer sites. Furthermore, our model handles multiple data modalities, efficiently analyzes WSIs and flexibly encodes each patient's multimodal data into an informative representation learned without supervision. We thus present a powerful automated tool to accurately determine prognosis, a key step towards personalized treatment for cancer patients.
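For readers unfamiliar with the C-index (concordance index) reported above, the following is a minimal sketch of its standard pairwise definition, not the authors' evaluation code. A pair of patients is comparable when the one with the shorter follow-up time had an observed event; the pair is concordant when that patient was also assigned the higher predicted risk. The function name and the toy data are hypothetical.

```python
def concordance_index(times, events, risks):
    """Pairwise concordance index for right-censored survival data.

    times:  follow-up time per patient
    events: 1 if death was observed, 0 if the patient was censored
    risks:  predicted risk score (higher means shorter expected survival)
    """
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier member of a pair
        for j in range(n):
            if times[i] < times[j]:
                comparable += 1
                if risks[i] > risks[j]:
                    concordant += 1.0   # earlier death had higher risk
                elif risks[i] == risks[j]:
                    concordant += 0.5   # tied risk scores count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable

# Toy data: four patients, the third one censored at time 6.
c = concordance_index(
    times=[2, 4, 6, 8],
    events=[1, 1, 0, 1],
    risks=[0.9, 0.7, 0.8, 0.1],
)
print(c)  # 0.8
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported 0.78 means that roughly 78% of comparable patient pairs were ordered correctly.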