Fondazione Bruno Kessler, Trento, Italy.
PLoS One. 2018 Dec 7;13(12):e0208924. doi: 10.1371/journal.pone.0208924. eCollection 2018.
We introduce the CDRP (Concatenated Diagnostic-Relapse Prognostic) architecture for multi-task deep learning, which incorporates a clinical algorithm (e.g., a risk stratification schema) to improve prognostic profiling. We present its first application to survival prediction in High-Risk (HR) Neuroblastoma from transcriptomics data, a task that studies from the MAQC consortium have shown to remain the hardest among the multiple diagnostic and prognostic endpoints predictable from the same dataset. To obtain the more accurate risk stratification needed for appropriate treatment strategies, CDRP combines a first component (CDRP-A) synthesizing a diagnostic task and a second component (CDRP-N) dedicated to one or more prognostic tasks. The approach leverages the advent of semi-supervised deep learning structures that can flexibly integrate multimodal data or internally create multiple processing paths. CDRP-A is an autoencoder trained on gene expression data for the HR/non-HR risk stratification defined by the Children's Oncology Group, yielding a 64-node representation in the bottleneck layer. CDRP-N is a multi-task classifier for two prognostic endpoints, Event-Free Survival (EFS) and Overall Survival (OS). CDRP-A provides the HR embedding as input to the CDRP-N shared layer, from which two branches depart to model EFS and OS, respectively. To control for selection bias, CDRP is trained and evaluated using a Data Analysis Protocol (DAP) developed within the MAQC initiative. CDRP was applied to Illumina RNA-Seq data of 498 Neuroblastoma patients (HR: 176) from the SEQC study (12,464 Entrez genes) and to Affymetrix Human Exon Array expression profiles (17,450 genes) of 247 primary diagnostic Neuroblastoma samples from the TARGET NBL cohort. On the SEQC HR patients, CDRP achieves a Matthews Correlation Coefficient (MCC) of 0.38 for EFS and 0.19 for OS in external validation, improving over published SEQC models. We show that a CDRP-N embedding is indeed parametrically associated with increasing severity, and that the embedding can be used to better stratify patients' survival.
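The data flow described above (a 64-node bottleneck embedding from CDRP-A feeding a shared CDRP-N layer with separate EFS and OS branches) can be illustrated with a minimal, untrained sketch in plain Python. It is not the authors' implementation: the function names, the 32-unit shared layer, the single-layer encoder and decoder, the activations, and the random initialisation are all assumptions made for illustration, and only the 64-node bottleneck and the two-branch layout come from the abstract. An `mcc` helper shows the standard Matthews Correlation Coefficient formula for binary labels.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def dense(n_in, n_out, rng):
    """Create one fully connected layer: small random weights, zero biases."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward_layer(layer, x, activation):
    """Compute activation(W x + b) for a single input vector x."""
    weights, biases = layer
    return [activation(sum(w * v for w, v in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def build_cdrp(n_genes, rng, bottleneck=64, shared=32):
    """Assemble untrained layers. Only bottleneck=64 is taken from the abstract;
    shared=32 and the single-layer encoder/decoder are illustrative assumptions."""
    return {
        "encoder": dense(n_genes, bottleneck, rng),   # CDRP-A, input to bottleneck
        "decoder": dense(bottleneck, n_genes, rng),   # CDRP-A, reconstruction
        "shared": dense(bottleneck, shared, rng),     # CDRP-N shared layer
        "efs_head": dense(shared, 1, rng),            # CDRP-N branch for EFS
        "os_head": dense(shared, 1, rng),             # CDRP-N branch for OS
    }


def cdrp_forward(model, expression):
    """Run one expression profile through CDRP-A, then through both CDRP-N branches."""
    embedding = forward_layer(model["encoder"], expression, relu)
    reconstruction = forward_layer(model["decoder"], embedding, lambda z: z)
    hidden = forward_layer(model["shared"], embedding, relu)
    efs = forward_layer(model["efs_head"], hidden, sigmoid)[0]
    os_prob = forward_layer(model["os_head"], hidden, sigmoid)[0]
    return {"embedding": embedding, "reconstruction": reconstruction,
            "efs": efs, "os": os_prob}


def mcc(y_true, y_pred):
    """Matthews Correlation Coefficient for binary 0/1 labels (0.0 if undefined)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return 0.0 if denom == 0 else (tp * tn - fp * fn) / denom


if __name__ == "__main__":
    rng = random.Random(0)
    toy_genes = 100  # toy size; the SEQC data in the abstract has 12,464 genes
    model = build_cdrp(toy_genes, rng)
    profile = [rng.random() for _ in range(toy_genes)]
    out = cdrp_forward(model, profile)
    print(len(out["embedding"]), out["efs"], out["os"])
```

Because the weights are random, the two outputs carry no prognostic meaning. The sketch only shows how a single embedding can serve both endpoints, so that the EFS and OS branches share whatever the diagnostic component has learned.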