School of Electronics, Electrical Engineering and Computer Science, Queen's University Belfast, 18 Malone Road, Belfast, BT9 5BN, Northern Ireland, UK.
Patrick G. Johnson Centre for Cancer Research, Queen's University Belfast, Belfast, Northern Ireland, UK.
BMC Bioinformatics. 2021 Nov 24;22(1):563. doi: 10.1186/s12859-021-04454-4.
The prevalence of liver cancer (hepatocellular carcinoma; HCC) is increasing and the expected clinical outcome is poor, so a greater understanding of HCC aetiology is urgently required. This study explored a deep learning solution to detect biologically important features that distinguish prognostic subgroups. A novel Artificial Neural Network (ANN) architecture, trained with a customised objective function (L), was developed. The ANN is designed to discover new data representations that reveal patient subgroups which are biologically homogeneous (clustering loss) and similar in survival (survival loss), while removing noise from the data (reconstruction loss). The model was applied to TCGA-HCC multi-omics data and benchmarked against baseline models that learn with only a reconstruction objective function (BCE, MSE). For the baseline models, the learned features are filtered on survival information and then used to cluster patients. Variants of the customised objective function incorporating only the reconstruction and clustering losses (L), or only the reconstruction and survival losses (L), were also evaluated. Robust features that were consistently detected were compared between models and validated in the TCGA and LIRI-JP HCC cohorts.
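The way three loss terms can be combined into one objective is sketched below in plain Python. This is a minimal illustration, not the authors' implementation: the abstract does not give the exact formulation of each term, so the choices here (MSE for reconstruction, squared distance to the nearest centroid for clustering, a negative Cox partial log-likelihood for survival) and the names `hybrid_loss`, `w_clust` and `w_surv` are all assumptions.

```python
import math

def mse_loss(x, x_hat):
    """Mean squared reconstruction error over all samples and features."""
    n_values = sum(len(row) for row in x)
    sq_err = sum((a - b) ** 2 for r, rh in zip(x, x_hat) for a, b in zip(r, rh))
    return sq_err / n_values

def clustering_loss(z, centroids):
    """Mean squared distance from each bottleneck vector to its nearest centroid
    (an assumed k-means-style term, not necessarily the one used in the paper)."""
    total = 0.0
    for v in z:
        total += min(sum((a - c) ** 2 for a, c in zip(v, cen)) for cen in centroids)
    return total / len(z)

def cox_loss(risk, time, event):
    """Negative Cox partial log-likelihood, averaged over observed events.
    Assumed survival term; ties are not corrected for."""
    nll, n_events = 0.0, 0
    for r_i, t_i, e_i in zip(risk, time, event):
        if not e_i:
            continue  # censored samples only contribute through risk sets
        # Risk set: every sample still under observation at time t_i.
        denom = sum(math.exp(r_j) for r_j, t_j in zip(risk, time) if t_j >= t_i)
        nll -= r_i - math.log(denom)
        n_events += 1
    return nll / max(n_events, 1)

def hybrid_loss(x, x_hat, z, centroids, risk, time, event, w_clust=1.0, w_surv=1.0):
    """Weighted sum of the three terms; the weights are hypothetical hyperparameters."""
    return (mse_loss(x, x_hat)
            + w_clust * clustering_loss(z, centroids)
            + w_surv * cox_loss(risk, time, event))
```

Setting `w_surv=0` or `w_clust=0` gives analogues of the two reduced variants described above, and setting both to zero leaves a reconstruction-only baseline.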
The combined loss (L) discovered highly significant prognostic subgroups (P-value = 1.55E-77) with more accurate sample assignment (Silhouette scores: 0.59-0.7) than the baseline models (0.18-0.3). All L bottleneck features (N = 100) were significant for survival, compared to only 11-21 for the baseline models. Prognostic subgroups were not explained by disease grade or risk factors. Instead, L identified robust features including 377 mRNAs, 61.27% of which were novel relative to those identified by the other losses. Some 75 mRNAs were prognostic in TCGA, and 29 were also prognostic in LIRI-JP. L also identified 15 robust miRNAs, including two novel ones (hsa-let-7g; hsa-mir-550a-1), and 328 methylation features, 71% of which were prognostic. Gene-enrichment and Functional Annotation Analysis identified seven pathways differentiating the prognostic clusters.
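For readers unfamiliar with the Silhouette score quoted above, the following is a minimal pure-Python sketch of the standard definition: for each sample, a is the mean distance to the other members of its own cluster, b is the mean distance to the nearest other cluster, and the sample score is (b - a) / max(a, b). The Euclidean metric and the function name `silhouette_score` are assumptions for illustration; the abstract does not state which distance or software the authors used.

```python
import math

def silhouette_score(points, labels):
    """Mean silhouette coefficient; needs at least two clusters.
    Values near 1 mean tight, well-separated clusters; values near 0 or
    below mean overlapping or misassigned samples."""
    def dist(a, b):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

    # Group sample indices by cluster label.
    clusters = {}
    for idx, lab in enumerate(labels):
        clusters.setdefault(lab, []).append(idx)

    scores = []
    for i, lab in enumerate(labels):
        own = [j for j in clusters[lab] if j != i]
        if not own:
            scores.append(0.0)  # common convention for singleton clusters
            continue
        # a: mean distance to the other members of the sample's own cluster.
        a = sum(dist(points[i], points[j]) for j in own) / len(own)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(dist(points[i], points[j]) for j in members) / len(members)
            for other, members in clusters.items() if other != lab
        )
        denom = max(a, b)
        scores.append((b - a) / denom if denom > 0 else 0.0)
    return sum(scores) / len(scores)
```

On this scale, the reported range of 0.59-0.7 indicates considerably tighter and better separated clusters than the 0.18-0.3 obtained with the baseline models.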
Combining cluster and survival metrics with the reconstruction objective function facilitated superior prognostic subgroup identification. The hybrid model identified more homogeneous clusters that were consequently more biologically meaningful. The novel and prognostic robust features extracted provide additional information that improves our understanding of a complex disease and may help reveal its aetiology. Moreover, the gene features identified may have clinical applications as therapeutic targets.