Department of Biostatistics, Epidemiology and Informatics, University of Pennsylvania, Philadelphia, PA, USA.
Department of Computer Science, Kennesaw State University, Marietta, GA, USA.
BMC Med Genomics. 2019 Dec 23;12(Suppl 10):189. doi: 10.1186/s12920-019-0624-2.
Understanding the complex biological mechanisms of cancer patient survival using genomic and clinical data is vital, not only for developing new treatments but also for improving survival prediction. However, highly nonlinear, high-dimensional, low-sample-size (HDLSS) data pose computational challenges for conventional survival analysis.
We propose a novel, biologically interpretable, pathway-based sparse deep neural network, named Cox-PASNet, which integrates high-dimensional gene expression data and clinical data in a simple neural network architecture for survival analysis. Cox-PASNet is biologically interpretable because the nodes in the neural network correspond to genes and biological pathways, while the network captures the nonlinear and hierarchical effects of biological pathways associated with cancer patient survival. We also propose a heuristic optimization solution for training Cox-PASNet with HDLSS data. Cox-PASNet was extensively evaluated by comparing its predictive performance with that of current state-of-the-art methods on glioblastoma multiforme (GBM) and ovarian serous cystadenocarcinoma (OV). In the experiments, Cox-PASNet outperformed the benchmark methods. Moreover, the neural network architecture of Cox-PASNet was biologically interpreted, and several significant prognostic factors among genes and biological pathways were identified.
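The idea that nodes correspond to genes and pathways can be illustrated with a masked layer, in which a gene feeds a pathway node only if it is annotated to that pathway. The following is a minimal sketch in plain Python, not the authors' PyTorch implementation: the gene names, pathway memberships, weights, and the `tanh` activation are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical toy annotation: 4 genes, 2 pathways (not taken from the paper).
GENES = ["g1", "g2", "g3", "g4"]
PATHWAYS = {"pathway_A": {"g1", "g2"}, "pathway_B": {"g2", "g3", "g4"}}


def build_mask(genes, pathways):
    """Return a pathway-by-gene 0/1 matrix; 1.0 means the gene belongs to the pathway."""
    return [[1.0 if g in members else 0.0 for g in genes]
            for members in pathways.values()]


def masked_layer(x, weights, mask, bias):
    """Map gene-level inputs to pathway-level activations.

    Each weight is multiplied by its mask entry, so genes outside a pathway
    contribute nothing to that pathway's node. The tanh activation is an
    assumption made for this sketch.
    """
    out = []
    for w_row, m_row, b in zip(weights, mask, bias):
        z = sum(w * m * xi for w, m, xi in zip(w_row, m_row, x)) + b
        out.append(math.tanh(z))
    return out


mask = build_mask(GENES, PATHWAYS)
weights = [[0.5, -0.2, 0.9, 0.1],   # row for pathway_A (hypothetical values)
           [0.3, 0.4, -0.6, 0.2]]   # row for pathway_B (hypothetical values)
bias = [0.0, 0.0]
x = [1.0, 2.0, 3.0, 4.0]            # one patient's gene expression (hypothetical)
pathway_scores = masked_layer(x, weights, mask, bias)
```

Because the mask zeroes every connection that has no annotation behind it, each pathway node's activation can be traced back to a known set of genes, which is the sense in which such a layer is interpretable.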
Cox-PASNet models biological mechanisms in the neural network by incorporating biological pathway databases and sparse coding. The neural network of Cox-PASNet can identify nonlinear and hierarchical associations between genomic and clinical data and cancer patient survival. The open-source PyTorch code of Cox-PASNet for training, evaluation, and model interpretation is available at: https://github.com/DataX-JieHao/Cox-PASNet.
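The abstract does not state the training objective. As the name suggests a Cox proportional hazards formulation, the sketch below shows how a network's scalar output per patient could be scored with the negative log partial likelihood. It assumes right-censored data and applies no correction for tied event times; the function name and the example values are hypothetical and are not taken from the Cox-PASNet repository.

```python
import math


def cox_neg_log_partial_likelihood(risk_scores, times, events):
    """Average negative log partial likelihood over observed events.

    risk_scores: predicted log-risk per patient (e.g. the network's output)
    times:       observed survival or censoring time per patient
    events:      1 if the event (death) was observed, 0 if censored
    """
    total = 0.0
    n_events = 0
    for i, (t_i, e_i) in enumerate(zip(times, events)):
        if not e_i:
            continue  # censored patients only appear in risk sets
        # Risk set: every patient still under observation at time t_i.
        denominator = sum(math.exp(r)
                          for r, t in zip(risk_scores, times) if t >= t_i)
        total += risk_scores[i] - math.log(denominator)
        n_events += 1
    return -total / n_events if n_events else 0.0


# Hypothetical example: three patients, the last one censored.
times = [1.0, 2.0, 3.0]
events = [1, 1, 0]
loss_uninformative = cox_neg_log_partial_likelihood([0.0, 0.0, 0.0], times, events)
loss_concordant = cox_neg_log_partial_likelihood([2.0, 1.0, 0.0], times, events)
loss_discordant = cox_neg_log_partial_likelihood([0.0, 1.0, 2.0], times, events)
```

In this example the loss is lowest when patients who die earlier receive higher risk scores, which is the ordering a survival model trained this way is rewarded for producing.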