Department of Computer Science and Engineering, Institute of Engineering Research.
Bioinformatics Institute.
Bioinformatics. 2020 Jun 1;36(12):3818-3824. doi: 10.1093/bioinformatics/btaa203.
Biological pathways are important curated knowledge of biological processes. Thus, cancer subtype classification based on pathways would be very useful for understanding differences in biological mechanisms among cancer subtypes. However, pathways cover only a fraction of the entire gene set (only one-third of human genes in KEGG), and pathways are fragmented. For this reason, few computational methods use pathways for cancer subtype classification.
We present an explainable deep-learning model with an attention mechanism and network propagation for cancer subtype classification. Each pathway is modeled by a graph convolutional network. Then, a multi-attention-based ensemble model combines several hundred pathways in an explainable manner. Lastly, network propagation on the pathway-gene network explains why gene expression profiles differ among subtypes. In experiments with five TCGA cancer datasets, our method achieved high classification accuracy and, additionally, identified subtype-specific pathways and biological functions.
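The abstract does not specify layer counts, pooling, the attention formulation, or the propagation algorithm, so the following is only a minimal NumPy sketch of the three stages under assumed, generic choices: a single Kipf-Welling-style GCN layer with mean pooling per pathway, dot-product attention heads over pathway embeddings, and random walk with restart as the propagation scheme. All function names (`gcn_pathway_embedding`, `multi_attention_combine`, `random_walk_with_restart`) and parameters are hypothetical and untrained; this is not the authors' GCN_MAE implementation.

```python
import numpy as np

def normalized_adjacency(adj):
    """Assumed GCN normalization: D^-1/2 (A + I) D^-1/2 (self-loops added)."""
    a_hat = adj + np.eye(adj.shape[0])
    d_inv_sqrt = 1.0 / np.sqrt(a_hat.sum(axis=1))
    return a_hat * d_inv_sqrt[:, None] * d_inv_sqrt[None, :]

def gcn_pathway_embedding(adj, x, w):
    """Hypothetical per-pathway model: one GCN layer + ReLU, then mean-pool genes.
    adj: (g, g) gene-gene adjacency of one pathway; x: (g, f) gene features
    (e.g. expression); w: (f, h) layer weights. Returns one (h,) vector."""
    h = np.maximum(normalized_adjacency(adj) @ x @ w, 0.0)
    return h.mean(axis=0)

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def multi_attention_combine(embeddings, queries):
    """Assumed dot-product multi-head attention over pathways.
    embeddings: (P, h), one row per pathway; queries: (K, h), one per head.
    Returns concatenated head outputs (K*h,) and attention weights (K, P);
    the weights are what make the ensemble inspectable per pathway."""
    weights = softmax(queries @ embeddings.T, axis=1)  # each row sums to 1
    return (weights @ embeddings).reshape(-1), weights

def random_walk_with_restart(adj, seeds, restart=0.5, tol=1e-8, max_iter=1000):
    """Assumed propagation: p <- (1 - r) W p + r p0 with column-normalized W,
    e.g. on a pathway-gene network seeded by highly attended pathways."""
    col_sums = adj.sum(axis=0)
    col_sums[col_sums == 0] = 1.0  # avoid division by zero for isolated nodes
    w = adj / col_sums
    p0 = seeds / seeds.sum()
    p = p0.copy()
    for _ in range(max_iter):
        p_next = (1 - restart) * (w @ p) + restart * p0
        if np.abs(p_next - p).sum() < tol:
            return p_next
        p = p_next
    return p
```

In this sketch, the concatenated head outputs would feed a subtype classifier, and averaging the attention weights over heads gives per-pathway seed scores whose propagation ranks genes by their proximity to the selected pathways.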
The source code is available at http://biohealth.snu.ac.kr/software/GCN_MAE.
Supplementary data are available at Bioinformatics online.