School of Computer Science, University of Windsor, Windsor, ON N9B 3P4, Canada.
Bioinformatics. 2020 Aug 1;36(15):4248-4254. doi: 10.1093/bioinformatics/btaa500.
One of the main challenges in applying graph convolutional neural networks (CNNs) to gene-interaction data is the lack of understanding of the vector space to which the data belong, together with the inherent difficulty of representing those interactions in a significantly lower-dimensional space, viz. a Euclidean space. The challenge becomes more pronounced when dealing with several types of heterogeneous data. We introduce a systematic, generalized method, called iSOM-GSN, that transforms high-dimensional 'multi-omic' data onto a 2D grid. Afterwards, we apply a CNN to predict disease states of various types. Based on the idea of Kohonen's self-organizing map, we generate, for each sample, a 2D grid over a given set of genes that represents a gene similarity network.
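The mapping described above can be illustrated with a minimal NumPy sketch of a Kohonen self-organizing map. It is not the iSOM-GSN implementation: the grid size, learning schedule, function names (`train_som`, `best_matching_unit`, `sample_to_grid`), the choice to describe each gene by its values across samples, the averaging of genes that share a cell, and the stacking of one grid per data type as image channels are all assumptions made for illustration, and the data are random.

```python
import numpy as np


def best_matching_unit(weights, x):
    """Return the (row, col) of the SOM unit whose weight vector is closest to x."""
    dist2 = np.sum((weights - x) ** 2, axis=-1)
    return np.unravel_index(np.argmin(dist2), dist2.shape)


def train_som(gene_profiles, grid_shape=(4, 4), n_iter=500, lr0=0.5, sigma0=1.5, seed=0):
    """Train a small SOM on gene profiles (n_genes x n_features).

    Returns a weight array of shape (rows, cols, n_features).
    """
    rng = np.random.default_rng(seed)
    rows, cols = grid_shape
    n_genes, n_features = gene_profiles.shape
    weights = rng.random((rows, cols, n_features))
    # Grid coordinates of every unit, shape (rows, cols, 2).
    coords = np.stack(
        np.meshgrid(np.arange(rows), np.arange(cols), indexing="ij"), axis=-1
    )
    for t in range(n_iter):
        frac = t / n_iter
        lr = lr0 * (1.0 - frac)                # linearly decaying learning rate
        sigma = sigma0 * (1.0 - frac) + 1e-3   # shrinking neighbourhood radius
        x = gene_profiles[rng.integers(n_genes)]
        bmu = np.array(best_matching_unit(weights, x))
        grid_dist2 = np.sum((coords - bmu) ** 2, axis=-1)
        h = np.exp(-grid_dist2 / (2.0 * sigma ** 2))  # Gaussian neighbourhood
        weights += lr * h[..., None] * (x - weights)
    return weights


def sample_to_grid(weights, gene_profiles, sample_values):
    """Place one sample's per-gene values on the grid at each gene's best-matching unit.

    Genes that land on the same cell are averaged (an assumption of this sketch).
    """
    grid = np.zeros(weights.shape[:2])
    counts = np.zeros(weights.shape[:2])
    for g, value in enumerate(sample_values):
        r, c = best_matching_unit(weights, gene_profiles[g])
        grid[r, c] += value
        counts[r, c] += 1
    return np.divide(grid, counts, out=np.zeros_like(grid), where=counts > 0)


# Toy data: 14 genes, 20 samples, 3 data types (e.g. expression, methylation, CNA).
rng = np.random.default_rng(1)
omics = rng.random((3, 14, 20))            # (data type, gene, sample), random values
gene_profiles = omics[0]                   # assumption: genes described by expression across samples
som = train_som(gene_profiles)
# One 2D grid per data type for sample 0, stacked as channels of an image.
image = np.stack(
    [sample_to_grid(som, gene_profiles, omics[k][:, 0]) for k in range(3)], axis=-1
)
```

Each resulting `image` has a fixed shape regardless of which sample it came from, which is what would let a standard CNN consume one image per sample; the classifier itself is omitted here.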
We tested the model on the prediction of breast and prostate cancer using gene expression, DNA methylation and copy number alteration data. Prediction accuracies in the 94-98% range were obtained for tumor stages of breast cancer and calculated Gleason scores of prostate cancer, with just 14 input genes in each case. The scheme not only achieves nearly perfect classification accuracy, but also provides an enhanced scheme for representation learning, visualization, dimensionality reduction and interpretation of multi-omic data.
The source code and sample data are available via a GitHub project at https://github.com/NaziaFatima/iSOM_GSN.
Supplementary data are available at Bioinformatics online.