Department of Computer Engineering, Ajou University, Suwon, 16499, South Korea.
Department of Molecular and Human Genetics, Baylor College of Medicine, Houston, TX, 77030, USA.
Biol Direct. 2019 Apr 29;14(1):8. doi: 10.1186/s13062-019-0239-8.
Integrating the rich information in multi-omics data has become a popular approach to survival prediction and biomarker identification in cancer studies. To facilitate the integrative analysis of multiple genomic profiles, several studies have suggested using pathway information rather than individual genomic profiles.
We have recently proposed an integrative directed random walk-based method utilizing pathway information (iDRW) for more robust and effective genomic feature extraction. In this study, we applied iDRW to multiple genomic profiles for two different cancers, and designed a directed gene-gene graph that reflects the interaction between gene expression and copy number data. In the experiments, we compared the performance of the iDRW method and four state-of-the-art pathway-based methods using a survival prediction model that classifies samples into two survival groups.
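As an illustration of the general idea behind a directed random walk on a gene-gene graph, the following is a minimal sketch and not the authors' implementation. It assumes a random walk with restart of the form W(t+1) = (1 - r) * M^T * W(t) + r * W(0), where M is the row-normalized adjacency matrix. The restart probability `restart=0.7`, the function name `directed_random_walk`, the node labels, and the handling of nodes without outgoing edges (their weight is returned to the initial distribution) are all assumptions made for this example; the abstract does not specify them.

```python
def directed_random_walk(edges, initial_weights, restart=0.7,
                         tol=1e-10, max_iter=1000):
    """Random walk with restart on a directed graph (illustrative sketch).

    edges: list of (source, target) pairs, e.g. copy-number node -> expression node.
    initial_weights: dict mapping node -> non-negative initial weight W(0).
    Returns a dict mapping node -> converged weight (weights sum to 1).
    """
    nodes = sorted(set(initial_weights) | {n for edge in edges for n in edge})
    total = sum(initial_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("initial weights must have a positive sum")
    # Normalize W(0) so that it is a probability distribution.
    w0 = {n: initial_weights.get(n, 0.0) / total for n in nodes}

    out_neighbors = {n: [] for n in nodes}
    for source, target in edges:
        out_neighbors[source].append(target)

    w = dict(w0)
    for _ in range(max_iter):
        # Restart term: a fraction `restart` of the mass returns to W(0).
        nxt = {n: restart * w0[n] for n in nodes}
        dangling_mass = 0.0
        for source in nodes:
            targets = out_neighbors[source]
            if not targets:
                dangling_mass += w[source]
                continue
            # Spread the remaining mass evenly along outgoing edges.
            share = (1.0 - restart) * w[source] / len(targets)
            for target in targets:
                nxt[target] += share
        # Assumption: mass at nodes with no outgoing edges goes back to W(0).
        for n in nodes:
            nxt[n] += (1.0 - restart) * dangling_mass * w0[n]
        change = sum(abs(nxt[n] - w[n]) for n in nodes)
        w = nxt
        if change < tol:
            break
    return w


# Hypothetical toy graph: a copy-number node pointing at an expression node.
toy_edges = [("cnv:geneA", "expr:geneA"), ("expr:geneA", "expr:geneB")]
toy_init = {"cnv:geneA": 2.0, "expr:geneA": 1.0, "expr:geneB": 1.0}
toy_scores = directed_random_walk(toy_edges, toy_init)
```

In a pipeline of this kind, the converged weights would typically be used to rank genes or to weight them when summarizing pathway activity; how iDRW initializes the weights and scores genes is described in the full article, not in this sketch.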
The results show that the integrative analysis guided by pathway information not only improves prediction performance, but also provides better biological insights into the top pathways and genes prioritized by the model in both the neuroblastoma and the breast cancer datasets. The pathways and genes selected by the iDRW method were shown to be related to the corresponding cancers.
In this study, we demonstrated the effectiveness of a directed random walk-based multi-omics data integration method applied to gene expression and copy number data for both breast cancer and neuroblastoma datasets. We revised the directed gene-gene graph to account for the impact of copy number variation on gene expression, and redefined the weight initialization and gene-scoring method. Benchmarking iDRW against four pathway-based methods showed that iDRW improved survival prediction performance and jointly identified cancer-related pathways and genes in both cancer datasets.
This article was reviewed by Helena Molina-Abril and Marta Hidalgo.