Patil Shruti S, Roberts Steven A, Gebremedhin Assefaw H
School of Electrical Engineering and Computer Science, Washington State University, Pullman, WA, United States.
School of Molecular Biosciences, Washington State University, Pullman, WA, United States.
Front Bioinform. 2024 Jul 8;4:1365200. doi: 10.3389/fbinf.2024.1365200. eCollection 2024.
Cancer is a heterogeneous disease that results from genetic alteration of cell cycle and proliferation controls. Identifying mutations that drive cancer, understanding cancer type specificities, and delineating how driver mutations interact with each other to establish disease are vital for identifying therapeutic vulnerabilities. Such cancer-specific patterns and gene co-occurrences can be identified by studying tumor genome sequences, and networks have proven effective in uncovering relationships between sequences. We present two network-based approaches to identify driver gene patterns among tumor samples. The first approach relies on analysis using the Directed Weighted All Nearest Neighbors (DiWANN) model, a variant of the sequence similarity network, and the second approach uses bipartite network analysis. A data reduction framework was implemented to extract the minimal relevant information for the sequence similarity network analysis, in which a transformed reference sequence is generated for constructing the driver gene network. This data reduction process, combined with the efficiency of the DiWANN network model, greatly lowered the computational cost (in terms of execution time and memory usage) of generating the networks, enabling us to work at a much larger scale than previously possible. The DiWANN network helped us identify cancer types in which samples were more closely connected to each other, suggesting that they are less heterogeneous and potentially susceptible to a common drug. The bipartite network analysis provided insight into gene associations and co-occurrences. We identified genes that were broadly mutated in multiple cancer types and mutations exclusive to only a few. Additionally, weighted one-mode gene projections of the bipartite networks revealed a pattern in how driver genes occur across different cancers. Our study demonstrates that network-based approaches can be an effective tool in cancer genomics.
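To make the DiWANN idea concrete, the following is a minimal brute-force sketch. It assumes the DiWANN construction as we understand it: each sequence receives a directed edge to every other sequence tied at its minimum distance, weighted by that distance. It also assumes equal-length sequences compared by Hamming distance. The function names `hamming` and `diwann_edges` and the toy sample strings are hypothetical illustrations, not the authors' encoding or code, and the sketch does not reproduce the pruning that makes DiWANN efficient at scale.

```python
def hamming(a: str, b: str) -> int:
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return sum(x != y for x, y in zip(a, b))


def diwann_edges(seqs: dict[str, str]) -> list[tuple[str, str, int]]:
    """Brute-force DiWANN-style graph (assumed construction, O(n^2) comparisons).

    For each sequence u, compute its distance to every other sequence and keep
    directed edges (u, v, d) only to the v's at the minimum distance d.
    Ties are all kept, hence "all nearest neighbors".
    """
    edges: list[tuple[str, str, int]] = []
    ids = list(seqs)
    for u in ids:
        dists = {v: hamming(seqs[u], seqs[v]) for v in ids if v != u}
        if not dists:  # a single sequence has no neighbors
            continue
        d_min = min(dists.values())
        for v, d in dists.items():
            if d == d_min:
                edges.append((u, v, d))
    return edges


# Hypothetical toy samples: one character per position of a shared reference.
samples = {"S1": "1100", "S2": "1101", "S3": "0011", "S4": "0111"}
print(diwann_edges(samples))
```

Because each node keeps only its nearest-neighbor edges (at least one out-edge per node, more only on ties), the resulting graph stays sparse compared with an all-pairs similarity network, which is one plausible source of the memory savings described above.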
The analysis identifies co-occurring and exclusive driver genes and mutations for specific cancer types, providing a better understanding of the driver genes that lead to tumor initiation and evolution.
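The co-occurrence and exclusivity findings can be illustrated with a small sketch of a sample–gene bipartite network. It assumes the simplest weighting for the one-mode gene projection, where the weight of a gene pair is the number of samples in which both genes are mutated; the paper's exact weighting may differ. The helpers `gene_projection` and `exclusive_genes` and the toy data (genes `GA` to `GD`, cancer types `T1` and `T2`) are hypothetical and do not represent results from the study.

```python
from collections import Counter, defaultdict
from itertools import combinations


def gene_projection(sample_genes: dict[str, set[str]]) -> Counter:
    """Weighted one-mode projection of a sample-gene bipartite graph onto genes.

    Assumed weighting: weight of (g1, g2) = number of samples mutated in both.
    Pairs are stored in sorted order so (g1, g2) and (g2, g1) share one key.
    """
    weights: Counter = Counter()
    for genes in sample_genes.values():
        for g1, g2 in combinations(sorted(genes), 2):
            weights[(g1, g2)] += 1
    return weights


def exclusive_genes(sample_genes: dict[str, set[str]],
                    sample_type: dict[str, str]) -> dict[str, str]:
    """Map each gene seen in exactly one cancer type to that type."""
    types_per_gene: dict[str, set[str]] = defaultdict(set)
    for sample, genes in sample_genes.items():
        for g in genes:
            types_per_gene[g].add(sample_type[sample])
    return {g: next(iter(t)) for g, t in types_per_gene.items() if len(t) == 1}


# Hypothetical toy data: mutated driver genes per sample, and each sample's type.
samples = {"s1": {"GA", "GB"}, "s2": {"GA", "GB", "GC"}, "s3": {"GA", "GD"}}
types = {"s1": "T1", "s2": "T1", "s3": "T2"}
print(gene_projection(samples))
print(exclusive_genes(samples, types))
```

Here `GA` is mutated in both types and so is not exclusive, while the `(GA, GB)` pair has the highest co-occurrence weight. For larger graphs, networkx provides `bipartite.weighted_projected_graph`, whose default weight is likewise the count of shared neighbors.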