Department of Information Systems, Zefat Academic College, 13206, Zefat, Israel.
Galilee Digital Health Research Center (GDH), Zefat Academic College, Zefat, Israel.
BMC Bioinformatics. 2023 Feb 23;24(1):60. doi: 10.1186/s12859-023-05187-2.
Cell homeostasis relies on the concerted actions of genes, and dysregulated genes can lead to diseases. In living organisms, genes or their products do not act alone but within networks. Subsets of these networks can be viewed as modules that provide specific functionality to an organism. The Kyoto Encyclopedia of Genes and Genomes (KEGG) systematically analyzes gene functions, proteins, and molecules and combines them into pathways. Measurements of gene expression (e.g., RNA-seq data) can be mapped to KEGG pathways to determine which modules are affected or dysregulated in a disease. However, genes that act in multiple pathways, along with other inherent issues, complicate such analyses. Many current approaches employ only gene expression data and pay too little attention to the existing knowledge stored in KEGG pathways when detecting dysregulated pathways. New methods that incorporate more precompiled information are required to associate gene expression with diseases more holistically.
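To make the mapping step concrete, the following is a minimal sketch of a generic over-representation test, not PriPath's own scoring. It maps a list of differentially expressed genes onto pathways and computes a hypergeometric tail probability for each overlap. The pathway names, gene IDs, and universe size are hypothetical toy values, and the function names are illustrative assumptions. Note how gene `G3` is counted in two pathways; this is one of the complications mentioned above.

```python
from math import comb


def hypergeom_sf(k, M, n, N):
    """P(X >= k) for X ~ Hypergeometric(population M, n successes, N draws)."""
    total = comb(M, N)
    tail = sum(comb(n, i) * comb(M - n, N - i) for i in range(k, min(n, N) + 1))
    return tail / total


def map_to_pathways(de_genes, pathways, universe_size):
    """Map DE genes onto each pathway and score the overlap (hypothetical helper)."""
    de = set(de_genes)
    results = []
    for name, members in pathways.items():
        hits = de & set(members)  # a gene shared by pathways is counted in each
        p = hypergeom_sf(len(hits), universe_size, len(members), len(de))
        results.append((name, sorted(hits), p))
    return sorted(results, key=lambda r: r[2])  # most over-represented first


# Toy example: hypothetical pathways over a universe of 20 measured genes.
toy_pathways = {
    "pathway_A": ["G1", "G2", "G3", "G4"],
    "pathway_B": ["G3", "G5", "G6", "G7", "G8"],
    "pathway_C": ["G9", "G10", "G11"],
}
ranked = map_to_pathways(["G1", "G2", "G3", "G5"], toy_pathways, universe_size=20)
for name, hits, p in ranked:
    print(name, hits, round(p, 4))
```

A test of this kind uses only the gene list and pathway membership. It ignores expression magnitudes and relations between pathways, which is the kind of limitation the abstract describes.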
PriPath is a novel approach that transfers the generic process of grouping, scoring, and modeling to the analysis of gene expression with KEGG pathways. In PriPath, KEGG pathways serve as the grouping function within a machine learning algorithm that selects the most significant KEGG pathways. A machine learning model is then trained on those groups to differentiate between disease samples and controls. We tested PriPath on 13 gene expression datasets covering various cancers and other diseases. Based on the differentially expressed genes, our approach assigned biologically and clinically relevant KEGG terms to the samples. We compared the performance of PriPath with that of other tools that serve a similar purpose. For each dataset, we manually checked the top results of PriPath against the literature and found that most predictions are supported by previous experimental research.
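The grouping, scoring, and modeling pipeline described above can be sketched as follows. This is a dependency-free illustration under stated assumptions and does not reproduce PriPath's implementation. Pathways define the gene groups. Each group is scored by the leave-one-out accuracy of a nearest-centroid classifier restricted to that group's genes. The top-k groups are kept, and a final model is built on the union of their genes. The classifier, the scoring metric, the function names (`gsm`, `loo_accuracy`, `centroid_predict`), and the toy data are all hypothetical stand-ins. PriPath's actual scoring function and model may differ.

```python
from statistics import mean


def centroid_predict(train_X, train_y, x):
    """Nearest-centroid classifier (stand-in model, not PriPath's)."""
    centroids = {}
    for label in sorted(set(train_y)):
        rows = [r for r, y in zip(train_X, train_y) if y == label]
        centroids[label] = [mean(col) for col in zip(*rows)]

    def sq_dist(c):
        return sum((a - b) ** 2 for a, b in zip(x, c))

    return min(centroids, key=lambda lab: sq_dist(centroids[lab]))


def loo_accuracy(X, y):
    """Scoring: leave-one-out accuracy of the classifier on one gene group."""
    correct = 0
    for i in range(len(X)):
        pred = centroid_predict(X[:i] + X[i + 1:], y[:i] + y[i + 1:], X[i])
        correct += pred == y[i]
    return correct / len(X)


def subset(expr, genes):
    """Build a samples-by-genes matrix for the genes present in the data."""
    used = [g for g in genes if g in expr]
    return [list(row) for row in zip(*(expr[g] for g in used))], used


def gsm(expr, labels, pathways, top_k=2):
    scores = {}
    for name, members in pathways.items():   # Grouping: one group per pathway
        X, used = subset(expr, members)
        if used:
            scores[name] = loo_accuracy(X, labels)  # Scoring
    ranked = sorted(scores, key=lambda n: (-scores[n], n))
    selected = ranked[:top_k]
    # Modeling: final feature set is the union of genes in the selected groups.
    genes = sorted({g for n in selected for g in pathways[n] if g in expr})
    X, _ = subset(expr, genes)

    def predict(sample):  # sample: dict mapping gene -> expression value
        return centroid_predict(X, labels, [sample[g] for g in genes])

    return selected, scores, predict


# Toy data: 3 cases and 3 controls; G1 and G2 separate the classes, G3 and G4 are noise.
labels = ["case"] * 3 + ["control"] * 3
expr = {
    "G1": [5.0, 5.2, 4.8, 1.0, 1.1, 0.9],
    "G2": [4.0, 4.1, 3.9, 2.0, 2.1, 1.9],
    "G3": [1.0, 3.0, 2.0, 2.0, 1.0, 3.0],
    "G4": [2.0, 1.0, 3.0, 3.0, 2.0, 1.0],
}
pathways = {"pathway_A": ["G1", "G2"], "pathway_B": ["G3", "G4"], "pathway_C": ["G2", "G3"]}
selected, scores, predict = gsm(expr, labels, pathways, top_k=2)
print(selected, scores)
```

Because pathways are scored as whole groups, the ranking (`selected`) doubles as the list of candidate dysregulated pathways. This is the part that carries biological interpretation, while `predict` provides the disease-versus-control classifier.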
PriPath can thus aid in determining dysregulated pathways, which is relevant to medical diagnostics. In the future, we aim to extend this approach so that it can stratify patients based on gene expression and identify druggable targets, thereby covering two aspects of precision medicine.