Hakobyan Siras, Stepanyan Ani, Nersisyan Lilit, Binder Hans, Arakelyan Arsen
Bioinformatics Group, Institute of Molecular Biology, Armenian National Academy of Sciences, Yerevan, Armenia.
Armenian Bioinformatics Institute (ABI), Yerevan, Armenia.
Front Genet. 2023 Aug 23;14:1264656. doi: 10.3389/fgene.2023.1264656. eCollection 2023.
Most high-throughput genomic data analysis pipelines currently rely on over-representation analysis (ORA) or gene set enrichment analysis (GSEA) for functional analysis. In contrast, topology-based pathway analysis methods offer a more biologically informed perspective by incorporating interaction and topology information. However, various limiting factors have left them underutilized and hard to access. These methods depend heavily on the quality of pathway topologies and often use predefined topologies from databases without assessing their correctness. To address these issues and make topology-aware pathway analysis more accessible and flexible, we introduce the PSF (Pathway Signal Flow) toolkit, an R package. Our toolkit integrates pathway curation and topology-based analysis, providing interactive and command-line tools that facilitate pathway import, correction, and modification from diverse sources. This enables users to perform topology-based pathway signal flow analysis in both interactive and command-line modes. To showcase the toolkit's usability, we curated 36 KEGG signaling pathways and conducted several use-case studies, comparing our method with ORA and with the topology-based signaling pathway impact analysis (SPIA) method. The results demonstrate that the algorithm can effectively identify ORA-enriched pathways while providing more detailed branch-level information. Moreover, unlike SPIA, it has the advantage of being cut-off-free and less susceptible to the variability caused by selection thresholds. By combining pathway curation and topology-based analysis, the PSF toolkit enhances the quality, flexibility, and accessibility of topology-aware pathway analysis. Researchers can now easily import pathways from various sources, correct and modify them as needed, and perform detailed topology-based pathway signal flow analysis.
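To illustrate why threshold-based selection introduces variability, here is a minimal Python sketch of the standard ORA test: a one-sided hypergeometric test on genes that pass a fold-change cut-off. It is not part of the PSF toolkit and does not reproduce its signal flow algorithm. The gene names, fold changes, pathway membership, and cut-offs are hypothetical toy values chosen only to show how the ORA result shifts with the cut-off.

```python
from math import comb


def ora_pvalue(total: int, in_pathway: int, selected: int, overlap: int) -> float:
    """One-sided hypergeometric p-value P(X >= overlap).

    total      -- number of genes in the universe
    in_pathway -- number of universe genes annotated to the pathway
    selected   -- number of genes passing the selection cut-off
    overlap    -- number of selected genes that are in the pathway
    """
    upper = min(in_pathway, selected)
    # Sum the hypergeometric probability mass over all overlaps >= the observed one.
    tail = sum(
        comb(in_pathway, i) * comb(total - in_pathway, selected - i)
        for i in range(overlap, upper + 1)
    )
    return tail / comb(total, selected)


def select_genes(log2fc: dict, cutoff: float) -> set:
    """Hypothetical selection step: keep genes with |log2 fold change| >= cutoff."""
    return {gene for gene, fc in log2fc.items() if abs(fc) >= cutoff}


# Hypothetical toy data: a 20-gene universe and a 5-gene pathway.
LOG2FC = {f"g{i}": 0.1 for i in range(1, 21)}
LOG2FC.update({"g1": 1.6, "g2": 1.2, "g3": 1.1, "g4": 0.4,
               "g5": -1.3, "g6": 1.7, "g7": -1.05})
PATHWAY = {"g1", "g2", "g3", "g4", "g5"}

if __name__ == "__main__":
    for cutoff in (1.0, 1.5):
        chosen = select_genes(LOG2FC, cutoff)
        hits = len(chosen & PATHWAY)
        p = ora_pvalue(len(LOG2FC), len(PATHWAY), len(chosen), hits)
        print(f"|log2FC| >= {cutoff}: {len(chosen)} selected, {hits} in pathway, p = {p:.3g}")
```

With these toy values, the same pathway has p ≈ 0.014 at a cut-off of 1.0 but p ≈ 0.45 at a cut-off of 1.5, even though the underlying expression data are unchanged. A method that propagates continuous values through the pathway topology, rather than a thresholded gene list, avoids this particular source of variability.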
In summary, our PSF toolkit offers an integrated solution that addresses the limitations of current topology-based pathway analysis methods. By providing interactive and command-line tools for pathway curation and topology-based analysis, we empower researchers to conduct comprehensive pathway analyses across a wide range of applications.