Castaneda Everest Uriel, Baker Erich J
Department of Biology, Baylor University, Waco, TX, United States.
School of Engineering and Computer Science, Baylor University, Waco, TX, United States.
Front Genet. 2024 Feb 13;15:1292394. doi: 10.3389/fgene.2024.1292394. eCollection 2024.
Automating the recreation of gene and mixed gene-compound networks from Kyoto Encyclopedia of Genes and Genomes (KEGG) Markup Language (KGML) files is challenging because the data structure does not preserve the independent or loosely connected neighborhoods from which the networks were originally derived, referred to here as their topological environment. Identical accession numbers may appear in more than one neighborhood, causing those neighborhoods to collapse artificially into one on the basis of the duplicated identifiers. As a result, current parsers create misleading or erroneous graphical representations when mixed gene networks are converted to gene-only networks. To overcome these challenges, we created a Python-based KEGG NetworkX Topological (KNeXT) parser that allows users to accurately recapitulate genetic networks and mixed networks from KGML map data. The software is archived on the Python Package Index (PyPI) to ensure broad application. It is designed to ingest KGML files through built-in APIs and dynamically create high-fidelity topological representations. Using NetworkX's framework to generate tab-separated files additionally ensures that KNeXT results can be imported into other graph frameworks while retaining programmatic access to the original x-y axis positions of each node in the KEGG pathway. KNeXT is a well-described Python 3 package that allows users to rapidly download and aggregate specific KGML files and recreate KEGG pathways based on a range of user-defined settings. KNeXT is platform-independent and distinctive, and it is not built on top of other Python parsers. Furthermore, KNeXT enables users to parse entire local folders or single files through command-line scripts and convert the output into NCBI or UniProt IDs. KNeXT gives researchers the ability to generate pathway visualizations while preserving the original context of a KEGG pathway. Source code is freely available at https://github.com/everest-castaneda/knext.
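The collapse problem and the gene-only conversion can be illustrated with a small sketch. This is not KNeXT's code. It is a minimal, standard-library illustration built on several assumptions.

- The KGML fragment is hand-written and hypothetical.
- Relations are treated as directed from `entry1` to `entry2`.
- "Gene-only" conversion is assumed to mean bridging gene → compound(s) → gene paths into direct gene-gene edges.
- The helper names `parse_kgml`, `keyed_by_accession`, `gene_only` and `to_tsv` are hypothetical.

The key idea is that nodes are keyed by the unique KGML entry `id` rather than by the accession in `name`. Two entries that share an accession therefore stay in separate neighborhoods. Each entry also keeps its original `<graphics>` x/y coordinates for tab-separated export. A real implementation would more likely store the nodes in a NetworkX `DiGraph`.

```python
import csv
import io
import xml.etree.ElementTree as ET
from collections import defaultdict

# Hypothetical KGML fragment (not a real KEGG map). Entries 1 and 4 carry the
# same accession, hsa:5594, but are drawn in separate neighborhoods.
KGML = """<pathway name="path:hsa99999" org="hsa" number="99999">
  <entry id="1" name="hsa:5594" type="gene"><graphics x="100" y="50"/></entry>
  <entry id="2" name="cpd:C00022" type="compound"><graphics x="200" y="50"/></entry>
  <entry id="3" name="hsa:5595" type="gene"><graphics x="300" y="50"/></entry>
  <entry id="4" name="hsa:5594" type="gene"><graphics x="100" y="400"/></entry>
  <entry id="5" name="hsa:1432" type="gene"><graphics x="300" y="400"/></entry>
  <relation entry1="1" entry2="2" type="PCrel"/>
  <relation entry1="2" entry2="3" type="PCrel"/>
  <relation entry1="4" entry2="5" type="PPrel"/>
</pathway>"""


def parse_kgml(text):
    """Return nodes keyed by the unique KGML entry id, plus directed edges."""
    root = ET.fromstring(text)
    nodes = {}
    for entry in root.iter("entry"):
        g = entry.find("graphics")
        nodes[entry.get("id")] = {
            "name": entry.get("name"),
            "type": entry.get("type"),
            # Keep the original drawing coordinates (None if absent).
            "x": g.get("x") if g is not None else None,
            "y": g.get("y") if g is not None else None,
        }
    # Assumption: each relation is an edge from entry1 to entry2.
    edges = [(r.get("entry1"), r.get("entry2")) for r in root.iter("relation")]
    return nodes, edges


def keyed_by_accession(nodes, edges):
    """Naive edge set keyed by accession: duplicate accessions merge."""
    return {(nodes[a]["name"], nodes[b]["name"]) for a, b in edges}


def gene_only(nodes, edges):
    """Bridge gene -> compound(s) -> gene paths into direct gene-gene edges."""
    succ = defaultdict(set)
    for a, b in edges:
        succ[a].add(b)
    result = set()
    for start, attrs in nodes.items():
        if attrs["type"] != "gene":
            continue
        stack, seen = list(succ[start]), set()
        while stack:
            n = stack.pop()
            if n in seen:
                continue
            seen.add(n)
            if nodes[n]["type"] == "gene":
                if n != start:
                    result.add((start, n))  # stop at the first gene reached
            elif nodes[n]["type"] == "compound":
                stack.extend(succ[n])  # walk through compounds only
    return result


def to_tsv(nodes, edges):
    """Write edges as TSV with names and original x/y positions per endpoint."""
    buf = io.StringIO()
    writer = csv.writer(buf, delimiter="\t", lineterminator="\n")
    writer.writerow(["entry1", "entry2", "name1", "name2", "x1", "y1", "x2", "y2"])
    for a, b in sorted(edges):
        na, nb = nodes[a], nodes[b]
        writer.writerow([a, b, na["name"], nb["name"], na["x"], na["y"], nb["x"], nb["y"]])
    return buf.getvalue()


nodes, edges = parse_kgml(KGML)
print(sorted(gene_only(nodes, edges)))  # [('1', '3'), ('4', '5')]
print(to_tsv(nodes, gene_only(nodes, edges)), end="")
```

Keyed by entry id, the two `hsa:5594` entries yield two independent gene-gene edges, and each keeps its own coordinates. With `keyed_by_accession`, the single node `hsa:5594` would be adjacent to both `cpd:C00022` and `hsa:1432`. That fuses two neighborhoods which were drawn apart, which is the kind of artificial collapse described above.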