Oğuztüzün Çerağ, Gao Zhenxiang, Xu Rong
Case Western Reserve University, Cleveland, Ohio, USA.
bioRxiv. 2024 Dec 6:2024.12.03.626631. doi: 10.1101/2024.12.03.626631.
One of the primary challenges in biomedical research is the interpretation of complex genomic relationships and the prediction of functional interactions across the genome. Tokenvizz is a novel tool for genomic analysis that enhances data discovery and visualization by combining GraphRAG-inspired tokenization with graph-based modeling. In Tokenvizz, genomic sequences are represented as graphs, where sequence k-mers (tokens) serve as nodes and attention scores as edge weights, enabling researchers to visually interpret complex, non-linear relationships within DNA sequences. Through a web-based visualization interface, researchers can interactively explore these genomic relationships and extract biologically meaningful insights about regulatory patterns and functional elements. Applied to promoter-enhancer interaction prediction tasks, Tokenvizz outperformed traditional sequential models while providing interpretable insights into genomic features, demonstrating the advantage of graph-based representations for biological discovery.
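The graph representation described in the abstract (k-mer tokens as nodes, attention scores as edge weights) can be illustrated with a minimal sketch. This is not the Tokenvizz implementation. The abstract does not specify the tokenization scheme, the model that produces the attention scores, or any edge filtering, so the non-overlapping k-mers, the hand-written attention matrix, the `threshold` parameter, and the function names `kmer_tokens` and `attention_graph` are all assumptions made for illustration.

```python
from typing import Dict, List, Tuple


def kmer_tokens(seq: str, k: int) -> List[str]:
    """Split a DNA sequence into non-overlapping k-mers.

    Assumption: non-overlapping windows; any trailing remainder
    shorter than k is dropped.
    """
    return [seq[i:i + k] for i in range(0, len(seq) - k + 1, k)]


def attention_graph(
    tokens: List[str],
    attention: List[List[float]],
    threshold: float = 0.0,
) -> Dict[Tuple[int, int], float]:
    """Build a directed, weighted edge map from an attention matrix.

    Nodes are token positions (0..n-1), labelled by tokens[i].
    attention[i][j] is the score from token i to token j.
    Self-loops and edges at or below `threshold` are skipped
    (the threshold is a hypothetical filter, not from the source).
    """
    n = len(tokens)
    if len(attention) != n or any(len(row) != n for row in attention):
        raise ValueError("attention must be an n x n matrix for n tokens")
    edges: Dict[Tuple[int, int], float] = {}
    for i in range(n):
        for j in range(n):
            if i != j and attention[i][j] > threshold:
                edges[(i, j)] = attention[i][j]
    return edges


# Toy input: a made-up sequence and a made-up attention matrix.
tokens = kmer_tokens("ACGTACGTAC", k=3)
toy_attention = [
    [0.60, 0.30, 0.10],
    [0.20, 0.50, 0.30],
    [0.05, 0.15, 0.80],
]
edges = attention_graph(tokens, toy_attention, threshold=0.15)

for (src, dst), weight in sorted(edges.items()):
    print(f"{tokens[src]} (pos {src}) -> {tokens[dst]} (pos {dst}): {weight:.2f}")
```

Keying nodes by position rather than by k-mer string keeps repeated k-mers distinct, which matters if the graph is meant to reflect where in the sequence an interaction occurs. In practice the attention matrix would come from a trained sequence model rather than being written by hand.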