Schebera Jeremias, Zeckzer Dirk, Wiegreffe Daniel
Image and Signal Processing Group, Institute for Computer Science, Leipzig University, Leipzig, Germany.
Center for Scalable Data Analytics and Artificial Intelligence (ScaDS.AI) Dresden/Leipzig, Leipzig University, Leipzig, Germany.
Front Bioinform. 2024 Aug 16;4:1358374. doi: 10.3389/fbinf.2024.1358374. eCollection 2024.
Sequence alignments are widely used to analyze genomic data. However, for analysis purposes, such alignments are typically computed and compared only over short sequence intervals. When longer sequences are compared, they are usually split into shorter intervals to obtain better alignments, which typically means that the order context of the original sequences is lost. To preserve it, a graph structure can be used to represent the order of the original sequences across the alignment blocks. Visualizing these graphs can provide insight into the structural variations of genomes in a semi-global context. In this paper, we propose a new graph drawing framework for representing genome multiple sequence alignment (gMSA) data. It produces a hierarchical graph layout that supports the comparative analysis of genomes: relative to a reference, the differences and similarities between the genome orders are visualized. We present a complete graph drawing framework for gMSA graphs together with the respective algorithms for each of its steps. Additionally, we provide a prototype and an example data set for analyzing gMSA graphs, and we use this data set to demonstrate the functionality of the framework in two examples.
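The abstract describes the idea only at a high level. The following is a minimal sketch, not the authors' algorithm, assuming each genome is given as an ordered list of hypothetical alignment-block IDs and ignoring strand orientation. It builds a block graph whose edges record which genomes traverse consecutive blocks, classifies each edge against the reference order, and assigns layers with a Sugiyama-style longest-path layering after dropping backward edges. All function names and the example data are hypothetical.

```python
from collections import defaultdict, deque

def build_block_graph(genome_orders):
    """Map each directed edge (block_a, block_b) to the set of genomes traversing it."""
    edges = defaultdict(set)
    for genome, blocks in genome_orders.items():
        for a, b in zip(blocks, blocks[1:]):
            edges[(a, b)].add(genome)
    return dict(edges)

def classify_edges(edges, reference):
    """Label edges relative to the reference block order (hypothetical scheme)."""
    pos = {block: i for i, block in enumerate(reference)}
    kinds = {}
    for a, b in edges:
        if a in pos and b in pos:
            # Backward edges hint at rearrangements such as transpositions.
            kinds[(a, b)] = "forward" if pos[b] > pos[a] else "backward"
        else:
            kinds[(a, b)] = "non-reference"
    return kinds

def assign_layers(kinds):
    """Longest-path layering on the graph with backward edges removed."""
    nodes = {n for edge in kinds for n in edge}
    succ = defaultdict(list)
    indeg = {n: 0 for n in nodes}
    for (a, b), kind in kinds.items():
        if kind != "backward":
            succ[a].append(b)
            indeg[b] += 1
    layer = {n: 0 for n in nodes}
    queue = deque(sorted(n for n in nodes if indeg[n] == 0))
    seen = 0
    while queue:  # Kahn's topological order
        u = queue.popleft()
        seen += 1
        for v in succ[u]:
            layer[v] = max(layer[v], layer[u] + 1)
            indeg[v] -= 1
            if indeg[v] == 0:
                queue.append(v)
    if seen != len(nodes):
        # Non-reference edges can still form cycles; a full framework
        # would need a general cycle-removal step here.
        raise ValueError("cycle among non-backward edges")
    return layer

genomes = {
    "ref": ["B1", "B2", "B3", "B4"],
    "g1": ["B1", "B3", "B2", "B4"],  # B2 and B3 swapped
    "g2": ["B1", "B2", "B5", "B4"],  # B5 absent from the reference
}
edges = build_block_graph(genomes)
kinds = classify_edges(edges, genomes["ref"])
layers = assign_layers(kinds)
```

In this toy example, block B5 (present only in g2) would be assigned the same layer as B3, placing the variant path alongside the reference path, while the edge B3 to B2 is marked backward and reflects g1's swapped block order.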