Lisiecka Anna, Dojer Norbert
Institute of Informatics, University of Warsaw, Banacha 2, 02-097 Warsaw, Poland.
iScience. 2021 Jun 19;24(7):102755. doi: 10.1016/j.isci.2021.102755. eCollection 2021 Jul 23.
The need to incorporate the genetic variation within a population into a reference genome led to the concept of a genome sequence graph. Nodes of such a graph are labeled with DNA sequences occurring in the represented genomes. Because DNA is double-stranded, each node may be oriented in one of two ways, which marks one end of its labeling sequence as the in-side and the other as the out-side. Edges join pairs of sides and reflect adjacency between node sequences in the genomes constituting the graph. Linearization of a sequence graph aims at orienting and ordering the graph nodes in a way that makes the graph more efficient for visualization and further analysis, e.g. access and traversal. We propose a new linearization algorithm, called ALIBI (Algorithm for Linearization by Incremental graph BuIlding). The evaluation shows that ALIBI is computationally very efficient and generates high-quality results.
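To make the graph model concrete, the following minimal Python sketch represents sides and edges and scores a given linearization (a node order plus a set of flipped nodes). It is not the ALIBI algorithm. The names `Side`, `classify_edges`, `LEFT` and `RIGHT` are hypothetical, and so is the convention that an unflipped node has its in-side at the left end of its label. The three edge categories (forward, feedback, reversing) are one plausible way to measure linearization quality, assumed here for illustration only.

```python
from dataclasses import dataclass

LEFT, RIGHT = "left", "right"  # the two ends of a node's label as stored


@dataclass(frozen=True)
class Side:
    node: str  # node identifier
    end: str   # LEFT or RIGHT


def classify_edges(edges, order, flipped):
    """Count forward, feedback and reversing edges under a linearization.

    edges:   iterable of (Side, Side) pairs
    order:   list of node ids, giving the linear order
    flipped: set of node ids whose orientation is reversed
             (assumption: unflipped means in-side = LEFT, out-side = RIGHT)
    """
    position = {node: i for i, node in enumerate(order)}

    def is_out(side):
        # Flipping a node swaps which end plays the role of the out-side.
        return (side.end == RIGHT) != (side.node in flipped)

    counts = {"forward": 0, "feedback": 0, "reversing": 0}
    for a, b in edges:
        if is_out(a) == is_out(b):
            # Joins two in-sides or two out-sides.
            counts["reversing"] += 1
            continue
        src, dst = (a, b) if is_out(a) else (b, a)
        if position[src.node] < position[dst.node]:
            counts["forward"] += 1   # out-side of an earlier node to in-side of a later one
        else:
            counts["feedback"] += 1  # runs against the chosen order
    return counts


# Toy graph: one genome reads A, B, C; another reads A, inverted B, C.
edges = [
    (Side("A", RIGHT), Side("B", LEFT)),
    (Side("B", RIGHT), Side("C", LEFT)),
    (Side("A", RIGHT), Side("B", RIGHT)),
    (Side("B", LEFT), Side("C", LEFT)),
]
print(classify_edges(edges, order=["A", "B", "C"], flipped=set()))
```

In this toy graph the inversion of B forces two reversing edges whichever way B is oriented, while a poorly chosen order such as B, A, C additionally turns a forward edge into a feedback edge. A linearization procedure would search for the order and orientations that keep such counts low.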