Department of Informatics, Systems and Communication, University of Milano - Bicocca. Viale Sarca 336, Milano 20126, Italy.
Department of Statistics and School of Life Sciences, University of Warwick, Coventry CV4 7AL, United Kingdom.
Bioinformatics. 2024 May 2;40(5). doi: 10.1093/bioinformatics/btae292.
Bacterial genomes present more variability than human genomes, which requires important adjustments in computational tools that are developed for human data. In particular, bacteria exhibit a mosaic structure due to homologous recombinations, but this fact is not sufficiently captured by standard read mappers that align against linear reference genomes. The recent introduction of pangenomics provides some insights in that context, as a pangenome graph can represent the variability within a species. However, the concept of sequence-to-graph alignment that captures the presence of recombinations has not been previously investigated.
In this paper, we extend the notion of sequence-to-graph alignment to a variation graph that incorporates recombinations, so that recombinations are explicitly represented and evaluated in an alignment. Moreover, we present a dynamic programming approach for the special case where there is at most one recombination; we implement this case as RecGraph. From a modelling point of view, a recombination corresponds to identifying a new path of the variation graph, where the new path is composed of two halves, each extracted from an original path, possibly joined by a new arc. Our experiments show that RecGraph accurately aligns simulated recombinant bacterial sequences that have at most one recombination, providing evidence for the presence of recombination events.
Our implementation is open source and available at https://github.com/AlgoLab/RecGraph.
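To make the "at most one recombination" case concrete, the following is a minimal sketch of the idea and not RecGraph's actual algorithm. It assumes that each path of the variation graph is given as a plain string, that alignment uses unit edit-distance costs, and that a recombination is charged a hypothetical opening cost `rec_open` plus a hypothetical extension cost `rec_ext` per position of displacement between the two paths. All function and parameter names are illustrative. The read is aligned either to a single path, or to a prefix of one path followed by a suffix of another.

```python
from itertools import permutations


def prefix_costs(read, path):
    """D[i][j] = edit distance between read[:i] and path[:j]."""
    n, m = len(read), len(path)
    D = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 or j == 0:
                D[i][j] = i + j  # only insertions or deletions
            else:
                D[i][j] = min(
                    D[i - 1][j - 1] + (read[i - 1] != path[j - 1]),
                    D[i - 1][j] + 1,
                    D[i][j - 1] + 1,
                )
    return D


def suffix_costs(read, path):
    """S[i][j] = edit distance between read[i:] and path[j:]."""
    n, m = len(read), len(path)
    R = prefix_costs(read[::-1], path[::-1])
    return [[R[n - i][m - j] for j in range(m + 1)] for i in range(n + 1)]


def align_with_one_recombination(read, paths, rec_open=2, rec_ext=1):
    """Return (cost, first_path, second_path, read_pos, pos_first, pos_second).

    second_path and the positions are None when the best alignment
    uses a single path (no recombination).
    """
    n = len(read)
    pre = [prefix_costs(read, p) for p in paths]
    suf = [suffix_costs(read, p) for p in paths]

    # Case 1: no recombination, the read follows one path end to end.
    best = min(
        (pre[a][n][len(p)], a, None, None, None, None)
        for a, p in enumerate(paths)
    )

    # Case 2: one recombination, switching from path a (at j) to path b (at k)
    # after consuming read[:i]. Brute force over all switch points.
    for a, b in permutations(range(len(paths)), 2):
        for i in range(n + 1):
            for j in range(len(paths[a]) + 1):
                for k in range(len(paths[b]) + 1):
                    cost = (pre[a][i][j] + suf[b][i][k]
                            + rec_open + rec_ext * abs(k - j))
                    if cost < best[0]:
                        best = (cost, a, b, i, j, k)
    return best


if __name__ == "__main__":
    paths = ["AAAAAAAAAA", "CCCCCCCCCC"]
    print(align_with_one_recombination("AAAAACCCCC", paths))
```

In this toy input the read is a mosaic of the first half of one path and the second half of the other, so the recombinant alignment pays only the opening cost, whereas aligning to either path alone costs five substitutions. The brute-force search over switch points is for clarity only; a practical implementation would work on the graph itself and prune the search.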