Alexeev Nikita, Pologova Anna, Alekseyev Max A
1 The George Washington University , Washington, District of Columbia.
2 St. Petersburg State University , St. Petersburg, Russia .
J Comput Biol. 2017 Feb;24(2):93-105. doi: 10.1089/cmb.2016.0190. Epub 2017 Jan 3.
Genome rearrangements can be modeled as k-breaks, which break a genome at k positions and glue the resulting fragments in a new order. In particular, reversals, translocations, fusions, and fissions are modeled as 2-breaks, and transpositions are modeled as 3-breaks. Although k-break rearrangements for [Formula: see text] have not been observed in evolution, they are used in cancer genomics to model chromothripsis, a catastrophic event of multiple breakages happening simultaneously in a genome. It is known that the k-break distance between two genomes (i.e., the minimum number of k-breaks required to transform one genome into the other) can be computed in terms of cycle lengths in the breakpoint graph of these genomes. In this work, we address the combinatorial problem of enumerating genomes at a given k-break distance from a fixed unichromosomal genome. More generally, we enumerate genome pairs, whose breakpoint graph has a given distribution of cycle lengths. We further show how our enumeration can be used for uniform sampling of random genomes at a given k-break distance, and describe its connection to various combinatorial objects such as Bell polynomials.
基因组重排可以建模为k断点,即在基因组的k个位置处进行断裂,并将产生的片段以新的顺序重新拼接。特别地,反转、易位、融合和裂变被建模为2断点,而转座被建模为3断点。尽管在进化过程中尚未观察到[公式:见文本]的k断点重排,但它们在癌症基因组学中被用于模拟染色体碎裂,即基因组中同时发生多次断裂的灾难性事件。已知两个基因组之间的k断点距离(即将一个基因组转化为另一个基因组所需的最小k断点数量)可以根据这些基因组的断点图中的循环长度来计算。在这项工作中,我们解决了一个组合问题,即枚举与固定单染色体基因组具有给定k断点距离的基因组。更一般地,我们枚举断点图具有给定循环长度分布的基因组对。我们进一步展示了如何将我们的枚举用于在给定k断点距离下对随机基因组进行均匀采样,并描述了它与各种组合对象(如贝尔多项式)的联系。