Sinha Amit U, Meller Jaroslaw
Department of Computer Science, University of Cincinnati, Cincinnati, OH 45221, USA.
Pac Symp Biocomput. 2008:37-48.
Identifying syntenic regions and quantifying evolutionary relatedness between genomes by interrogating genome rearrangement events is one of the central goals of comparative genomics. However, the identification of synteny blocks, and the resulting assessment of genome rearrangements, depends on the choice of conserved markers, the definition of conserved segments, and the choice of the parameters used to construct such segments for two genomes. In this work, we performed an extended sensitivity analysis of synteny block generation using alternative sets of markers in multiple genomes. A simple approach to synteny block aggregation is used, which depends on two principal parameters: the maximum gap (max gap) between adjacent blocks to be merged, and the minimum length (min len) of synteny blocks. In particular, the dependence on the choice of conserved markers and on the max gap/min len aggregation parameters is assessed for two important quantities that can be used to characterize evolutionary relationships between genomes, namely the reversal distance and breakpoint reuse. We observe that the number of synteny blocks depends on both parameters, while the reversal distance depends mostly on min len. On the other hand, we observe that relative reversal distances between mammalian genomes, defined as ratios of distances between different pairs of genomes, are nearly constant with respect to both parameters. Similarly, the breakpoint reuse rate was found to be almost constant across different data sets and a wide range of parameters. Breakpoint reuse is also strongly correlated with evolutionary distance, increasing for pairs of more divergent genomes. Finally, we demonstrate that the role of the parameters may be further reduced by using a multi-way analysis that involves markers conserved in multiple genomes, which opens a way to guide the choice of a correct parameterization.
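The two-parameter aggregation described in the abstract can be illustrated with a minimal sketch. The abstract does not give the procedure in detail, so everything below is an assumption made for illustration: the `Block` record, the `interval_gap` and `aggregate` functions, the requirement that the gap condition hold in both genomes, and the decision to ignore block orientation are all hypothetical and not taken from the paper.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """A conserved segment with coordinates in genome A and genome B (hypothetical layout)."""
    chr_a: str
    start_a: int
    end_a: int
    chr_b: str
    start_b: int
    end_b: int


def interval_gap(s1, e1, s2, e2):
    """Distance between two intervals on the same chromosome; 0 if they overlap."""
    return max(0, max(s1, s2) - min(e1, e2))


def aggregate(blocks, max_gap, min_len):
    """Merge neighbouring blocks separated by at most max_gap in both genomes,
    then discard merged blocks shorter than min_len in either genome.

    Assumption: orientation is ignored and neighbours are taken in genome A order.
    """
    ordered = sorted(blocks, key=lambda b: (b.chr_a, b.start_a))
    merged = []
    for b in ordered:
        last = merged[-1] if merged else None
        if (
            last is not None
            and last.chr_a == b.chr_a
            and last.chr_b == b.chr_b
            and interval_gap(last.start_a, last.end_a, b.start_a, b.end_a) <= max_gap
            and interval_gap(last.start_b, last.end_b, b.start_b, b.end_b) <= max_gap
        ):
            # Extend the previous block so that it covers both segments.
            merged[-1] = Block(
                last.chr_a, min(last.start_a, b.start_a), max(last.end_a, b.end_a),
                last.chr_b, min(last.start_b, b.start_b), max(last.end_b, b.end_b),
            )
        else:
            merged.append(b)
    # Apply the minimum-length filter after merging.
    return [
        m for m in merged
        if m.end_a - m.start_a >= min_len and m.end_b - m.start_b >= min_len
    ]


# Toy coordinates, invented purely for illustration.
toy = [
    Block("chr1", 0, 100, "chr2", 1000, 1100),
    Block("chr1", 150, 300, "chr2", 1120, 1300),
    Block("chr1", 5000, 5050, "chr2", 9000, 9040),
]
print(len(aggregate(toy, max_gap=10, min_len=0)))     # nothing merged, nothing filtered
print(len(aggregate(toy, max_gap=100, min_len=0)))    # first two blocks merged
print(len(aggregate(toy, max_gap=100, min_len=200)))  # short third block filtered out
```

Even on this toy input, the number of resulting blocks changes with both max gap and min len, which is the kind of parameter dependence the sensitivity analysis examines.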