Jahn Katharina
AG Genominformatik, Technische Fakultät, Universität Bielefeld, Bielefeld, Germany.
J Comput Biol. 2011 Sep;18(9):1255-74. doi: 10.1089/cmb.2011.0132.
Whole-genome comparison based on the analysis of gene cluster conservation has become a popular approach in comparative genomics. While gene order and gene content as a whole randomize over time, certain groups of genes, which are often functionally related, are observed to remain co-located across species. However, the conservation is usually not perfect, which turns the identification of these structures, often referred to as approximate gene clusters, into a challenging task. In this article, we present an efficient set-distance-based approach that computes approximate gene clusters by means of reference occurrences. We show that it yields highly comparable results to the corresponding non-reference-based approach, while its polynomial runtime allows for approximate gene cluster detection in parameter ranges that used to be feasible only with simpler (e.g., max-gap-based) gene cluster models. To further illustrate the performance and predictive power of our algorithm, we compare it to a state-of-the-art approach for max-gap gene cluster computation.
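To make the idea of reference occurrences concrete, the following is a minimal, naive sketch and not the algorithm of the article. It assumes that the set distance between two gene sets is the size of their symmetric difference, that genomes are given as plain lists of gene identifiers, and that a reference occurrence is an interval of one designated reference genome. The names `set_distance`, `approximate_occurrences`, `min_size`, `max_size`, and `delta` are hypothetical and chosen only for illustration.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open interval [start, end) of gene positions


def set_distance(a: set, b: set) -> int:
    """Assumed set distance: number of genes found in exactly one of the sets."""
    return len(a ^ b)


def approximate_occurrences(
    reference: List[str],
    genome: List[str],
    min_size: int,
    max_size: int,
    delta: int,
) -> List[Tuple[Interval, Interval, int]]:
    """Brute-force enumeration (illustrative only).

    For every reference interval of length min_size..max_size, report every
    interval of `genome` whose gene set is within set distance `delta` of the
    reference interval's gene set. The runtime is polynomial but far from
    optimized.
    """
    hits = []
    n, m = len(reference), len(genome)
    for i in range(n):
        for j in range(i + min_size, min(i + max_size, n) + 1):
            ref_set = set(reference[i:j])
            for k in range(m):
                for l in range(k + 1, m + 1):
                    d = set_distance(ref_set, set(genome[k:l]))
                    if d <= delta:
                        hits.append(((i, j), (k, l), d))
    return hits


if __name__ == "__main__":
    ref = ["a", "b", "c", "d", "e"]
    other = ["x", "a", "c", "b", "y", "d"]
    for hit in approximate_occurrences(ref, other, min_size=3, max_size=3, delta=1):
        print(hit)
```

Fixing one genome as the reference keeps the candidate clusters limited to that genome's intervals, which is what makes a polynomial enumeration of this kind possible in the first place; the sketch ignores all the efficiency considerations an actual implementation would need.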