Faraut T, de Givry S, Chabrier P, Derrien T, Galibert F, Hitte C, Schiex T
Laboratoire de génétique cellulaire BP 52627, 31326 Castanet Tolosan, France.
Bioinformatics. 2007 Jan 15;23(2):e50-6. doi: 10.1093/bioinformatics/btl321.
Genome maps are fundamental to the study of an organism and essential to genome sequencing, which in turn provides the ultimate map of the genome. The growing number of sequenced genomes offers new opportunities for mapping closely related organisms. We propose here an algorithmic formalization of a genome comparison approach to marker ordering.
To integrate a comparative mapping approach into the algorithmic process of map construction and selection, we extend the usual statistical model of the experimental data, here radiation hybrid (RH) data, into a statistical framework that additionally models the evolutionary relationships between a proposed map and a reference map: an existing map of the corresponding orthologous genes or markers in a closely related organism. In practice, when the experimental data alone are not conclusive for ordering, map selection can exploit the marker adjacencies observed in the related genome. To compute the map efficiently, we reduce the maximum likelihood estimation to the Traveling Salesman Problem. Experiments on simulated RH datasets as well as on a real RH dataset from the canine RH project show that maps produced using the likelihood defined by the new model are significantly better than maps built using the traditional RH model.
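As a rough illustration of the reduction to the Traveling Salesman Problem, the sketch below treats marker ordering as a shortest Hamiltonian path problem and converts it into a tour by adding a dummy city at zero distance from every marker. It is a toy under stated assumptions, not the model of the paper: the cost matrix `RH_COST`, the additive `penalty` for adjacencies absent from the reference map, and the helper names `combined_cost` and `best_order` are all hypothetical, and brute-force enumeration stands in for the LKH heuristic.

```python
from itertools import permutations


def combined_cost(rh_cost, reference_order, penalty):
    """Add a fixed penalty to every marker pair that is NOT adjacent in the
    reference map. This additive penalty is an illustrative assumption; the
    paper uses a probabilistic model of evolutionary relationships."""
    pos = {marker: k for k, marker in enumerate(reference_order)}
    n = len(rh_cost)
    return [
        [
            rh_cost[i][j]
            + (penalty if i != j and abs(pos[i] - pos[j]) != 1 else 0.0)
            for j in range(n)
        ]
        for i in range(n)
    ]


def best_order(cost):
    """Return (order, total) minimising the sum of adjacent-pair costs.

    A dummy city (index n) at distance 0 from every marker turns the open
    path into a closed tour, which is the standard path-to-TSP trick.
    Brute force is used here, so this only works for a handful of markers.
    """
    n = len(cost)
    ext = [list(row) + [0.0] for row in cost] + [[0.0] * (n + 1)]
    best_total, best_perm = float("inf"), None
    for perm in permutations(range(n)):
        if perm[0] > perm[-1]:
            continue  # a map and its mirror image are equivalent
        tour = (n,) + perm  # start and end at the dummy city
        total = sum(ext[tour[k]][tour[(k + 1) % (n + 1)]] for k in range(n + 1))
        if total < best_total:
            best_total, best_perm = total, perm
    return list(best_perm), best_total


# Hypothetical costs (think of them as negative log-likelihood contributions)
# for four markers. Orders 0-1-2-3 and 0-1-3-2 tie on these data alone.
RH_COST = [
    [0.0, 1.0, 5.0, 5.0],
    [1.0, 0.0, 2.0, 2.0],
    [5.0, 2.0, 0.0, 1.0],
    [5.0, 2.0, 1.0, 0.0],
]

# Hypothetical reference map from a related genome breaks the tie.
REFERENCE = [0, 1, 3, 2]
order, total = best_order(combined_cost(RH_COST, REFERENCE, penalty=0.5))
print(order, total)
```

In this toy instance the experimental costs cannot distinguish the two tied orders, and the reference adjacencies tip the choice toward the order consistent with the related genome, which mirrors the intent described above.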
The comparative mapping approach is available in the latest version of CarthaGene [de Givry,S. et al. (2004) Bioinformatics, 21, 1703-1704, www.inra.fr/mia/T/CarthaGene], a free mapping software package written in C++. It includes LKH (Helsgaun,K. (2000) Eur. J. Oper. Res., 126, 106-130, www.dat.ruc.dk/keld/research/LKH), which is free for academic use only, for maximum likelihood computation.