Huang Jinling, Bhandarkar Suchendra M
Department of Computer Science, University of Georgia, Athens, Georgia 30602-7404, USA.
Bioinformatics. 2003 Jul 22;19(11):1303-10. doi: 10.1093/bioinformatics/btg166.
Physical mapping of chromosomes using the maximum likelihood (ML) model is a problem of high computational complexity entailing both discrete optimization to recover the optimal probe order as well as continuous optimization to recover the optimal inter-probe spacings. In this paper, two versions of the genetic algorithm (GA) are proposed, one with heuristic crossover and deterministic replacement and the other with heuristic crossover and stochastic replacement, for the physical mapping problem under the maximum likelihood model. The genetic algorithms are compared with two other discrete optimization approaches, namely simulated annealing (SA) and large-step Markov chains (LSMC), in terms of solution quality and runtime efficiency.
The physical mapping algorithms based on the GA, SA and LSMC have been tested using synthetic datasets and real datasets derived from cosmid libraries of the fungus Neurospora crassa. The GA, especially the version with heuristic crossover and stochastic replacement, is shown to consistently outperform the SA-based and LSMC-based physical mapping algorithms in terms of runtime and final solution quality. Experimental results on real datasets and simulated datasets are presented. Further improvements to the GA in the context of physical mapping under the maximum likelihood model are proposed.
The software is available upon request from the first author.
使用最大似然(ML)模型对染色体进行物理图谱绘制是一个计算复杂度很高的问题,既需要离散优化来恢复最优探针顺序,也需要连续优化来恢复最优探针间距。本文针对最大似然模型下的物理图谱绘制问题,提出了两种遗传算法(GA)版本,一种采用启发式交叉和确定性替换,另一种采用启发式交叉和随机替换。在解的质量和运行效率方面,将遗传算法与另外两种离散优化方法,即模拟退火(SA)和大步长马尔可夫链(LSMC)进行了比较。
基于GA、SA和LSMC的物理图谱绘制算法已使用合成数据集和来自真菌粗糙脉孢菌黏粒文库的真实数据集进行了测试。结果表明,GA,尤其是采用启发式交叉和随机替换的版本,在运行时间和最终解的质量方面始终优于基于SA和LSMC的物理图谱绘制算法。给出了真实数据集和模拟数据集的实验结果。还提出了在最大似然模型下的物理图谱绘制背景下对GA的进一步改进。
可向第一作者索取该软件。