Anaraki Maryam Pourkamali, Sadeghi Mehdi
Department of Computer Engineering, Science and Research Branch, Islamic Azad University, P.O. Box 14515/775, Tehran, Iran.
National Institute of Genetic Engineering and Biotechnology, P.O. Box 14965/161, Tehran, Iran; School of Biological Sciences, Institute for Research in Fundamental Sciences, Tehran, Iran.
Int J Comput Biol Drug Des. 2014;7(4):358-68. doi: 10.1504/IJCBDD.2014.066543. Epub 2014 Dec 25.
The availability of the complete human genome is crucial for genetic studies that explore possible associations between the genome and complex diseases. A haplotype, the set of single nucleotide polymorphisms (SNPs) on a single chromosome, is believed to carry promising data for disease association studies and for detecting natural positive selection and recombination hotspots. Various computational methods for reconstructing haplotypes from aligned SNP fragments have already been proposed. This study presents a novel approach to obtaining the paternal and maternal haplotypes from SNP fragments under the minimum error correction (MEC) model. Reconstructing haplotypes under the MEC model is an NP-hard problem. Our proposed methods therefore use two fast and accurate clustering techniques as the core of their procedure to solve this hard problem efficiently. An assessment of our approaches against conventional methods on two real benchmark datasets, ACE and DALY, demonstrates their efficiency and accuracy.
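To make the MEC objective concrete, the following is a minimal Python sketch, not the authors' method. The abstract does not name the two clustering techniques, so the `two_cluster` function below is a generic 2-means-style heuristic chosen only for illustration. The fragment encoding (`0`/`1` alleles, `-` for an uncovered site), the all-zeros and all-ones seeds, and every function name are assumptions.

```python
from typing import List, Sequence, Tuple

GAP = "-"  # assumed symbol for a SNP site that a fragment does not cover


def distance(fragment: str, haplotype: str) -> int:
    """Count mismatches between a fragment and a haplotype, skipping gaps."""
    return sum(1 for f, h in zip(fragment, haplotype) if f != GAP and f != h)


def mec_score(fragments: Sequence[str], h1: str, h2: str) -> int:
    """MEC score: each fragment is charged its distance to the closer haplotype."""
    return sum(min(distance(f, h1), distance(f, h2)) for f in fragments)


def consensus(fragments: Sequence[str], n_sites: int) -> str:
    """Majority vote per site; ties and uncovered sites default to '0'."""
    out: List[str] = []
    for j in range(n_sites):
        ones = sum(1 for f in fragments if f[j] == "1")
        zeros = sum(1 for f in fragments if f[j] == "0")
        out.append("1" if ones > zeros else "0")
    return "".join(out)


def two_cluster(
    fragments: Sequence[str], h1: str, h2: str, max_iter: int = 20
) -> Tuple[str, str, int]:
    """Alternate assignment and consensus steps (illustrative heuristic only)."""
    n_sites = len(h1)
    for _ in range(max_iter):
        # Assignment step: each fragment joins the closer haplotype (ties go to h1).
        group1 = [f for f in fragments if distance(f, h1) <= distance(f, h2)]
        group2 = [f for f in fragments if distance(f, h1) > distance(f, h2)]
        # Update step: rebuild each haplotype as the consensus of its group.
        new_h1, new_h2 = consensus(group1, n_sites), consensus(group2, n_sites)
        if (new_h1, new_h2) == (h1, h2):
            break  # converged: haplotypes no longer change
        h1, h2 = new_h1, new_h2
    return h1, h2, mec_score(fragments, h1, h2)


# Toy data (hypothetical): five fragments over five SNP sites, one with an error.
fragments = ["0101-", "-1010", "1010-", "-0101", "0111-"]
result = two_cluster(fragments, "00000", "11111")
```

In this toy input the fragment `0111-` differs from the haplotype `01010` at one covered site, so a single correction is needed. A heuristic of this kind can stall in a local optimum and offers no optimality guarantee, which is consistent with the NP-hardness noted in the abstract.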