Bayzid Md Shamsuzzoha, Alam Md Maksudul, Mueen Abdullah, Rahman Md Saidur
Department of Computer Science, University of Texas at Austin, Austin, TX 78712, USA.
Department of Computer Science, Virginia Tech, Blacksburg, VA 24060, USA.
ISRN Bioinform. 2013 Jan 28;2013:291741. doi: 10.1155/2013/291741. eCollection 2013.
Haplotype is a pattern of single nucleotide polymorphisms (SNPs) on a single chromosome. Constructing a pair of haplotypes from aligned and overlapping but intermixed and erroneous fragments of the chromosomal sequences is a nontrivial problem. Minimum error correction approach aims to minimize the number of errors to be corrected so that the pair of haplotypes can be constructed through consensus of the fragments. We give a heuristic algorithm (HMEC) that searches through alternative solutions using a gain measure and stops whenever no better solution can be achieved. Time complexity of each iteration is O(m (3) k) for an m × k SNP matrix where m and k are the number of fragments (number of rows) and number of SNP sites (number of columns), respectively, in an SNP matrix. Alternative gain measure is also given to reduce running time. We have compared our algorithm with other methods in terms of accuracy and running time on both simulated and real data, and our extensive experimental results indicate the superiority of our algorithm over others.
单倍型是指单条染色体上的单核苷酸多态性(SNP)模式。从染色体序列的比对、重叠但相互混合且存在错误的片段构建一对单倍型是一个具有挑战性的问题。最小错误校正方法旨在尽量减少需要校正的错误数量,以便通过片段的一致性构建单倍型对。我们给出一种启发式算法(HMEC),该算法使用增益度量在替代解决方案中进行搜索,并在无法获得更好的解决方案时停止。对于一个m×k的SNP矩阵,每次迭代的时间复杂度为O(m (3) k),其中m和k分别是SNP矩阵中片段的数量(行数)和SNP位点的数量(列数)。还给出了替代增益度量以减少运行时间。我们在模拟数据和真实数据上,从准确性和运行时间方面将我们的算法与其他方法进行了比较,我们广泛的实验结果表明我们的算法优于其他算法。