University of Science and Technology of China, Hefei, Anhui 230027, China.
IEEE/ACM Trans Comput Biol Bioinform. 2013 Mar-Apr;10(2):372-82. doi: 10.1109/TCBB.2013.18.
An enormous amount of sequence data has been generated with the development of new DNA sequencing technologies, which presents great challenges for computational biology problems such as haplotype phasing. Although arduous efforts have been made to address this problem, the current methods still cannot efficiently deal with the incoming flood of large-scale data. In this paper, we propose a flow network model to tackle haplotype phasing problem, and explain some classical haplotype phasing rules based on this model. By incorporating the heuristic knowledge obtained from these classical rules, we design an algorithm FNphasing based on the flow network model. Theoretically, the time complexity of our algorithm is (O(n(2)m+m(2)), which is better than that of 2SNP, one of the most efficient algorithms currently. After testing the performance of FNphasing with several simulated data sets, the experimental results show that when applied on large-scale data sets, our algorithm is significantly faster than the state-of-the-art Beagle algorithm. FNphasing also achieves an equal or superior accuracy compared with other approaches.
随着新的 DNA 测序技术的发展,产生了大量的序列数据,这给计算生物学问题(如单倍型相位)带来了巨大的挑战。尽管人们已经付出了艰苦的努力来解决这个问题,但目前的方法仍然不能有效地处理大规模数据的涌入。在本文中,我们提出了一种流网络模型来解决单倍型相位问题,并基于该模型解释了一些经典的单倍型相位规则。通过结合从这些经典规则中获得的启发式知识,我们设计了一种基于流网络模型的算法 FNphasing。从理论上讲,我们算法的时间复杂度为(O(n^2*m+m^2)),优于目前最有效的算法之一 2SNP。通过使用几个模拟数据集测试 FNphasing 的性能,实验结果表明,当应用于大规模数据集时,我们的算法比最先进的 Beagle 算法快得多。FNphasing 的准确性与其他方法相当或更优。