Li Lei M, Kim Jong Hyun, Waterman Michael S
Molecular and Computational Biology Program, Department of Biological Sciences, University of Southern California, Los Angeles, CA 90089, USA.
J Comput Biol. 2004;11(2-3):505-16. doi: 10.1089/1066527041410454.
In this paper, we describe a method for statistical reconstruction of haplotypes from a set of aligned SNP fragments. We consider a pair of homologous human chromosomes, one inherited from the mother and the other from the father. After fragment assembly, we wish to reconstruct the two haplotypes inherited from the parents. Given a set of potential SNP sites inferred from the assembly alignment, we wish to divide the fragment set into two subsets, each of which represents one chromosome. Our method is based on a statistical model of sequencing errors, compositional information, and haplotype memberships. We calculate the probabilities of different haplotypes conditional on the alignment. Because of the computational complexity, we first determine phases for neighboring SNPs, then connect them to construct haplotype segments. We also compute the accuracy, or confidence, of the reconstructed haplotypes. We discuss other issues, such as alternative methods, parameter estimation, computational efficiency, and relaxation of assumptions.
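The two-stage idea in the abstract (phase neighboring SNPs first, then connect them) can be illustrated with a minimal sketch. It is not the authors' model: it assumes biallelic sites coded 0/1 that are all truly heterozygous, a single uniform sequencing error rate `eps`, equal prior probability for the two phases and for the two source chromosomes, and it ignores compositional information. The names `pair_phase_posterior` and `chain_phases`, and the representation of a fragment as a dict from site index to observed allele, are hypothetical choices made for illustration only.

```python
from math import exp, log


def pair_phase_posterior(fragments, j, k, eps=0.01):
    """Posterior probability that sites j and k are in cis (haplotypes 00/11)
    rather than trans (01/10), under the simplified assumptions above."""

    def frag_lik(a, b, haps):
        # A fragment comes from either haplotype with probability 1/2;
        # each observed allele is wrong independently with probability eps.
        return sum(
            0.5
            * ((1 - eps) if a == h[0] else eps)
            * ((1 - eps) if b == h[1] else eps)
            for h in haps
        )

    cis, trans = [(0, 0), (1, 1)], [(0, 1), (1, 0)]
    log_cis = log_trans = 0.0
    for frag in fragments:
        # Only fragments covering both sites carry phase information.
        if j in frag and k in frag:
            log_cis += log(frag_lik(frag[j], frag[k], cis))
            log_trans += log(frag_lik(frag[j], frag[k], trans))

    # Equal priors, so the posterior is a logistic function of the
    # log-likelihood difference; branch on the sign to avoid overflow.
    d = log_cis - log_trans
    if d >= 0:
        return 1.0 / (1.0 + exp(-d))
    return exp(d) / (1.0 + exp(d))


def chain_phases(fragments, n_sites, eps=0.01):
    """Connect neighboring-pair phases into one haplotype.

    Returns (hap, confidences): hap[i] is the allele on one chromosome at
    site i (the other chromosome carries the complement), and confidences[i]
    is the posterior of the phase chosen between sites i and i + 1.
    A confidence of 0.5 means no fragment links that pair, so the segment
    should be considered broken there.
    """
    hap = [0]
    confidences = []
    for j in range(n_sites - 1):
        p_cis = pair_phase_posterior(fragments, j, j + 1, eps)
        hap.append(hap[-1] if p_cis >= 0.5 else 1 - hap[-1])
        confidences.append(max(p_cis, 1.0 - p_cis))
    return hap, confidences
```

Calling `chain_phases(fragments, n_sites)` on a list of such fragment dicts returns one haplotype and a per-link confidence; links with low confidence mark natural boundaries between haplotype segments. Working in log space keeps the posterior stable when many fragments cover the same pair of sites.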