Department of Human Genetics and South Texas Diabetes and Obesity Institute, School of Medicine, University of Texas Rio Grande Valley, Brownsville, TX, USA.
Department of Biological Sciences, St. Mary's University, San Antonio, TX, USA.
Eur J Hum Genet. 2020 Jun;28(6):790-803. doi: 10.1038/s41431-020-0574-3. Epub 2020 Jan 29.
Phasing is the process of inferring haplotypes from genotype data. Efficient algorithms and associated software for accurate phasing in pedigrees are needed, especially for populations lacking reference panels of sequenced individuals. We present a novel method for phasing genotypes from whole-genome sequence data in pedigrees, called PULSAR (Phasing Using Lineage Specific Alleles/Rare variants). The method is based on the property that alleles specific to a single founding chromosome within a pedigree are highly informative for identifying haplotypes that are shared identical by descent. Simulation studies are used to assess the performance of PULSAR with various pedigree sizes and structures, and the effect of genotyping errors and the presence of nonsequenced individuals is investigated. In pedigrees with complete sequencing and realistic genotyping error rates, PULSAR correctly phases >99.9% of heterozygous genotypes, excluding sites at which all individuals are heterozygous, and does so with a switch error rate frequently below 10. PULSAR is highly accurate, capable of genotype error correction and imputation, and computationally competitive with alternative phasing software applicable to pedigrees. Our method has the significant advantage of not requiring reference panels that are essential for other population-based phasing algorithms. A software implementation of PULSAR is freely available.
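The switch error rate reported above can be made concrete with a small worked example. The following is a minimal sketch, not PULSAR's implementation: it assumes the common definition of a switch error as a change in relative phase between two consecutive heterozygous sites, and the function name `switch_error_rate` and the tuple-of-sequences haplotype representation are hypothetical choices made for illustration.

```python
from typing import Sequence, Tuple

# A phased individual is represented (by assumption) as two equal-length
# allele sequences, one per chromosome copy.
Haplotypes = Tuple[Sequence[int], Sequence[int]]


def switch_error_rate(truth: Haplotypes, inferred: Haplotypes) -> float:
    """Fraction of adjacent heterozygous-site pairs whose relative phase
    in `inferred` disagrees with `truth`."""
    t0, t1 = truth
    i0, i1 = inferred
    if not (len(t0) == len(t1) == len(i0) == len(i1)):
        raise ValueError("all haplotypes must have the same length")

    # For each truly heterozygous site, record whether inferred haplotype 0
    # carries the same allele as true haplotype 0.
    orientations = []
    for a, b, c, d in zip(t0, t1, i0, i1):
        if a == b:
            continue  # homozygous sites carry no phase information
        if sorted((a, b)) != sorted((c, d)):
            raise ValueError("inferred genotype differs from true genotype")
        orientations.append(c == a)

    if len(orientations) < 2:
        return 0.0  # fewer than two heterozygous sites: no switch possible

    # A switch occurs whenever the orientation flips between neighbours.
    switches = sum(x != y for x, y in zip(orientations, orientations[1:]))
    return switches / (len(orientations) - 1)


truth = ([0, 1, 0, 1, 1], [1, 0, 1, 0, 1])
inferred = ([0, 1, 1, 0, 1], [1, 0, 0, 1, 1])
rate = switch_error_rate(truth, inferred)  # one switch among three adjacent pairs
```

Because only relative phase matters, an inferred pair that is the true pair with the two haplotypes swapped throughout scores a rate of zero under this definition.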