Department of Botany and Plant Sciences, University of California, Riverside, Riverside, CA.
Graduate Program in Genetics, Genomics, and Bioinformatics, University of California, Riverside, Riverside, CA.
Mol Biol Evol. 2020 Dec 16;37(12):3684-3698. doi: 10.1093/molbev/msaa176.
Compared with genomic data on individual markers, haplotype data provide higher resolution for DNA variants, advancing our knowledge of genetics and evolution. Although many computational and experimental phasing methods have been developed for analyzing diploid genomes, reconstructing chromosome-scale haplotypes at low cost remains challenging, which limits the utility of this valuable genetic resource. Gamete cells, the natural packaging of haploid complements, are ideal materials for phasing entire chromosomes because the majority of the haplotypic allele combinations are preserved in them. Therefore, compared with current diploid-based phasing methods, using haploid genomic data from single gametes may substantially reduce the complexity of inferring the donor's chromosomal haplotypes. In this study, we developed the first easy-to-use R package, Hapi, for inferring chromosome-length haplotypes of individual diploid genomes with only a few gametes. Hapi outperformed other phasing methods when analyzing both simulated and real single gamete cell sequencing data sets. The results also suggested that chromosome-scale haplotypes may be inferred by using as few as three gametes, pushing the boundary to its possible limit. Single gamete cell sequencing technology combined with the cost-effective Hapi method will make large-scale haplotype-based genetic studies feasible and affordable, promoting the use of haplotype data in a wide range of research.
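To illustrate why a handful of gametes can suffice, the following is a toy Python sketch, not Hapi's algorithm or its R interface. It assumes error-free, fully observed gamete genotypes at the donor's heterozygous sites, encoded as `0`/`1` strings, and it decides the linkage between each pair of adjacent sites by majority vote across gametes. The function name `phase_from_gametes` and the example data are hypothetical.

```python
def phase_from_gametes(gametes):
    """Toy majority-vote phasing from haploid gamete genotypes.

    gametes: list of equal-length strings over {'0', '1'}; each string is the
    allele one gamete carries at every heterozygous site of the donor.
    Assumes no missing calls and no genotyping errors (an idealization).
    Returns the two inferred donor haplotypes as a tuple of strings.
    """
    n_sites = len(gametes[0])
    hap1 = [0]  # arbitrarily anchor haplotype 1 with allele 0 at the first site
    for i in range(n_sites - 1):
        # A gamete without a crossover between sites i and i+1 reports the
        # true linkage; crossovers are rare, so the majority is trusted.
        same = sum(g[i] == g[i + 1] for g in gametes)
        diff = len(gametes) - same
        # Ties (possible only with an even number of gametes) default to "same".
        hap1.append(hap1[-1] if same >= diff else 1 - hap1[-1])
    hap2 = [1 - a for a in hap1]  # the other haplotype carries the alternate alleles
    return "".join(map(str, hap1)), "".join(map(str, hap2))


# Hypothetical donor haplotypes: A = 010011, B = 101100.
example_gametes = [
    "010011",  # copy of A, no crossover
    "101011",  # starts as B, crosses over to A after the third site
    "010010",  # starts as A, crosses over to B after the fifth site
]
print(phase_from_gametes(example_gametes))
```

With three gametes, a single crossover in any one interval is outvoted by the other two gametes, which is the intuition behind phasing with so few cells. Real data contain missing calls and genotyping errors, which this sketch does not handle.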