Phase Genomics, Seattle, WA, USA.
Pacific Biosciences, Menlo Park, CA, USA.
Nat Commun. 2021 Apr 28;12(1):1935. doi: 10.1038/s41467-020-20536-y.
Haplotype-resolved genome assemblies are important for understanding how combinations of variants impact phenotypes. To date, these assemblies have been best created with complex protocols, such as cultured cells that contain a single-haplotype (haploid) genome, single cells in which haplotypes are separated, or co-sequencing of parental genomes in a trio-based approach. These approaches are impractical in most situations. To address this issue, we present FALCON-Phase, a phasing tool that uses ultra-long-range Hi-C chromatin interaction data to extend the phase blocks of partially phased diploid assemblies to chromosome or scaffold scale. FALCON-Phase uses the inherent phasing information in Hi-C reads, which lets it skip variant calling and reduces the computational complexity of phasing. Our method is validated on three benchmark datasets from human, cow, and zebra finch, generated as part of the Vertebrate Genomes Project (VGP), for which high-quality, fully haplotype-resolved assemblies are available using the trio-based approach. FALCON-Phase is accurate without parental data, and its performance is better in samples with higher heterozygosity. For cow and zebra finch the accuracy is 97%, compared with 80-91% for human. FALCON-Phase is applicable to any draft assembly that contains long primary contigs and phased associate contigs.
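The abstract does not spell out the phasing model, so the following is only a simplified illustration of the core idea: Hi-C read pairs that link haplotigs of neighbouring phase blocks can decide how those blocks are joined, without calling variants. It assumes each phase block is a pair of local haplotypes (0 and 1) and that Hi-C read-pair counts between haplotypes of different blocks have already been tallied. The function name `phase_blocks`, the contact-key layout, and the greedy neighbour-by-neighbour rule are all hypothetical. This is not FALCON-Phase's actual algorithm.

```python
from typing import Dict, List, Tuple

# Hypothetical layout: (block_i, hap_a, block_j, hap_b) -> Hi-C read-pair count.
# Store each linked pair once, in either key order.
ContactKey = Tuple[int, int, int, int]


def phase_blocks(n_blocks: int, contacts: Dict[ContactKey, int]) -> List[int]:
    """Return, per block, which local haplotype (0 or 1) is assigned to phase 0.

    Greedy sketch (not FALCON-Phase's model): block 0 is fixed, and each later
    block is oriented to agree with the Hi-C links to the block just before it.
    """

    def count(i: int, a: int, j: int, b: int) -> int:
        # Contacts are undirected, so check both key orders.
        return contacts.get((i, a, j, b), 0) + contacts.get((j, b, i, a), 0)

    orientation = [0] * n_blocks  # orientation[k] = local hap placed in phase 0
    for k in range(1, n_blocks):
        prev = orientation[k - 1]
        # Links supporting "keep" (hap 0 of block k joins phase 0) vs "swap".
        keep = count(k - 1, prev, k, 0) + count(k - 1, 1 - prev, k, 1)
        swap = count(k - 1, prev, k, 1) + count(k - 1, 1 - prev, k, 0)
        orientation[k] = 0 if keep >= swap else 1
    return orientation


if __name__ == "__main__":
    # Made-up toy counts: block0 hap0 <-> block1 hap1 <-> block2 hap1.
    toy = {
        (0, 0, 1, 1): 40, (0, 1, 1, 0): 35, (0, 0, 1, 0): 3, (0, 1, 1, 1): 2,
        (1, 1, 2, 1): 50, (1, 0, 2, 0): 45, (1, 1, 2, 0): 4, (1, 0, 2, 1): 1,
    }
    print(phase_blocks(3, toy))  # [0, 1, 1]
```

A greedy chain like this lets one bad junction flip every downstream block. A real tool would more plausibly score all block pairs jointly, as a more robust alternative.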