Animal Genomics, ETH Zurich, Universitaetstrasse 2, 8006, Zurich, Switzerland.
U.S. Meat Animal Research Center, USDA-ARS, 844 Road 313, Clay Center, NE, 68933, USA.
Nat Commun. 2022 May 31;13(1):3012. doi: 10.1038/s41467-022-30680-2.
Advantages of pangenomes over linear reference assemblies for genome research have recently been established. However, potential effects of sequence platform and assembly approach, or of combining assemblies created by different approaches, on pangenome construction have not been investigated. Here we generate haplotype-resolved assemblies from the offspring of three bovine trios representing increasing levels of heterozygosity that each demonstrate a substantial improvement in contiguity, completeness, and accuracy over the current Bos taurus reference genome. Diploid coverage as low as 20x for HiFi or 60x for ONT is sufficient to produce two haplotype-resolved assemblies meeting standards set by the Vertebrate Genomes Project. Structural variant-based pangenomes created from the haplotype-resolved assemblies demonstrate significant consensus regardless of sequence platform, assembler algorithm, or coverage. Inspecting pangenome topologies identifies 90 thousand structural variants including 931 overlapping with coding sequences; this approach reveals variants affecting QRICH2, PRDM9, HSPA1A, TAS2R46, and GC that have potential to affect phenotype.