Inner Mongolia University, Hohhot, China.
BGI Genomics, Shenzhen, China.
Sci Data. 2024 Jul 11;11(1):762. doi: 10.1038/s41597-024-03581-w.
Advances in sequencing have enabled the assembly of numerous sheep genomes, improving our understanding of the link between genetic variation and phenotypic traits. However, the genome of the East Friesian sheep (Ostfriesisches Milchschaf), a key high-yield dairy breed, has not yet been fully assembled. Here, we constructed a near-complete, gap-free East Friesian genome assembly using PacBio HiFi, ultra-long ONT and Hi-C sequencing. The resulting assembly spans approximately 2.96 Gb, with a contig N50 of 104.1 Mb and only 164 unplaced sequences. Notably, the assembly captures 41 telomeres and 24 centromeres. The assembled sequence is of high quality in terms of both completeness (BUSCO score: 97.1%) and correctness (QV: 69.1). In addition, a total of 24,580 protein-coding genes were predicted, of which 97.2% (23,891) carried at least one conserved functional domain. Collectively, this near-T2T, gap-free assembly provides a valuable genetic resource for comparative genomic studies of sheep and will serve as an important tool for the sheep research community.
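For readers less familiar with the two assembly metrics quoted above, the following minimal sketch shows how a contig N50 and a Phred-scaled QV are conventionally defined. It is an illustration only, not the pipeline used in the study: the function names `n50` and `qv_to_error_rate` and the toy contig lengths are hypothetical, and the QV conversion assumes the standard Phred relation QV = -10 log10(error probability).

```python
def n50(lengths):
    """Return the N50 of a collection of sequence lengths.

    N50 is the length L such that sequences of length >= L
    together cover at least half of the total assembled bases.
    """
    total = sum(lengths)
    running = 0
    # Walk from the longest sequence downwards, accumulating bases.
    for length in sorted(lengths, reverse=True):
        running += length
        if running * 2 >= total:
            return length
    raise ValueError("lengths must be non-empty")


def qv_to_error_rate(qv):
    """Convert a Phred-scaled quality value to a per-base error probability."""
    return 10 ** (-qv / 10)


# Toy contig lengths (hypothetical, not taken from the assembly).
toy_contigs = [80, 70, 50, 40, 30, 20, 10]
toy_n50 = n50(toy_contigs)

# Per-base error probability implied by the reported QV of 69.1,
# assuming the standard Phred definition.
reported_error_rate = qv_to_error_rate(69.1)
```

Under that assumption, a QV of 69.1 corresponds to an error probability of about 1.2e-7 per base, or roughly one error per eight million bases.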