Vetsuisse Faculty, Institute of Genetics, University of Bern, 3001 Bern, Switzerland.
Institute Genetics Development Rennes, University of Rennes, CNRS-UMR 6290, F-35000 Rennes, France.
Genes (Basel). 2021 May 30;12(6):847. doi: 10.3390/genes12060847.
The domestic dog has become an important biomedical model for studying the genetic basis of disease, morphology and behavior. Genetic studies in the dog have relied on a draft reference genome of a purebred female boxer dog named "Tasha", initially published in 2005. Derived from a Sanger whole-genome shotgun sequencing approach coupled with limited clone-based sequencing, the initial assembly and its subsequent updates have served as the predominant resource for canine genetics for 15 years. While the initial assembly produced a good-quality draft, like all assemblies produced at the time it contained gaps, assembly errors and missing sequences, particularly in GC-rich regions, which are found at many promoters and in the first exons of protein-coding genes. Here, we present Dog10K_Boxer_Tasha_1.0, an improved, chromosome-level, highly contiguous genome assembly of Tasha created with long-read technologies that increases sequence contiguity >100-fold, closes >23,000 gaps of the CanFam3.1 reference assembly and improves gene annotation by identifying >1200 new protein-coding transcripts. The assembly and annotation are available at NCBI under the accession GCF_000002285.5.