Science for Life Laboratory, Department of Medical Biochemistry and Microbiology, Uppsala University, 75132, Uppsala, Sweden.
Department of Human Genetics, University of Michigan, Ann Arbor, MI, 48107, USA.
Genome Biol. 2023 Aug 15;24(1):187. doi: 10.1186/s13059-023-03023-7.
The international Dog10K project aims to sequence and analyze several thousand canine genomes. Incorporating 20× data from 1987 individuals, including 1611 dogs (321 breeds), 309 village dogs, 63 wolves, and four coyotes, we identify genomic variation across the canid family, setting the stage for detailed studies of domestication, behavior, morphology, disease susceptibility, and genome architecture and function.
We report the analysis of more than 48 million single-nucleotide, indel, and structural variants spanning the autosomes, X chromosome, and mitochondria. We discover more than 75% of variation for 239 sampled breeds. Allele sharing analysis indicates that 94.9% of breeds form monophyletic clusters and 25 major clades. German Shepherd Dogs and related breeds show the highest allele sharing with independent breeds from multiple clades. On average, each breed dog differs from the UU_Cfam_GSD_1.0 reference at 26,960 deletions and 14,034 insertions greater than 50 bp, with wolves having 14% more variants. Discovered variants include retrogene insertions from 926 parent genes. To aid functional prioritization, single-nucleotide variants were annotated with SnpEff and Zoonomia phyloP constraint scores. Constrained positions were negatively correlated with allele frequency. Finally, the utility of the Dog10K data as an imputation reference panel is assessed, generating high-confidence calls across varied genotyping platform densities, including for breeds not included in the Dog10K collection.
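The reported relationship between constraint and allele frequency can be illustrated with a minimal sketch. It assumes diploid genotypes coded as alternate-allele counts (0, 1, 2) and uses a Spearman rank correlation; the abstract does not state which statistic the authors used, and the function names and toy values below are hypothetical, not Dog10K data.

```python
from statistics import mean


def allele_frequency(genotypes):
    """Alternate-allele frequency from diploid genotypes coded 0, 1, or 2.

    Missing calls are given as None and are ignored.
    """
    called = [g for g in genotypes if g is not None]
    if not called:
        raise ValueError("no called genotypes at this site")
    return sum(called) / (2 * len(called))


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / (vx * vy) ** 0.5


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical toy sites: (constraint score, genotypes across six samples).
toy_sites = [
    (0.2, [2, 2, 1, 2, 1, 2]),
    (1.5, [1, 1, 0, 2, 1, 0]),
    (3.1, [0, 1, 0, 0, 1, 0]),
    (5.8, [0, 0, 0, 1, 0, None]),
]
scores = [score for score, _ in toy_sites]
freqs = [allele_frequency(gts) for _, gts in toy_sites]
rho = spearman(scores, freqs)
```

In this toy input the allele frequency falls as the constraint score rises, so `rho` is negative. On real data, the scores would come from the phyloP annotation and the frequencies from the called genotypes.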
We have developed a dense dataset of 1987 sequenced canids that reveals patterns of allele sharing, identifies likely functional variants, informs breed structure, and enables accurate imputation. Dog10K data are publicly available.