Department of Human Genetics, University of Michigan, Ann Arbor, MI, USA.
Department of Computational Medicine and Bioinformatics, University of Michigan, Ann Arbor, MI, USA.
Mamm Genome. 2024 Dec;35(4):565-576. doi: 10.1007/s00335-024-10056-1. Epub 2024 Aug 1.
For over 15 years, canine genetics research relied on a reference assembly from a Boxer breed dog named Tasha (i.e., canFam3.1). Recent advances in long-read sequencing and genome assembly have led to the development of numerous high-quality assemblies from diverse canines. These assemblies represent notable improvements in completeness, contiguity, and the representation of gene promoters and gene models. Although genome graph and pan-genome approaches have promise, most genetic analyses in canines rely upon the mapping of Illumina sequencing reads to a single reference. The Dog10K consortium, and others, have generated deep catalogs of genetic variation through an alignment of Illumina sequencing reads to a reference genome obtained from a German Shepherd Dog named Mischka (i.e., canFam4, UU_Cfam_GSD_1.0). However, alignment to a breed-derived genome may introduce bias in genotype calling across samples. Since the use of an outgroup reference genome may remove this effect, we have reprocessed 1929 samples analyzed by the Dog10K consortium using a Greenland wolf (mCanLor1.2) as the reference. We efficiently performed remapping and variant calling using a GPU implementation of common analysis tools. The resulting call set removes the variability in genetic differences seen across samples, and breed relationships revealed by principal component analysis are not affected by the choice of reference genome. Using this sequence data, we inferred the history of population sizes and found that village dog populations experienced a 9- to 13-fold reduction in historic effective population size relative to wolves.