Durward-Akhurst S A, Schaefer R J, Grantham B, Carey W K, Mickelson J R, McCue M E
Department of Veterinary Population Medicine, University of Minnesota, Minneapolis, MN, United States.
Interval Bio LLC, Mountain View, CA, United States.
Front Genet. 2021 Dec 2;12:758366. doi: 10.3389/fgene.2021.758366. eCollection 2021.
Genetic variation is a key contributor to health and disease. Understanding the link between an individual's genotype and the corresponding phenotype is a major goal of medical genetics. Whole genome sequencing (WGS) within and across populations enables highly efficient variant discovery and elucidation of the molecular nature of virtually all genetic variation. Here, we report the largest catalog of genetic variation for the horse, a species of importance as a model for human athletic and performance-related traits, using WGS of 534 horses. We show the extent of agreement between two commonly used variant callers. In data from ten target breeds that represent major breed clusters in the domestic horse, we describe the distribution of variants and their allele frequencies across breeds, and identify variants that are unique to a single breed. We investigate variants with no homozygotes, which may be potential embryonic lethal variants, as well as variants present in all individuals, which likely represent regions of the genome with errors or poor annotation, or positions where the reference genome itself carries a variant. Finally, we show regions of the genome that have higher or lower levels of genetic variation compared to the genome average. This catalog can be used to prioritize variants for important equine diseases and traits, and to provide key information about regions of the genome where the assembly and/or annotation need to be improved.
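The two variant classes singled out in the abstract (variants with no homozygotes, and variants present in all individuals) can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the `GenotypeCounts` type, the `classify` function, the category labels, and the `min_expected` threshold are all hypothetical names and values chosen for illustration. The sketch assumes that "present in all individuals" means no individual is homozygous for the reference allele, and it uses the Hardy-Weinberg expectation (n times q squared) only as a rough guide to whether the absence of alternate homozygotes is notable.

```python
from dataclasses import dataclass


@dataclass
class GenotypeCounts:
    """Per-variant genotype counts across a cohort (hypothetical container)."""
    hom_ref: int  # individuals homozygous for the reference allele
    het: int      # heterozygous individuals
    hom_alt: int  # individuals homozygous for the alternate allele


def total_called(g: GenotypeCounts) -> int:
    """Number of individuals with a called genotype."""
    return g.hom_ref + g.het + g.hom_alt


def alt_allele_frequency(g: GenotypeCounts) -> float:
    """Alternate allele frequency q = (het + 2 * hom_alt) / (2 * n)."""
    return (g.het + 2 * g.hom_alt) / (2 * total_called(g))


def expected_hom_alt(g: GenotypeCounts) -> float:
    """Expected number of alternate homozygotes under Hardy-Weinberg: n * q^2."""
    q = alt_allele_frequency(g)
    return total_called(g) * q * q


def classify(g: GenotypeCounts, min_expected: float = 5.0) -> str:
    """Assign a variant to one of the illustrative categories.

    min_expected is an assumed, arbitrary cutoff: a variant is only flagged
    as lacking homozygotes when at least this many would be expected.
    """
    n = total_called(g)
    if n == 0:
        return "no_calls"
    # Every called individual carries at least one alternate allele.
    if g.hom_ref == 0:
        return "alt_in_all_individuals"
    # Heterozygotes exist, but no alternate homozygotes despite expectation.
    if g.hom_alt == 0 and g.het > 0 and expected_hom_alt(g) >= min_expected:
        return "no_homozygotes_candidate"
    return "other"


if __name__ == "__main__":
    examples = {
        "common, no homozygotes": GenotypeCounts(334, 200, 0),
        "alt in every horse": GenotypeCounts(0, 100, 434),
        "rare variant": GenotypeCounts(524, 10, 0),
    }
    for label, counts in examples.items():
        print(label, "->", classify(counts))
```

In this sketch a rare variant with no alternate homozygotes is left unflagged, because few or no homozygotes would be expected at that frequency in a cohort of 534 animals. A real analysis would also need to account for missing genotypes, population structure across breeds, and genotyping error.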