Bailey Ernie, Finno Carrie J, Cullen Jonah N, Kalbfleisch Ted, Petersen Jessica L
University of Kentucky, Maxwell H. Gluck Equine Research Center, Lexington, KY, 40546, USA.
University of California-Davis, Population Health and Reproduction, Davis, CA, 95616, USA.
Sci Rep. 2024 Oct 2;14(1):22930. doi: 10.1038/s41598-024-73645-9.
Whole genome sequences (WGS) of 185 North American Thoroughbred horses were compared to quantify the number and frequency of variants, diversity of mitotypes, and autosomal runs of homozygosity (ROH). Of the samples, 82 horses were born between 1965 and 1986 (Group 1); the remaining 103, selected to maximize pedigree diversity, were born between 2000 and 2020 (Group 2). Over 14.3 million autosomal variants were identified with 4.5-5.0 million found per horse. Mitochondrial sequences associated the North American Thoroughbreds with 9 of 17 clades previously identified among diverse breeds. Individual coefficients of inbreeding, estimated from ROH, averaged 0.266 (Group 1) and 0.283 (Group 2). When SNP arrays were simulated using subsets of WGS markers, the arrays over-estimated lengths of ROH. WGS-based estimates of inbreeding were highly correlated (r > 0.98) with SNP array-based estimates, but only moderately correlated (r = 0.40) with inbreeding based on 5-generation pedigrees. On average, Group 1 horses had more heterozygous variants (P < 0.001), more total variants (P < 0.001), and lower individual inbreeding (F; P < 0.001) than horses in Group 2. However, the distribution of numbers of variants, allele frequency, and extent of ROH overlapped among all horses such that it was not possible to identify the group of origin of any single horse using these measures. Consequently, the Thoroughbred population would be better monitored by investigating changes in specific variants, rather than relying on broad measures of diversity. The WGS for these 185 horses is publicly available for comparison to other populations and as a foundation for modeling changes in population structure, breeding practices, or the appearance of deleterious variants.
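The abstract reports individual inbreeding coefficients estimated from ROH but does not state the formula. A minimal sketch, assuming the commonly used definition in which F_ROH is the summed length of a horse's autosomal ROH divided by the length of the autosomal genome surveyed, might look like the following. The function name `f_roh`, the half-open `(start, end)` segment format, the optional minimum-length filter, and the toy numbers are all illustrative assumptions, not the authors' pipeline or data.

```python
from typing import Iterable, Tuple


def f_roh(
    roh_segments: Iterable[Tuple[int, int]],
    autosome_length_bp: int,
    min_length_bp: int = 0,
) -> float:
    """Estimate an individual inbreeding coefficient from ROH.

    Assumed definition (not taken from the abstract):
        F_ROH = total length of ROH / autosomal length surveyed

    roh_segments: (start, end) positions in base pairs, half-open, for one
        individual, with segments from all autosomes pooled together.
    autosome_length_bp: total autosomal length covered by the analysis.
    min_length_bp: hypothetical threshold; shorter segments are ignored.
    """
    if autosome_length_bp <= 0:
        raise ValueError("autosome_length_bp must be positive")

    total_roh_bp = 0
    for start, end in roh_segments:
        length = end - start
        if length < 0:
            raise ValueError(f"segment end precedes start: ({start}, {end})")
        if length >= min_length_bp:
            total_roh_bp += length

    return total_roh_bp / autosome_length_bp


# Toy example: two ROH segments on a 10 Mb "genome" (made-up values).
toy_segments = [(0, 2_000_000), (5_000_000, 5_500_000)]
toy_f = f_roh(toy_segments, autosome_length_bp=10_000_000)
toy_f_long_only = f_roh(
    toy_segments, autosome_length_bp=10_000_000, min_length_bp=1_000_000
)
```

Under this definition the result depends on how ROH are called (marker density, minimum segment length), which is consistent with the abstract's observation that simulated SNP arrays over-estimated ROH lengths relative to WGS.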