Genetic Analysis Department, Laboratory of Racing Chemistry, 1731-2 Tsurutamachi, Utsunomiya 320-0851, Tochigi, Japan.
Equine Research Institute, Japan Racing Association, 1400-4 Shiba, Shimotsuke 329-0412, Tochigi, Japan.
Genes (Basel). 2023 Mar 3;14(3):638. doi: 10.3390/genes14030638.
Thoroughbreds are among the most famous racehorses worldwide and are animals of high economic value. To understand genomic variability in Thoroughbreds, we identified genome-wide insertions and deletions (INDELs) and obtained their allele frequencies. INDELs were called from whole-genome sequencing data of 101 Thoroughbred racehorses by mapping sequence reads to the horse reference genome. By integrating individual data, 1,453,349 and 113,047 INDELs were identified on the autosomes (chromosomes 1-31) and the X chromosome, respectively, and 18 INDELs were identified in the mitochondrial genome, for a total of 1,566,414 INDELs. Of those, 779,457 loci (49.8%) were novel INDELs, while 786,957 loci (50.2%) were already registered in Ensembl. The sizes of diallelic INDELs ranged from -286 to +476 bp, and the majority were 1-bp (717,736; 52.14%) and 2-bp (220,672; 16.03%) variants. Numerous INDELs had low alternative (Alt) allele frequencies, and many rare variants with Alt allele frequencies below 0.5% were also detected. In addition, 5,955 loci had a minor allele frequency of 0.5 and were genotyped as heterozygous in all the horses. While short-read sequencing followed by mapping to a reference genome is a simple way of detecting variants, false-positive variants may also be detected. Therefore, our data help to identify true variants in Thoroughbred horses. The INDEL database we constructed will provide useful information for genetic studies and industrial applications in Thoroughbred horses, including a gene-editing test for gene-doping control and a parentage test using INDELs for horse registration and identification.
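The counts and frequencies reported in the abstract can be checked with a few lines of arithmetic. The following is a minimal sketch, not the authors' pipeline: the function names `alt_allele_frequency` and `all_heterozygous` are hypothetical, genotypes are assumed to be diploid pairs coded 0 (Ref) and 1 (Alt), and the toy genotype lists are illustrative rather than real data.

```python
from typing import List, Tuple

# Counts quoted in the abstract.
AUTOSOMAL, CHR_X, MITO = 1_453_349, 113_047, 18
NOVEL, KNOWN = 779_457, 786_957
N_HORSES = 101

Genotype = Tuple[int, int]  # assumed coding: 0 = Ref allele, 1 = Alt allele


def alt_allele_frequency(genotypes: List[Genotype]) -> float:
    """Fraction of Alt alleles among all alleles at one diallelic locus."""
    alleles = [allele for gt in genotypes for allele in gt]
    return sum(alleles) / len(alleles)


def all_heterozygous(genotypes: List[Genotype]) -> bool:
    """True when every sample carries one Ref and one Alt allele."""
    return all(a != b for a, b in genotypes)


total = AUTOSOMAL + CHR_X + MITO
novel_pct = round(100 * NOVEL / total, 1)
known_pct = round(100 * KNOWN / total, 1)

# Toy locus 1: a single Alt allele among 101 diploid horses (202 alleles).
singleton = [(0, 1)] + [(0, 0)] * (N_HORSES - 1)
# Toy locus 2: every horse heterozygous, the pattern reported for 5,955 loci.
every_het = [(0, 1)] * N_HORSES

print(total, novel_pct, known_pct)
print(alt_allele_frequency(singleton), alt_allele_frequency(every_het))
```

With 101 diploid horses there are 202 alleles per autosomal locus, so a single Alt allele already gives a frequency of 1/202, which falls just under the 0.5% threshold. A locus where every horse is heterozygous has an Alt allele frequency of exactly 0.5, a pattern that is unlikely for a real segregating variant and is consistent with the abstract's caution about false-positive calls.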