Agriculture Victoria, AgriBio, Centre for AgriBioscience, Bundoora, VIC, 3083, Australia.
Agriculture and Food, CSIRO, St Lucia, QLD, 4067, Australia.
Genet Sel Evol. 2024 Sep 12;56(1):62. doi: 10.1186/s12711-024-00931-5.
Mitochondrial genomes differ from the nuclear genome, and in humans mitochondrial variants are known to contribute to genetic disorders. Prior to genomics, some livestock studies assessed the role of the mitochondrial genome, but these were limited and inconclusive. Modern genome sequencing provides an opportunity to re-evaluate the potential impact of mitochondrial variation on livestock traits. This study first evaluated the empirical accuracy of mitochondrial sequence imputation and then used real and imputed mitochondrial sequence genotypes to study the role of mitochondrial variants in milk production traits of dairy cattle.
The empirical accuracy of imputation from Single Nucleotide Polymorphism (SNP) panels to mitochondrial sequence genotypes was assessed in 516 test animals of Holstein, Jersey and Red breeds using Beagle software and a sequence reference of 1883 animals. The overall accuracy, estimated as the squared Pearson correlation (R²) between all imputed and real genotypes across all animals, was 0.454. The low accuracy was attributed partly to the majority of variants having low minor allele frequency (MAF < 0.005) and partly to variants in the hypervariable D-loop region showing poor imputation accuracy. Beagle software provides an internal estimate of imputation accuracy (DR2), and about 10 percent of the 1927 imputed positions showed DR2 greater than 0.9 (N = 201). There were 151 sites with empirical R² > 0.9 (of 954 variants segregating in the test animals), and 138 of these overlapped the sites with DR2 > 0.9. This suggests that the DR2 statistic is a reasonable proxy for selecting sites that are imputed with higher accuracy for downstream analyses. Accordingly, in the second part of the study, mitochondrial sequence variants were imputed from real mitochondrial SNP panel genotypes of 9515 Australian Holstein, Jersey and Red dairy cattle. Then, using only real genotypes and imputed sites with DR2 > 0.9, we undertook a genome-wide association study (GWAS) for milk, fat and protein yields. The GWAS mitochondrial SNP effects were not significant.
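The accuracy check described above can be illustrated with a minimal Python sketch. It is not the authors' pipeline: the site names, toy genotypes and DR2 values are invented for illustration, genotypes are assumed to be coded 0/1 (mitochondrial genotypes being haploid), and the helper names `squared_pearson` and `select_sites` are hypothetical. It computes a per-site empirical R² between imputed and real genotypes, then checks how the sites passing an empirical threshold overlap those passing a DR2 threshold.

```python
def squared_pearson(x, y):
    """Squared Pearson correlation between two equal-length numeric sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("need two sequences of equal length >= 2")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        # Correlation is undefined when a site does not segregate.
        return float("nan")
    return (sxy * sxy) / (sxx * syy)


def select_sites(score_by_site, threshold=0.9):
    """Return the set of sites whose score exceeds the threshold."""
    return {site for site, score in score_by_site.items() if score > threshold}


# Toy data (assumed, not from the study): one list entry per test animal.
real = {
    "pos_a": [0, 0, 1, 1, 0, 1],
    "pos_b": [0, 1, 0, 1, 0, 1],
    "pos_c": [0, 0, 0, 0, 0, 0],  # not segregating in the test animals
}
imputed = {
    "pos_a": [0, 0, 1, 1, 0, 1],
    "pos_b": [0, 0, 0, 1, 1, 1],
    "pos_c": [0, 0, 0, 0, 0, 0],
}
# Hypothetical DR2 values, as would be reported per site by the imputation software.
dr2 = {"pos_a": 0.97, "pos_b": 0.41, "pos_c": 0.95}

# Per-site empirical accuracy; NaN sites never pass the threshold comparison.
empirical_r2 = {site: squared_pearson(imputed[site], real[site]) for site in real}
high_empirical = select_sites(empirical_r2)
high_dr2 = select_sites(dr2)
overlap = high_empirical & high_dr2

print(empirical_r2)
print(sorted(overlap))
```

In this toy case a non-segregating site can pass the DR2 filter while having no defined empirical R², which mirrors why the empirical comparison in the text is restricted to variants segregating in the test animals.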
The accuracy of imputation of mitochondrial genotypes from the SNP panel to sequence was generally low. The Beagle DR2 statistic enabled selection of sites imputed with higher empirical accuracy. We recommend building larger reference populations with mitochondrial sequence to improve the accuracy of imputing less common variants and ensuring that SNP panels include common variants in the D-loop region.