Shandong Provincial Key Laboratory for Livestock Germplasm Innovation & Utilization, College of Animal Science, Shandong Agricultural University, Tai'an, 271018, China.
CAS Key Laboratory of Adaptation and Evolution of Plateau Biota, Qinghai Key Laboratory of Animal Ecological Genomics, Northwest Institute of Plateau Biology, Chinese Academy of Sciences, Xining, 810001, Qinghai, China.
Genet Sel Evol. 2024 Sep 3;56(1):60. doi: 10.1186/s12711-024-00927-1.
Accurate breed identification is essential for the conservation and sustainable use of indigenous farm animal genetic resources. In this study, we evaluated the phylogenetic relationships and genomic breed compositions of 13 sheep breeds using SNP and InDel data from whole-genome sequencing (WGS). The breeds included 11 Chinese indigenous and 2 foreign commercial breeds. We compared breed identification strategies that differed in marker type (SNPs, InDels, or a combination of SNPs and InDels, termed SIs), in breed-informative marker detection method, and in machine learning classification method.
Using WGS-based SNPs and InDels, we revealed the phylogenetic relationships among the 11 Chinese indigenous and 2 foreign sheep breeds and quantified their purity through estimated genomic breed compositions. The optimal strategy for identifying these breeds combined DFI_union for breed-informative marker detection with KSR for breed assignment. DFI_union merges the breed-informative markers obtained from three methods: Delta, pairwise Wright's FST, and Informativeness for Assignment (together, DFI). KSR intersects the results of three classifiers: K-Nearest Neighbor, Support Vector Machine, and Random Forest (together, KSR). Using SI markers improved identification accuracy compared with using SNPs or InDels alone. Accuracy exceeded 97.5% with at least the 1000 most breed-informative (MBI) SI markers and reached 100% with 5000 SI markers.
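The two integration steps can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names, the toy scores, and the placeholder labels `breed_A` and `breed_B` are hypothetical. The sketch assumes that each of the three marker-detection methods has already produced one score per marker, that "merging" means taking the union of each method's top-k markers, and that "intersecting" classifier results means keeping an assignment only when all three classifiers agree.

```python
def top_markers(scores, k):
    """Return the IDs of the k highest-scoring markers for one method."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:k])


def dfi_union(delta_scores, fst_scores, in_scores, k):
    """Union of the top-k markers from Delta, pairwise FST, and
    Informativeness for Assignment (assumed reading of DFI_union)."""
    return (
        top_markers(delta_scores, k)
        | top_markers(fst_scores, k)
        | top_markers(in_scores, k)
    )


def ksr_consensus(knn_pred, svm_pred, rf_pred, unassigned=None):
    """Keep a breed label only where KNN, SVM, and RF all agree
    (assumed reading of 'intersecting their results')."""
    return [
        a if a == b == c else unassigned
        for a, b, c in zip(knn_pred, svm_pred, rf_pred)
    ]


# Toy per-marker scores (hypothetical values, for illustration only).
delta_scores = {"m1": 0.9, "m2": 0.1, "m3": 0.5, "m4": 0.2}
fst_scores = {"m1": 0.8, "m2": 0.05, "m3": 0.1, "m4": 0.6}
in_scores = {"m1": 0.7, "m2": 0.1, "m3": 0.6, "m4": 0.2}

selected = dfi_union(delta_scores, fst_scores, in_scores, k=2)

# Toy predictions from three classifiers for three animals.
calls = ksr_consensus(
    ["breed_A", "breed_B", "breed_A"],
    ["breed_A", "breed_B", "breed_B"],
    ["breed_A", "breed_B", "breed_A"],
)
```

Here `selected` contains the markers ranked in the top two by at least one method, and the third animal in `calls` is left unassigned because the classifiers disagree. Whether disagreements are dropped or handled differently in the original study is not stated in the abstract.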
Our results provide an important foundation for the conservation of these Chinese local sheep breeds, as well as general approaches for identifying indigenous farm animal breeds.