Breed identification using breed-informative SNPs and machine learning based on whole genome sequence data and SNP chip data.

Authors

Zhao Changheng, Wang Dan, Teng Jun, Yang Cheng, Zhang Xinyi, Wei Xianming, Zhang Qin

Affiliation

Shandong Provincial Key Laboratory of Animal Biotechnology and Disease Control and Prevention, College of Animal Science and Veterinary Medicine, Shandong Agricultural University, Tai'an, 271018, China.

Publication information

J Anim Sci Biotechnol. 2023 Jun 1;14(1):85. doi: 10.1186/s40104-023-00880-x.

DOI:10.1186/s40104-023-00880-x

PMID:37259083

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10234014/

Abstract

BACKGROUND

Breed identification is useful in a variety of biological contexts. It usually involves two stages: detection of breed-informative SNPs and breed assignment. Several methods have been proposed for each stage, but which combination of these methods is optimal remains unclear. In this study, using the whole genome sequence data available for 13 cattle breeds from Run 8 of the 1,000 Bull Genomes Project, we compared combinations of three methods (Delta, F, and I) for breed-informative SNP detection and five machine learning methods (KNN, SVM, RF, NB, and ANN) for breed assignment with respect to different reference population sizes and different numbers of the most breed-informative SNPs. In addition, we evaluated the accuracy of breed identification using SNP chip data of different densities.
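To make the first stage concrete, the following is a minimal sketch of how per-SNP informativeness scores could be computed for two breeds and used to rank SNPs. It is not the authors' implementation. It assumes that "F" denotes Wright's F_ST and "I" denotes the informativeness-for-assignment statistic, neither of which the abstract spells out, and it assumes genotypes coded as 0/1/2 copies of one allele. All function names and the toy genotypes are hypothetical.

```python
import math


def allele_freqs(genotypes):
    """Per-SNP frequency of the counted allele.

    genotypes: list of animals, each a list of 0/1/2 allele counts (assumed coding).
    """
    n_animals = len(genotypes)
    n_snps = len(genotypes[0])
    return [
        sum(animal[j] for animal in genotypes) / (2 * n_animals)
        for j in range(n_snps)
    ]


def delta(p1, p2):
    """Absolute allele frequency difference between two breeds."""
    return abs(p1 - p2)


def fst(p1, p2):
    """Simple two-population F_ST = (H_T - H_S) / H_T with equal breed weights."""
    p_bar = (p1 + p2) / 2
    h_t = 2 * p_bar * (1 - p_bar)  # expected heterozygosity of the pooled breeds
    if h_t == 0:
        return 0.0  # SNP is monomorphic in both breeds
    h_s = (2 * p1 * (1 - p1) + 2 * p2 * (1 - p2)) / 2  # mean within-breed value
    return (h_t - h_s) / h_t


def _xlogx(x):
    """x * ln(x), with the usual convention 0 * ln(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0


def informativeness(p1, p2):
    """Informativeness for assignment for a biallelic SNP and two breeds."""
    total = 0.0
    for a1, a2 in ((p1, p2), (1 - p1, 1 - p2)):  # loop over the two alleles
        a_bar = (a1 + a2) / 2
        total += -_xlogx(a_bar) + (_xlogx(a1) + _xlogx(a2)) / 2
    return total


def rank_snps(scores, n_top):
    """Indices of the n_top highest-scoring SNPs, best first."""
    order = sorted(range(len(scores)), key=lambda j: scores[j], reverse=True)
    return order[:n_top]


# Toy reference populations: 2 animals per breed, 3 SNPs (illustrative values only).
breed_a = [[2, 1, 0], [2, 1, 1]]
breed_b = [[0, 1, 1], [0, 1, 2]]
freq_a = allele_freqs(breed_a)
freq_b = allele_freqs(breed_b)
delta_scores = [delta(a, b) for a, b in zip(freq_a, freq_b)]
fst_scores = [fst(a, b) for a, b in zip(freq_a, freq_b)]
top_by_delta = rank_snps(delta_scores, n_top=2)
```

The study covers 13 breeds, so a real analysis would need a multi-breed version of each statistic, for example by averaging over breed pairs; this sketch only shows the pairwise case.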

RESULTS

We found that all combinations performed quite well, with identification accuracies over 95% in all scenarios. However, no single combination was both the best performer and robust across all scenarios. We therefore proposed integrating the three breed-informative SNP detection methods into one method, named DFI, and integrating three of the machine learning methods (KNN, SVM, and RF) into one method, named KSR. The combination of these two integrated methods outperformed the other combinations, with accuracies over 99% in most cases, and was very robust in all scenarios. The accuracies from using SNP chip data were only slightly lower than those from using sequence data in most cases.
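The abstract does not say how the three detection methods or the three classifiers are integrated. One plausible reading, shown here purely as an assumption, is that DFI pools the top-ranked SNPs from each detection method and KSR assigns each animal the breed predicted by most of the three classifiers. The function names, the tie-breaking rule, and the breed labels below are all hypothetical.

```python
from collections import Counter


def integrate_snp_sets(*rankings, n_top):
    """Union of the n_top best SNPs from each ranking, in order of first appearance.

    Assumed integration rule; the source does not specify how DFI is formed.
    """
    selected = []
    seen = set()
    for ranking in rankings:
        for snp in ranking[:n_top]:
            if snp not in seen:
                seen.add(snp)
                selected.append(snp)
    return selected


def majority_vote(*predictions):
    """Combine per-animal breed labels from several classifiers by plurality vote.

    Ties go to the earliest-listed classifier among the tied labels
    (an arbitrary choice made for this sketch).
    """
    combined = []
    for labels in zip(*predictions):
        counts = Counter(labels)
        top = max(counts.values())
        winner = next(label for label in labels if counts[label] == top)
        combined.append(winner)
    return combined


# SNP indices ranked best-first by three detection methods (illustrative values).
by_delta = [0, 2, 1]
by_fst = [2, 0, 1]
by_in = [1, 0, 2]
dfi_snps = integrate_snp_sets(by_delta, by_fst, by_in, n_top=2)

# Predicted breeds for three animals from three classifiers (illustrative labels).
knn_pred = ["Angus", "Holstein", "Jersey"]
svm_pred = ["Angus", "Jersey", "Holstein"]
rf_pred = ["Holstein", "Jersey", "Angus"]
ksr_pred = majority_vote(knn_pred, svm_pred, rf_pred)
```

With three voters, a tie can only occur when all three classifiers disagree, so the tie-breaking rule decides which classifier is trusted in that case; other rules, such as weighting by validation accuracy, would be equally defensible.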

CONCLUSIONS

The current study showed that the combination of DFI and KSR was the optimal strategy. Using sequence data resulted in higher accuracies than using chip data in most cases. However, the differences were generally small. In view of the cost of genotyping, using chip data is also a good option for breed identification.


Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/10c5/10234014/27723ef22ac2/40104_2023_880_Fig1_HTML.jpg




