The effect of mislabeled phenotypic status on the identification of mutation-carriers from SNP genotypes in dairy cattle.

Authors

Biffani Stefano, Pausch Hubert, Schwarzenbacher Hermann, Biscarini Filippo

Affiliations

IBBA-CNR, Via Einstein-Loc. Cascina Codazza, 26900, Lodi, Italy.

AIA: Associazione Italiana Allevatori, Via Giuseppe Tomassetti 9, 00161, Rome, Italy.

Publication information

BMC Res Notes. 2017 Jun 26;10(1):230. doi: 10.1186/s13104-017-2540-x.

DOI:10.1186/s13104-017-2540-x
PMID:28651561
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC5485573/

Abstract

BACKGROUND

Statistical and machine learning applications are increasingly popular in animal breeding and genetics, especially for computing genomic predictions of phenotypes of interest. Noise (errors) in the data may reduce the accuracy of predictions. The effects of noisy data have been investigated in genome-wide association studies for case-control experiments and in genomic predictions for binary traits in plants, but no studies have yet been published on the impact of noisy data in animal genomics. In this work, the susceptibility to noise of five classification models was tested: Lasso-penalised logistic regression (Lasso), K-nearest neighbours (KNN), random forest (RF), and support vector machines with a linear (SVML) or radial (SVMR) kernel. As an illustration, the identification of carriers of a recessive mutation in cattle (Bos taurus) was used. A population of 3116 Fleckvieh animals with SNP genotypes on the same chromosome as the mutation locus (BTA 19) was available. The carrier status (0/1 phenotype) was randomly sampled to generate noise. Increasing proportions of noise, up to 20%, were introduced into the data.
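
The noise-generation step can be illustrated with a minimal sketch. It assumes that "noise" means flipping the 0/1 carrier label of a randomly chosen fraction of animals; the abstract does not spell out the exact sampling scheme, so this is one plausible reading rather than the authors' procedure. The function name `inject_label_noise`, the seed, and the 5% carrier frequency are hypothetical.

```python
import random


def inject_label_noise(labels, noise_fraction, seed=0):
    """Return a copy of `labels` with a given fraction of 0/1 entries flipped.

    Assumption: noise is modelled as label flipping on a random subset.
    """
    if not 0.0 <= noise_fraction <= 1.0:
        raise ValueError("noise_fraction must be in [0, 1]")
    rng = random.Random(seed)  # local generator, so results are reproducible
    n_noisy = round(noise_fraction * len(labels))
    noisy_idx = rng.sample(range(len(labels)), n_noisy)  # distinct positions
    noisy = list(labels)
    for i in noisy_idx:
        noisy[i] = 1 - noisy[i]  # flip carrier <-> non-carrier
    return noisy


# Hypothetical data: 3116 animals (as in the abstract), with an assumed
# carrier frequency of about 5% (not taken from the paper).
n_animals = 3116
n_carriers = 156
clean = [1] * n_carriers + [0] * (n_animals - n_carriers)

for fraction in (0.0, 0.05, 0.10, 0.20):
    noisy = inject_label_noise(clean, fraction, seed=42)
    changed = sum(a != b for a, b in zip(clean, noisy))
    print(f"noise={fraction:.1%}  mislabeled animals={changed}")
```

Because the carriers are the minority class, most flipped labels turn true non-carriers into apparent carriers, which is one reason label noise can weigh heavily on minority-class performance.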

RESULTS

SVMR and Lasso were relatively more robust to noise in the data, with total accuracy still above 0.975 and TPR (true positive rate; accuracy in the minority class) in the range 0.5-0.80 even with 17.5-20% mislabeled observations. The performance of SVML and RF decreased monotonically with increasing noise in the data, while KNN consistently failed to identify mutation carriers (observations in the minority class). The computation time increased with noise in the data, especially for the two support vector machine classifiers.
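
The two metrics reported above can be sketched as follows. The example uses made-up predictions (1000 animals, 30 carriers) rather than data from the study, and the helper names are hypothetical. It shows why total accuracy alone can be misleading when carriers are rare: a classifier that never predicts a carrier still scores high on accuracy while its TPR is zero.

```python
def total_accuracy(y_true, y_pred):
    """Fraction of all observations that are classified correctly."""
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


def true_positive_rate(y_true, y_pred, positive=1):
    """Fraction of true positives (carriers) that are predicted as positive."""
    n_pos = sum(t == positive for t in y_true)
    if n_pos == 0:
        raise ValueError("no positive observations in y_true")
    hits = sum(t == positive and p == positive for t, p in zip(y_true, y_pred))
    return hits / n_pos


# Hypothetical example: 30 carriers among 1000 animals.
y_true = [1] * 30 + [0] * 970

# Classifier A never predicts a carrier.
pred_never = [0] * 1000

# Classifier B finds 18 of the 30 carriers and makes no false positives.
pred_partial = [1] * 18 + [0] * 12 + [0] * 970

for name, pred in (("never-carrier", pred_never), ("partial", pred_partial)):
    acc = total_accuracy(y_true, pred)
    tpr = true_positive_rate(y_true, pred)
    print(f"{name}: accuracy={acc:.3f}  TPR={tpr:.2f}")
```

Reporting TPR next to total accuracy, as the abstract does, separates these two cases.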

CONCLUSIONS

This work was the first to assess the impact of phenotyping errors on the accuracy of genomic predictions in animal genetics. The choice of the classification method can influence results in terms of higher or lower susceptibility to noise. In the presented problem, SVM with radial kernel performed relatively well even when the proportion of errors in the data reached 12.5%. Lasso was the second best method, while SVML, RF and KNN were very sensitive to noise. Taking into account both accuracy and computation time, Lasso provided the best combination.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4036/5485573/81bcfe148b14/13104_2017_2540_Fig1_HTML.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4036/5485573/d8d80d86244f/13104_2017_2540_Fig2_HTML.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4036/5485573/13f02b88e171/13104_2017_2540_Fig3_HTML.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/4036/5485573/216a2282103e/13104_2017_2540_Fig4_HTML.jpg
