

Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data.

Affiliations

Department of Agronomy, Food, Natural Resources, Animals and Environment, University of Padova, Viale dell' Università 16, 35020 Legnaro, Italy.


Publication information

J Dairy Sci. 2021 Jul;104(7):8107-8121. doi: 10.3168/jds.2020-19861. Epub 2021 Apr 15.

DOI: 10.3168/jds.2020-19861
PMID: 33865589
Abstract

Fourier-transform infrared (FTIR) spectroscopy is a powerful high-throughput phenotyping tool for predicting traits that are expensive and difficult to measure in dairy cattle. Calibration equations are often developed using standard methods, such as partial least squares (PLS) regression. Methods that employ penalization, rank reduction, and variable selection, and that can model nonlinear relations between phenotype and FTIR spectra, might improve predictive ability and model robustness. This study compared the predictive ability of 2 machine learning methods, namely random forest (RF) and gradient boosting machine (GBM), and of penalized regression against PLS regression in Holstein-Friesian cattle under 2 cross-validation scenarios. The 3 phenotypes to be predicted differ in biological meaning and in their relationship with milk composition (i.e., some are directly measurable in milk and others are not, and they reflect different biological processes that milk spectra can capture). The data set comprised phenotypic information from 471 Holstein-Friesian cows, and 3 target phenotypes were evaluated: (1) body condition score (BCS), (2) blood β-hydroxybutyrate (BHB, mmol/L), and (3) κ-casein expressed as a percentage of nitrogen (κ-CN, % N). The data set was split under 2 cross-validation scenarios: samples-out random, in which the population was randomly split into 10 folds (8 folds for training and 1 fold each for validation and testing); and herd/date-out, in which the population was randomly assigned to training (70% of herds), validation (10%), and testing (20% of herds) subsets based on the herd and the date on which the samples were collected. A random grid search was performed on the training subset for hyperparameter optimization, and the validation set was used to estimate how well the prediction error generalized. The trained model was then used to assess the final prediction in the testing subset.
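The two splitting scenarios described above can be sketched in plain Python. This is an illustrative sketch only, not the authors' code: the function names `samples_out_folds` and `herd_date_out_split`, the rotation rule that picks the validation fold, the seed, and the made-up group labels are all assumptions introduced here.

```python
import random
from collections import defaultdict


def samples_out_folds(n_samples, n_folds=10, seed=0):
    """Yield (train, validation, test) index lists: 8 folds / 1 fold / 1 fold."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::n_folds] for i in range(n_folds)]
    for k in range(n_folds):
        # Assumption: the fold after the test fold serves as validation.
        val_k = (k + 1) % n_folds
        train = [i for j, fold in enumerate(folds)
                 if j not in (k, val_k) for i in fold]
        yield train, folds[val_k], folds[k]


def herd_date_out_split(groups, fractions=(0.7, 0.1, 0.2), seed=0):
    """Assign whole herd/date groups to train, validation, and test subsets."""
    rng = random.Random(seed)
    by_group = defaultdict(list)
    for idx, group in enumerate(groups):
        by_group[group].append(idx)
    keys = sorted(by_group)
    rng.shuffle(keys)
    n_train = round(fractions[0] * len(keys))
    n_val = round(fractions[1] * len(keys))
    parts = (keys[:n_train],
             keys[n_train:n_train + n_val],
             keys[n_train + n_val:])
    # Every sample of a group lands in exactly one subset.
    return tuple([i for g in part for i in by_group[g]] for part in parts)


# Hypothetical data: 471 cows spread over 20 invented herd/date groups.
groups = [f"herd{i % 20}" for i in range(471)]
train, val, test = herd_date_out_split(groups)
print(len(train), len(val), len(test))
```

Splitting by group rather than by sample keeps all records from one herd and sampling date out of the training data, which is why the herd/date-out scenario is the stricter test of how a calibration transfers to unseen farms.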
The grid search for penalized regression showed that the elastic net (EN) was the best regularization, with an increase in predictive ability of 5%. The performance of PLS (the standard model) was compared against the 2 machine learning techniques and penalized regression under the 2 cross-validation scenarios. Machine learning methods showed greater predictive ability for BCS (0.63 for GBM and 0.61 for RF), BHB (0.80 for GBM and 0.79 for RF), and κ-CN (0.81 for GBM and 0.80 for RF) in samples-out cross-validation. In herd/date-out cross-validation, these values were 0.58 (GBM and RF) for BCS, 0.73 (GBM and RF) for BHB, and 0.77 (GBM and RF) for κ-CN. The GBM model tended to outperform the other methods in predictive ability, by around 4%, 1%, and 7% relative to EN, RF, and PLS, respectively. The prediction accuracies of the GBM and RF models were similar and differed statistically from that of the PLS model in samples-out random cross-validation. Although machine learning techniques outperformed PLS in herd/date-out cross-validation, the differences in predictive ability were not significant because of the large standard deviation of the predictions. Overall, GBM achieved the highest accuracy of FTIR-based prediction of the different phenotypic traits across the cross-validation scenarios. These results indicate that GBM is a promising method for obtaining more accurate FTIR-based predictions for different phenotypes in dairy cattle.
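As a small worked example, the GBM values quoted above can be used to see how much predictive ability falls when moving from samples-out to herd/date-out cross-validation. The relative drop computed here is not a figure reported in the abstract; it is simple arithmetic on the reported values, and the helper name `relative_drop` is an assumption.

```python
# Predictive abilities reported in the abstract for GBM.
samples_out = {"BCS": 0.63, "BHB": 0.80, "kappa-CN": 0.81}
herd_date_out = {"BCS": 0.58, "BHB": 0.73, "kappa-CN": 0.77}


def relative_drop(reference, value):
    """Percentage by which `value` falls below `reference`."""
    return 100.0 * (reference - value) / reference


drops = {trait: relative_drop(samples_out[trait], herd_date_out[trait])
         for trait in samples_out}

for trait, drop in drops.items():
    print(f"{trait}: {samples_out[trait]:.2f} -> "
          f"{herd_date_out[trait]:.2f} ({drop:.1f}% lower)")
```

Under this reading, the three traits lose roughly 5% to 9% of their samples-out predictive ability in the herd/date-out scenario.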


Similar articles

1. Evaluating the performance of machine learning methods and variable selection methods for predicting difficult-to-measure traits in Holstein dairy cattle using milk infrared spectral data.
J Dairy Sci. 2021 Jul;104(7):8107-8121. doi: 10.3168/jds.2020-19861. Epub 2021 Apr 15.
2. Predicting milk protein fractions using infrared spectroscopy and a gradient boosting machine for breeding purposes in Holstein cattle.
J Dairy Sci. 2023 Mar;106(3):1853-1873. doi: 10.3168/jds.2022-22119. Epub 2023 Jan 27.
3. Comparison of Single-Breed and Multi-Breed Training Populations for Infrared Predictions of Novel Phenotypes in Holstein Cows.
Animals (Basel). 2021 Jul 2;11(7):1993. doi: 10.3390/ani11071993.
4. Real-time milk analysis integrated with stacking ensemble learning as a tool for the daily prediction of cheese-making traits in Holstein cattle.
J Dairy Sci. 2022 May;105(5):4237-4255. doi: 10.3168/jds.2021-21426. Epub 2022 Mar 10.
5. Prediction of detailed blood metabolic profile using milk infrared spectra and machine learning methods in dairy cattle.
J Dairy Sci. 2023 May;106(5):3321-3344. doi: 10.3168/jds.2022-22454. Epub 2023 Apr 5.
6. Prediction of blood β-hydroxybutyrate content and occurrence of hyperketonemia in early-lactation, pasture-grazed dairy cows using milk infrared spectra.
J Dairy Sci. 2019 Jul;102(7):6466-6476. doi: 10.3168/jds.2018-15988. Epub 2019 May 10.
7. Prediction of body condition in Jersey dairy cattle from 3D-images using machine learning techniques.
J Anim Sci. 2023 Jan 3;101. doi: 10.1093/jas/skad376.
8. An attempt at predicting blood β-hydroxybutyrate from Fourier-transform mid-infrared spectra of milk using multivariate mixed models in Polish dairy cattle.
J Dairy Sci. 2017 Aug;100(8):6312-6326. doi: 10.3168/jds.2016-12252. Epub 2017 May 30.
9. Predicting blood β-hydroxybutyrate using milk Fourier transform infrared spectrum, milk composition, and producer-reported variables with multiple linear regression, partial least squares regression, and artificial neural network.
J Dairy Sci. 2018 May;101(5):4378-4387. doi: 10.3168/jds.2017-14076. Epub 2018 Feb 22.
10. Comparison between genetic parameters of cheese yield and nutrient recovery or whey loss traits measured from individual model cheese-making methods or predicted from unprocessed bovine milk samples using Fourier-transform infrared spectroscopy.
J Dairy Sci. 2014 Oct;97(10):6560-72. doi: 10.3168/jds.2014-8309. Epub 2014 Aug 6.

Cited by

1. The Use of Selected Machine Learning Methods in Dairy Cattle Farming: A Review.
Animals (Basel). 2025 Jul 10;15(14):2033. doi: 10.3390/ani15142033.
2. Benchmarking machine learning and parametric methods for genomic prediction of feed efficiency-related traits in Nellore cattle.
Sci Rep. 2024 Mar 17;14(1):6404. doi: 10.1038/s41598-024-57234-4.
3. SVR Chemometrics to Quantify β-Lactoglobulin and α-Lactalbumin in Milk Using MIR.
Foods. 2024 Jan 3;13(1):166. doi: 10.3390/foods13010166.
4. Phenotypic Analysis of Fourier-Transform Infrared Milk Spectra in Dairy Goats.
Foods. 2023 Feb 14;12(4):807. doi: 10.3390/foods12040807.
5. Novel prediction models for hyperketonemia using bovine milk Fourier-transform infrared spectroscopy.
Prev Vet Med. 2023 Apr;213:105860. doi: 10.1016/j.prevetmed.2023.105860. Epub 2023 Jan 25.
6. In-line near-infrared analysis of milk coupled with machine learning methods for the daily prediction of blood metabolic profile in dairy cattle.
Sci Rep. 2022 May 16;12(1):8058. doi: 10.1038/s41598-022-11799-0.
7. Comparison of Single-Breed and Multi-Breed Training Populations for Infrared Predictions of Novel Phenotypes in Holstein Cows.
Animals (Basel). 2021 Jul 2;11(7):1993. doi: 10.3390/ani11071993.