Department of Comparative Biomedicine and Food Science, University of Padova, 35020, Legnaro, Italy.
Department of Animal Science, North Carolina State University, Raleigh 27695.
J Dairy Sci. 2017 Sep;100(9):7306-7319. doi: 10.3168/jds.2016-12203. Epub 2017 Jun 21.
The objective of this study was to compare the prediction accuracy of 92 infrared prediction equations obtained by different statistical approaches. The predicted traits included fatty acid composition (n = 1,040); detailed protein composition (n = 1,137); lactoferrin (n = 558); pH and coagulation properties (n = 1,296); curd yield and composition obtained by a micro-cheese making procedure (n = 1,177); and Ca, P, Mg, and K contents (n = 689). The statistical methods used to develop the prediction equations were partial least squares regression (PLSR), Bayesian ridge regression, Bayes A, Bayes B, Bayes C, and Bayesian least absolute shrinkage and selection operator. Model performance was assessed, for each trait and model, in training and validation sets over 10 replicates. In validation sets, Bayesian regression models performed significantly better than PLSR for the prediction of 33 of the 92 traits, especially fatty acids, whereas they yielded significantly lower prediction accuracy than PLSR for 8 traits: the percentage of C18:1n-7 trans-9 in fat; the content of unglycosylated κ-casein and its percentage in protein; the content of α-lactalbumin; the percentage of α-casein in protein; and the contents of Ca, P, and Mg. Even though Bayesian methods significantly improved model accuracy for many traits compared with PLSR, most differences in the coefficient of determination in validation sets were smaller than 1 percentage point. Across traits, the highest predictive ability was obtained by Bayes C, even though most of the significant differences in accuracy among Bayesian regression models were negligible.
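As a rough illustration of the evaluation scheme described in the abstract (two regression models fitted on a training set, scored by the coefficient of determination in a validation set, repeated over 10 replicates), the following minimal sketch may help. It is not the authors' pipeline, and everything in it is an assumption made for illustration: the data are synthetic, the 80/20 split is arbitrary, and the function names (`solve`, `fit_ridge`, `compare`) are hypothetical. Ordinary least squares and a ridge fit with a fixed penalty stand in for the compared methods; the fixed-penalty ridge is only a simplification of Bayesian ridge regression, which estimates its variance components from the data, and PLSR is not implemented here.

```python
import random


def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - s) / m[i][i]
    return x


def fit_ridge(X, y, lam):
    """Return (intercept, coefficients); lam = 0 gives ordinary least squares."""
    n, p = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - x_mean[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Normal equations on centered data, with the penalty on the diagonal.
    xtx = [[sum(r[j] * r[k] for r in Xc) + (lam if j == k else 0.0)
            for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * v for r, v in zip(Xc, yc)) for j in range(p)]
    beta = solve(xtx, xty)
    intercept = y_mean - sum(b * mu for b, mu in zip(beta, x_mean))
    return intercept, beta


def predict(model, X):
    intercept, beta = model
    return [intercept + sum(b * v for b, v in zip(beta, row)) for row in X]


def r_squared(y_true, y_pred):
    """Coefficient of determination, 1 - SS_res / SS_tot."""
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


def compare(n_samples=200, n_features=8, n_replicates=10, lam=5.0, seed=0):
    """Validation R^2 per replicate for both models on synthetic data."""
    rng = random.Random(seed)
    true_beta = [1.0 if j % 2 == 0 else -0.5 for j in range(n_features)]
    X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
    y = [sum(b * v for b, v in zip(true_beta, row)) + rng.gauss(0, 1) for row in X]
    results = {"ols": [], "ridge": []}
    idx = list(range(n_samples))
    n_train = int(0.8 * n_samples)  # assumed split, not taken from the study
    for _ in range(n_replicates):
        rng.shuffle(idx)
        train, valid = idx[:n_train], idx[n_train:]
        Xt, yt = [X[i] for i in train], [y[i] for i in train]
        Xv, yv = [X[i] for i in valid], [y[i] for i in valid]
        for name, penalty in (("ols", 0.0), ("ridge", lam)):
            model = fit_ridge(Xt, yt, penalty)
            results[name].append(r_squared(yv, predict(model, Xv)))
    return results


results = compare()
mean_r2 = {name: sum(v) / len(v) for name, v in results.items()}
# Difference between the two mean validation R^2 values, in percentage points.
diff_points = 100.0 * (mean_r2["ridge"] - mean_r2["ols"])
print(mean_r2, diff_points)
```

Expressing the gap as `diff_points` mirrors how the abstract reports differences between methods: as percentage points of validation R², which makes it easy to see when a difference is small in practical terms.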