TERRA research and teaching centre, Gembloux Agro-Bio Tech, University of Liège, Gembloux, Belgium.
Valorisation of agricultural products, Walloon Research Centre, Gembloux, Belgium.
J Dairy Sci. 2020 Dec;103(12):11585-11596. doi: 10.3168/jds.2020-18870. Epub 2020 Oct 23.
Lactoferrin (LF) is a glycoprotein naturally present in milk. Its content varies throughout lactation and also with mastitis; it is therefore a potential additional indicator of udder health beyond somatic cell count. Consequently, there is interest in quantifying this biomolecule routinely. The first prediction equations proposed in the literature to predict LF content in milk from milk mid-infrared spectrometry were built using partial least squares regression (PLSR) because of the limited size of the data sets. The current study aimed to test 4 different machine learning algorithms using a large data set comprising 6,619 records collected across different herds, breeds, and countries. The first algorithm was a PLSR, as used in past investigations. The second and third algorithms used partial least squares (PLS) factors combined with a linear and a polynomial support vector regression, respectively (PLS + SVR). The fourth algorithm also used PLS factors, but as inputs to an artificial neural network with 1 hidden layer (PLS + ANN). The training and validation sets comprised 5,541 and 836 records, respectively. Although the calibration prediction performance was the best for PLS + polynomial SVR, its validation prediction performance was the worst. The 3 other algorithms had similar validation performances. Indeed, the validation root mean squared error (RMSE) ranged between 162.17 and 166.75 mg/L of milk. However, the lower standard deviation of cross-validation RMSE and the better normality of the residual distribution observed for PLS + ANN suggest that this model was more suitable to predict the LF content in milk from milk mid-infrared spectra (Rv = 0.60 and validation RMSE = 162.17 mg/L of milk). This PLS + ANN model was then applied to almost 6 million spectral records. The predicted LF showed the expected relationships with milk yield, somatic cell score, somatic cell count, and stage of lactation.
The model tended to underestimate high LF values (higher than 600 mg/L of milk). However, when the prediction threshold was set to 500 mg/L, 82% of the validation samples with an LF content higher than 600 mg/L were detected. Future research should aim to increase the number of these extremely high LF records in the calibration set.
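The threshold reasoning above can be illustrated with a minimal sketch. It assumes that "detected" means a predicted value strictly above the prediction threshold; the function name `detection_rate`, its parameters, and the sample values are hypothetical and are not taken from the study.

```python
def detection_rate(observed, predicted, true_cutoff=600.0, alert_threshold=500.0):
    """Share of truly high-LF samples flagged by the prediction threshold.

    observed, predicted: paired LF values in mg/L of milk.
    true_cutoff: observed LF above this value counts as a high-LF sample.
    alert_threshold: a sample is flagged when its prediction exceeds this
        value (strict inequality is an assumption of this sketch).
    Returns None when no sample exceeds true_cutoff.
    """
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted must have the same length")
    # Predictions for the samples whose reference LF is truly high.
    high = [p for o, p in zip(observed, predicted) if o > true_cutoff]
    if not high:
        return None
    return sum(p > alert_threshold for p in high) / len(high)


# Made-up values for illustration only (mg/L of milk), not study data.
observed = [150.0, 320.0, 650.0, 700.0, 900.0, 610.0]
predicted = [180.0, 300.0, 560.0, 480.0, 720.0, 530.0]

# Flagging at 500 mg/L compensates for the underestimation of high values.
print(detection_rate(observed, predicted))                         # 0.75
# Flagging at the true cutoff of 600 mg/L misses most high samples here.
print(detection_rate(observed, predicted, alert_threshold=600.0))  # 0.25
```

In this toy example, lowering the flagging threshold below the true cutoff recovers high-LF samples that an underestimating model would otherwise miss. The cost, not measured here, is a possible increase in false alerts among samples below 600 mg/L.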