Lee Mingyung, Kim Dong Hyeon, Seo Seongwon, Tedeschi Luis O
Department of Animal Science, Texas A&M University, College Station, TX 77843-2471, USA.
Dairy Science Division, National Institute of Animal Science, Rural Development Administration, Cheonan 31000, Republic of Korea.
Animals (Basel). 2025 Jul 18;15(14):2127. doi: 10.3390/ani15142127.
Reliable estimation of protein requirements in lactating dairy cows is necessary for formulating nutritionally adequate diets, improving feed efficiency, and minimizing nitrogen excretion. This study aimed to develop machine learning-based models to predict net protein requirements for maintenance (NPm) and lactation (NPl) using random forest regression (RFR) and support vector regression (SVR). A total of 1779 observations were assembled from 436 peer-reviewed publications and open-access databases. Predictors were farm-ready variables such as milk yield, dry matter intake, days in milk, body weight, and dietary crude protein content. NPm was estimated from the National Academies of Sciences, Engineering, and Medicine (NASEM, 2021) equations, while NPl was derived from milk true protein yield. Model adequacy was evaluated using 10-fold cross-validation. The RFR model demonstrated higher predictive performance than SVR for both NPm (R² = 0.82, RMSEP = 22.38 g/d, CCC = 0.89) and NPl (R² = 0.82, RMSEP = 95.17 g/d, CCC = 0.89), reflecting its capacity to model the rule-based nature of the NASEM equations. These findings suggest that RFR may provide a valuable approach for estimating protein requirements with fewer input variables. Further research should focus on validating these models under field conditions and exploring hybrid modeling frameworks that integrate mechanistic and machine learning approaches.
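The abstract reports RMSEP and CCC without defining them. The following is a minimal sketch, assuming the standard definitions: root mean square error of prediction, and Lin's concordance correlation coefficient. The function names and the sample values are hypothetical and are not taken from the study's data or code.

```python
import math


def rmsep(observed, predicted):
    """Root mean square error of prediction, in the units of the data (e.g. g/d)."""
    n = len(observed)
    squared_errors = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(squared_errors / n)


def _moments(observed, predicted):
    """Return means, population variances, and covariance of the two series."""
    n = len(observed)
    mean_o = sum(observed) / n
    mean_p = sum(predicted) / n
    var_o = sum((o - mean_o) ** 2 for o in observed) / n
    var_p = sum((p - mean_p) ** 2 for p in predicted) / n
    cov = sum((o - mean_o) * (p - mean_p) for o, p in zip(observed, predicted)) / n
    return mean_o, mean_p, var_o, var_p, cov


def pearson_r(observed, predicted):
    """Pearson correlation: measures precision (linear association) only."""
    _, _, var_o, var_p, cov = _moments(observed, predicted)
    return cov / math.sqrt(var_o * var_p)


def ccc(observed, predicted):
    """Lin's concordance correlation coefficient: precision and accuracy combined.

    Unlike Pearson r, it penalizes mean shifts and scale differences
    between observed and predicted values.
    """
    mean_o, mean_p, var_o, var_p, cov = _moments(observed, predicted)
    return 2 * cov / (var_o + var_p + (mean_o - mean_p) ** 2)


# Hypothetical observed vs. predicted requirements (g/d), for illustration only.
observed = [100.0, 200.0, 300.0, 400.0]
predicted = [110.0, 190.0, 310.0, 390.0]

print("RMSEP:", rmsep(observed, predicted))
print("Pearson r:", pearson_r(observed, predicted))
print("CCC:", ccc(observed, predicted))
```

Because CCC equals Pearson r multiplied by a bias-correction factor that is at most 1, CCC can never exceed r for the same pair of series. In a 10-fold cross-validation, these metrics would typically be computed on the held-out fold predictions.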