Wu Xiao-Lin, Miles Asha M, Van Tassell Curtis P, Wiggans George R, Norman H Duane, Baldwin Ransom L, Burchard Javier, Dürr João
Council on Dairy Cattle Breeding, Bowie, MD 20716.
Department of Animal and Dairy Sciences, University of Wisconsin-Madison, Madison, WI 53706.
JDS Commun. 2023 Jul 21;4(5):358-362. doi: 10.3168/jdsc.2022-0343. eCollection 2023 Sep.
This study compared 3 correlational models (best prediction, linear regression, and feed-forward neural networks) and 2 causal models (recursive structural equation model and recurrent neural networks) for estimating lactation milk yields. The correlational models assumed associations among test-day milk yields (and health conditions), whereas the causal models postulated unidirectional recursive effects between these test-day variables. Wood lactation curves were used to simulate the data and served as a benchmark model. Individual Wood lactation curves provided an excellent parametric interpretation of lactation dynamics, with their prediction accuracies depending on how well the test days covered the lactation curve. Best prediction outperformed the other models in the absence of mastitis but was suboptimal when mastitis was present and unaccounted for. Recurrent neural networks yielded the highest accuracy when mastitis was present. Although causal models facilitated inference about the causality underlying lactation, precisely capturing the causal relationships was challenging because the underlying biology was complex. Misspecification of recursive effects in the recursive structural equation model resulted in a loss of accuracy. Hence, modeling causal relationships does not necessarily guarantee improved accuracy. In practice, a parsimonious model that balances complexity and accuracy is preferred. In addition to the choice of statistical model, properly accounting for factors and covariates affecting milk yields is equally crucial.
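For readers unfamiliar with the benchmark model, the following is a minimal sketch of the Wood lactation curve in its standard form, y(t) = a * t^b * exp(-c * t), where t is days in milk. The parameter values and the function names (`wood_yield`, `peak_day`, `lactation_yield`) are illustrative assumptions, not values or code from the study, and the 305-day sum is only one simple way to turn daily yields into a lactation total.

```python
import math


def wood_yield(t, a, b, c):
    """Daily milk yield on day t of lactation under Wood's curve a * t^b * exp(-c * t)."""
    return a * t**b * math.exp(-c * t)


def peak_day(b, c):
    """Day of peak yield: setting dy/dt = y * (b/t - c) to zero gives t = b / c."""
    return b / c


def lactation_yield(a, b, c, days=305):
    """Approximate lactation yield as the sum of daily yields over days 1..days."""
    return sum(wood_yield(t, a, b, c) for t in range(1, days + 1))


# Hypothetical parameters chosen only for illustration (not from the paper).
a, b, c = 15.0, 0.25, 0.004

if __name__ == "__main__":
    print("peak day:", peak_day(b, c))
    print("yield at peak (kg):", round(wood_yield(peak_day(b, c), a, b, c), 2))
    print("305-day yield (kg):", round(lactation_yield(a, b, c), 1))
```

In this parameterization, a scales the overall level of production, b governs the rise to peak, and c governs the decline after peak, which is what makes the fitted curve interpretable for an individual cow.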