Lamanna M, Muca E, Giannone C, Bovo M, Boffo F, Romanzin A, Cavallini D
Department of Veterinary Medical Sciences, University of Bologna, 40064 Ozzano dell'Emilia (BO), Italy.
Department of Veterinary Sciences, University of Turin, 10095 Grugliasco (TO), Italy.
J Dairy Sci. 2025 Sep;108(9):10203-10219. doi: 10.3168/jds.2025-26385. Epub 2025 Jul 18.
This study investigates the application of ChatGPT-4 in extracting and classifying behavioral data from scientific literature, focusing on the daily time-activity budget of dairy cows. Accurate analysis of time-activity budgets is crucial for understanding dairy cow welfare and productivity, but traditional methods are time-intensive and prone to bias. This study evaluates the accuracy and reliability of ChatGPT-4 in data extraction and data categorization, considering explicit, inferred, and ambiguous labels for the data, compared with human analysis. A collection of 55 papers on dairy cow behavior was used in the study. Data extraction for eating, ruminating, and lying behaviors was performed manually and via ChatGPT-4. The artificial intelligence (AI) model's accuracy and labeling performance were assessed through descriptive and statistical analyses, and mixed model analysis was used to compare human and AI outcomes. AI and human time-budget data showed significant differences for eating and ruminating but not for lying. ChatGPT-4 estimated daily eating time at 22.3% of the day compared with 23.8% by humans. For ruminating, AI reported 33.4% against 31.7% by humans. Daily lying times were nearly identical, with AI at 44.4% and human analysis at 44.2%. Overall accuracy in data extraction was ∼75%, and labeling accuracy reached 67.3%, with significant variability across behavioral categories. In general, the AI model demonstrated moderate accuracy in extracting and categorizing behavioral data, particularly for inferred and ambiguous data. However, explicit data extraction posed challenges, highlighting AI's dependence on input quality and structure. The consistency between AI and human analyses for lying behavior underscores AI's potential for specific applications. ChatGPT-4 offers a promising complementary tool for behavioral research, enabling efficient and scalable data extraction. However, improvements in AI algorithms and standardized reporting in the scientific literature are essential for broader applicability. The study advocates for hybrid approaches combining AI capabilities with human oversight to enhance the reliability and accuracy of dairy cow behavioral research.
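To put the reported percent-of-day figures in more familiar units, the short sketch below converts them into minutes per day and prints the AI-human gap for each behavior. This is purely an illustration added here; the 1440 min/d (24 h) daily budget and the conversion itself are assumptions for the example, not calculations described in the paper.

```python
# Illustrative sketch only: convert the percent-of-day values reported in the
# abstract into minutes per day and show the AI-human gap per behavior.
DAY_MINUTES = 1440  # assumed 24 h daily budget expressed in minutes

# Percent of the day spent in each behavior, as reported in the abstract.
reported = {
    "eating":     {"ai": 22.3, "human": 23.8},
    "ruminating": {"ai": 33.4, "human": 31.7},
    "lying":      {"ai": 44.4, "human": 44.2},
}

for behavior, pct in reported.items():
    ai_min = pct["ai"] / 100 * DAY_MINUTES
    human_min = pct["human"] / 100 * DAY_MINUTES
    gap = ai_min - human_min
    print(f"{behavior:<10} AI: {ai_min:6.1f} min/d  "
          f"human: {human_min:6.1f} min/d  gap: {gap:+6.1f} min/d")
```

Under this assumed conversion, the eating and ruminating estimates differ by roughly 20-25 min/d between AI and human extraction, while the lying estimates differ by only a few minutes, consistent with the significance pattern reported above.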