Ponzo Valentina, Rosato Rosalba, Scigliano Maria Carmine, Onida Martina, Cossai Simona, De Vecchi Morena, Devecchi Andrea, Goitre Ilaria, Favaro Enrica, Merlo Fabio Dario, Sergi Domenico, Bo Simona
Department of Medical Science, University of Turin, 10126 Torino, Italy.
Department of Psychology, University of Turin, 10124 Torino, Italy.
J Clin Med. 2024 Dec 20;13(24):7810. doi: 10.3390/jcm13247810.
Background: The use of artificial intelligence (AI) chatbots for obtaining healthcare advice has greatly increased in the general population. This study assessed the performance of general-purpose AI chatbots in giving nutritional advice for patients with obesity with or without multiple comorbidities. Methods: The case of a 35-year-old male with obesity without comorbidities (Case 1) and the case of a 65-year-old female with obesity, type 2 diabetes mellitus, sarcopenia, and chronic kidney disease (Case 2) were submitted to 10 different AI chatbots on three consecutive days. Accuracy (the ability to provide advice aligned with guidelines), completeness, and reproducibility (replicability of the information over the three days) of the chatbots' responses were evaluated by three registered dietitians. Nutritional consistency was evaluated by comparing the nutrient content provided by the chatbots with values calculated by dietitians. Results: Case 1: ChatGPT 3.5 demonstrated the highest accuracy rate (67.2%) and Copilot the lowest (21.1%). ChatGPT 3.5 and ChatGPT 4.0 achieved the highest completeness (both 87.3%), whereas Gemini and Copilot recorded the lowest scores (55.6% and 42.9%, respectively). Reproducibility was highest for Chatsonic (86.1%) and lowest for ChatGPT 4.0 (50%) and ChatGPT 3.5 (52.8%). Case 2: Overall accuracy was low, with no chatbot achieving 50% accuracy. Completeness was highest for ChatGPT 4.0 and Claude (both 77.8%) and lowest for Copilot (23.3%). ChatGPT 4.0 and Pi Ai showed the lowest reproducibility. The major inconsistencies concerned the amount of protein recommended: most chatbots simultaneously suggested both reducing and increasing protein intake. Conclusions: General-purpose AI chatbots exhibited limited accuracy, reproducibility, and consistency in giving dietary advice in complex clinical scenarios and cannot replace the work of an expert dietitian.