Aiumtrakul Noppawit, Thongprayoon Charat, Arayangkool Chinnawat, Vo Kristine B, Wannaphut Chalothorn, Suppadungsuk Supawadee, Krisanapan Pajaree, Garcia Valencia Oscar A, Qureshi Fawad, Miao Jing, Cheungpasitporn Wisit
Department of Medicine, John A. Burns School of Medicine, University of Hawaii, Honolulu, HI 96813, USA.
Division of Nephrology and Hypertension, Department of Medicine, Mayo Clinic, Rochester, MN 55905, USA.
J Pers Med. 2024 Jan 18;14(1):107. doi: 10.3390/jpm14010107.
Accurate information regarding oxalate levels in foods is essential for managing patients with hyperoxaluria, oxalate nephropathy, or susceptibility to calcium oxalate stones. This study aimed to assess the reliability of chatbots in categorizing foods based on their oxalate content. We assessed the accuracy of ChatGPT-3.5, ChatGPT-4, Bard AI, and Bing Chat in classifying dietary oxalate content per serving into low (<5 mg), moderate (5-8 mg), and high (>8 mg) categories. A total of 539 food items were processed through each chatbot. Accuracy was compared between chatbots and stratified by dietary oxalate content category. Bard AI had the highest accuracy at 84%, followed by Bing Chat (60%), GPT-4 (52%), and GPT-3.5 (49%) (p < 0.001). All pairwise differences between chatbots were significant except that between GPT-4 and GPT-3.5 (p = 0.30). The accuracy of all chatbots decreased as the oxalate content category increased, but Bard AI retained the highest accuracy in every category. There was considerable variation in the accuracy of AI chatbots in classifying dietary oxalate content. Bard AI consistently showed the highest accuracy, followed by Bing Chat, GPT-4, and GPT-3.5. These results underline the potential of AI in dietary management for at-risk patient groups and the need to improve chatbot algorithms to reach clinical accuracy.
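The three-way categorization and the stratified accuracy comparison described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' code: the function names and the sample labels are hypothetical, and only the thresholds (low <5 mg, moderate 5-8 mg, high >8 mg per serving) come from the abstract.

```python
from collections import defaultdict

def oxalate_category(mg_per_serving: float) -> str:
    """Map oxalate content per serving (mg) to the abstract's categories."""
    if mg_per_serving < 5:
        return "low"
    if mg_per_serving <= 8:
        return "moderate"
    return "high"

def accuracy(predicted: list[str], reference: list[str]) -> float:
    """Fraction of items whose predicted category equals the reference."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(p == r for p, r in zip(predicted, reference))
    return correct / len(reference)

def accuracy_by_category(predicted: list[str], reference: list[str]) -> dict[str, float]:
    """Accuracy stratified by the reference category of each item."""
    hits: dict[str, int] = defaultdict(int)
    totals: dict[str, int] = defaultdict(int)
    for p, r in zip(predicted, reference):
        totals[r] += 1
        hits[r] += int(p == r)
    return {cat: hits[cat] / totals[cat] for cat in totals}

if __name__ == "__main__":
    # Hypothetical oxalate values (mg per serving), not data from the study.
    reference = [oxalate_category(mg) for mg in (2.0, 4.0, 6.5, 8.0, 12.0, 30.0)]
    # Hypothetical chatbot answers for the same six items.
    chatbot = ["low", "low", "moderate", "high", "moderate", "high"]
    print(accuracy(chatbot, reference))
    print(accuracy_by_category(chatbot, reference))
```

Stratifying by the reference category, as in `accuracy_by_category`, is one way to see the pattern the abstract reports, in which accuracy falls as the oxalate category rises.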