
Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study.

Author Information

Luo Yuan, Miao Yiqun, Zhao Yuhan, Li Jiawei, Chen Yuling, Yue Yuexue, Wu Ying

Affiliations

School of Nursing, Capital Medical University, 10 Xitoutiao, Youanmen Wai, Fengtai District, Beijing, 100069, China, 86 10839117.

School of Nursing, Johns Hopkins University, Baltimore, MD, United States.

Publication Information

JMIR Form Res. 2024 Dec 2;8:e63188. doi: 10.2196/63188.


DOI: 10.2196/63188
PMID: 39622076
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11627524/
Abstract

BACKGROUND: Health-related rumors and misconceptions are spreading at an alarming rate, fueled by the rapid development of the internet and the exponential growth of social media platforms. This phenomenon has become a pressing global concern, as the dissemination of false information can have severe consequences, including widespread panic, social instability, and even public health crises. OBJECTIVE: The aim of the study is to compare the accuracy of rumor identification and the effectiveness of health science popularization between 2 generated large language models in Chinese (GPT-4 by OpenAI and Enhanced Representation through Knowledge Integration Bot [ERNIE Bot] 4.0 by Baidu). METHODS: In total, 20 health rumors and misconceptions, along with 10 health truths, were randomly inputted into GPT-4 and ERNIE Bot 4.0. We prompted them to determine whether the statements were rumors or misconceptions and provide explanations for their judgment. Further, we asked them to generate a health science popularization essay. We evaluated the outcomes in terms of accuracy, effectiveness, readability, and applicability. Accuracy was assessed by the rate of correctly identifying health-related rumors, misconceptions, and truths. Effectiveness was determined by the accuracy of the generated explanation, which was assessed collaboratively by 2 research team members with a PhD in nursing. Readability was calculated by the readability formula of Chinese health education materials. Applicability was evaluated by the Chinese Suitability Assessment of Materials. RESULTS: GPT-4 and ERNIE Bot 4.0 correctly identified all health rumors and misconceptions (100% accuracy rate). For truths, the accuracy rate was 70% (7/10) and 100% (10/10), respectively. Both mostly provided widely recognized viewpoints without obvious errors. The average readability score for the health essays was 2.92 (SD 0.85) for GPT-4 and 3.02 (SD 0.84) for ERNIE Bot 4.0 (P=.65). 
For applicability, except for the content and cultural appropriateness category, significant differences were observed in the total score and scores in other dimensions between them (P<.05). CONCLUSIONS: ERNIE Bot 4.0 demonstrated similar accuracy to GPT-4 in identifying Chinese rumors. Both provided widely accepted views, despite some inaccuracies. These insights enhance understanding and correct misunderstandings. For health essays, educators can learn from readable language styles of GLLMs. Finally, ERNIE Bot 4.0 aligns with Chinese expression habits, making it a good choice for a better Chinese reading experience.
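The accuracy metric described in the abstract is simply the fraction of statements a model classifies correctly, computed separately for the 20 rumors/misconceptions and the 10 truths. A minimal sketch of that calculation, using hypothetical labels (not the study's actual data) where `True` means "judged a rumor/misconception":

```python
def accuracy(judgments, truths):
    """Fraction of statements the model classified correctly."""
    return sum(j == t for j, t in zip(judgments, truths)) / len(truths)

# Hypothetical ground truth: 20 rumors/misconceptions followed by 10 truths.
ground_truth = [True] * 20 + [False] * 10

# Illustrative model outputs matching the reported rates:
# GPT-4 flags all rumors but mislabels 3 of the 10 truths as rumors.
gpt4  = [True] * 20 + [False] * 7 + [True] * 3
# ERNIE Bot 4.0 classifies everything correctly.
ernie = [True] * 20 + [False] * 10

print(accuracy(gpt4[:20], ground_truth[:20]))    # rumor accuracy: 1.0
print(accuracy(gpt4[20:], ground_truth[20:]))    # truth accuracy: 0.7
print(accuracy(ernie[20:], ground_truth[20:]))   # truth accuracy: 1.0
```

With labels arranged this way, the sketch reproduces the reported 100% rumor accuracy for both models and the 70% vs. 100% truth accuracy (7/10 vs. 10/10).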

Graphical abstract: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e16d/11627524/cca1ec80ef6d/formative-v8-e63188-g001.jpg

Similar Articles

[1]
Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study.

JMIR Form Res. 2024-12-2

[2]
A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam.

J Cancer Educ. 2025-5-26

[3]
Physician Versus Large Language Model Chatbot Responses to Web-Based Questions From Autistic Patients in Chinese: Cross-Sectional Comparative Analysis.

J Med Internet Res. 2024-4-30

[4]
Identification of Online Health Information Using Large Pretrained Language Models: Mixed Methods Study.

J Med Internet Res. 2025-5-14

[5]
Comparing the performance of ChatGPT and ERNIE Bot in answering questions regarding liver cancer interventional radiology in Chinese and English contexts: A comparative study.

Digit Health. 2025-1-23

[6]
Comparative performance analysis of global and chinese-domain large language models for myopia.

Eye (Lond). 2025-4-13

[7]
Evaluating the Effectiveness of Large Language Models in Providing Patient Education for Chinese Patients With Ocular Myasthenia Gravis: Mixed Methods Study.

J Med Internet Res. 2025-4-10

[8]
Comparison of artificial intelligence-generated and physician-generated patient education materials on early diabetic kidney disease.

Front Endocrinol (Lausanne). 2025-4-22

[9]
The performance of ChatGPT and ERNIE Bot in surgical resident examinations.

Int J Med Inform. 2025-8

[10]
Clinical Management of Wasp Stings Using Large Language Models: Cross-Sectional Evaluation Study.

J Med Internet Res. 2025-6-4

Cited By

[1]
Assessing the accuracy and explainability of using ChatGPT to evaluate the quality of health news.

BMC Public Health. 2025-6-2

[2]
A Comparative Analysis of GPT-4o and ERNIE Bot in a Chinese Radiation Oncology Exam.

J Cancer Educ. 2025-5-26

References

[1]
Assessing ChatGPT's Mastery of Bloom's Taxonomy Using Psychosomatic Medicine Exam Questions: Mixed-Methods Study.

J Med Internet Res. 2024-1-23

[2]
Artificial intelligence in global health equity: an evaluation and discussion on the application of ChatGPT, in the Chinese National Medical Licensing Examination.

Front Med (Lausanne). 2023-10-19

[3]
Hot Topic Recognition of Health Rumors Based on Anti-Rumor Articles on the WeChat Official Account Platform: Topic Modeling.

J Med Internet Res. 2023-9-21

[4]
Using ChatGPT and Google Bard to improve the readability of written patient information: a proof of concept.

Eur J Cardiovasc Nurs. 2024-3-12

[5]
Performance and exploration of ChatGPT in medical examination, records and education in Chinese: Pave the way for medical AI.

Int J Med Inform. 2023-9

[6]
Use of ChatGPT, GPT-4, and Bard to Improve Readability of ChatGPT's Answers to Common Questions About Lung Cancer and Lung Cancer Screening.

AJR Am J Roentgenol. 2023-11

[7]
Indicators of trustworthiness in lay-friendly research summaries: Scientificness surpasses easiness.

Public Underst Sci. 2024-1

[8]
Can ChatGPT Accurately Answer a PICOT Question? Assessing AI Response to a Clinical Question.

Nurse Educ. 2023

[9]
Can Artificial Intelligence Improve the Readability of Patient Education Materials?

Clin Orthop Relat Res. 2023-11-1

[10]
Accuracy of Information and References Using ChatGPT-3 for Retrieval of Clinical Radiological Information.

Can Assoc Radiol J. 2024-2
