Comparing the Accuracy of Two Generated Large Language Models in Identifying Health-Related Rumors or Misconceptions and the Applicability in Health Science Popularization: Proof-of-Concept Study.

Author information

Luo Yuan, Miao Yiqun, Zhao Yuhan, Li Jiawei, Chen Yuling, Yue Yuexue, Wu Ying

Affiliations

School of Nursing, Capital Medical University, 10 Xitoutiao, Youanmen Wai, Fengtai District, Beijing, 100069, China, 86 10839117.

School of Nursing, Johns Hopkins University, Baltimore, MD, United States.

Publication information

JMIR Form Res. 2024 Dec 2;8:e63188. doi: 10.2196/63188.

Abstract

BACKGROUND

Health-related rumors and misconceptions are spreading at an alarming rate, fueled by the rapid development of the internet and the exponential growth of social media platforms. This phenomenon has become a pressing global concern, as the dissemination of false information can have severe consequences, including widespread panic, social instability, and even public health crises.

OBJECTIVE

This study aimed to compare the accuracy of rumor identification and the effectiveness of health science popularization between 2 generative large language models (GLLMs) used in Chinese: GPT-4 by OpenAI and Enhanced Representation through Knowledge Integration Bot (ERNIE Bot) 4.0 by Baidu.

METHODS

In total, 20 health rumors and misconceptions, along with 10 health truths, were entered in random order into GPT-4 and ERNIE Bot 4.0. We prompted the models to determine whether each statement was a rumor or misconception and to explain their judgment. We then asked them to generate a health science popularization essay. We evaluated the outputs in terms of accuracy, effectiveness, readability, and applicability. Accuracy was the rate at which health-related rumors, misconceptions, and truths were correctly identified. Effectiveness was the accuracy of the generated explanations, assessed jointly by 2 research team members who hold PhDs in nursing. Readability was calculated with the readability formula for Chinese health education materials. Applicability was evaluated with the Chinese Suitability Assessment of Materials.

RESULTS

GPT-4 and ERNIE Bot 4.0 both correctly identified all health rumors and misconceptions (100% accuracy rate). For truths, the accuracy rate was 70% (7/10) for GPT-4 and 100% (10/10) for ERNIE Bot 4.0. Both models mostly provided widely recognized viewpoints without obvious errors. The average readability score for the health essays was 2.92 (SD 0.85) for GPT-4 and 3.02 (SD 0.84) for ERNIE Bot 4.0 (P=.65). For applicability, the 2 models differed significantly in the total score and in every dimension except content and cultural appropriateness (P<.05).
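As a rough illustration of how the accuracy rates above follow from the counts, the sketch below recomputes them in Python. The counts come from the methods and results (20 rumors or misconceptions, 10 truths). The function names, the dictionary layout, and the pooled "overall" figure are assumptions added for illustration and are not reported in the abstract.

```python
def accuracy_rate(correct: int, total: int) -> float:
    """Return the percentage of statements that were classified correctly."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100 * correct / total


# (correct, total) pairs taken from the abstract; the layout is illustrative.
results = {
    "GPT-4": {"rumors": (20, 20), "truths": (7, 10)},
    "ERNIE Bot 4.0": {"rumors": (20, 20), "truths": (10, 10)},
}


def summarize(counts: dict) -> dict:
    """Per-category accuracy plus a pooled figure (derived, not reported)."""
    summary = {name: accuracy_rate(c, t) for name, (c, t) in counts.items()}
    pooled_correct = sum(c for c, _ in counts.values())
    pooled_total = sum(t for _, t in counts.values())
    summary["overall"] = accuracy_rate(pooled_correct, pooled_total)
    return summary


for model, counts in results.items():
    print(model, summarize(counts))
```

Pooling all 30 statements gives 27/30 for GPT-4 and 30/30 for ERNIE Bot 4.0, so the gap between the models comes entirely from the 10 true statements.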

CONCLUSIONS

ERNIE Bot 4.0 demonstrated accuracy similar to that of GPT-4 in identifying Chinese rumors. Both models provided widely accepted views, despite some inaccuracies. Such explanations can help readers understand health topics and correct misunderstandings. For health essays, educators can learn from the readable language style of generative large language models. Finally, ERNIE Bot 4.0 aligns with Chinese expression habits, making it a good choice for a better Chinese reading experience.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/e16d/11627524/cca1ec80ef6d/formative-v8-e63188-g001.jpg
