

The role of large language models in self-care: a study and benchmark on medicines and supplement guidance accuracy.

Author information

De Busser Branco, Roth Lynn, De Loof Hans

Affiliations

Laboratory of Physiopharmacology, University of Antwerp, Universiteitsplein 1, 2610, Antwerp, Belgium.

Publication information

Int J Clin Pharm. 2024 Dec 7. doi: 10.1007/s11096-024-01839-2.

Abstract

BACKGROUND

The recent surge in the capabilities of artificial intelligence systems, particularly large language models, is also impacting the medical and pharmaceutical field in a major way. Beyond specialized uses in diagnostics and data discovery, these tools have now become accessible to the general public.

AIM

The study aimed to critically analyse the current performance of large language models in answering patients' self-care questions regarding medications and supplements.

METHOD

Answers from six major language models were analysed for correctness, language-independence, context-sensitivity, and reproducibility using a newly developed reference set of questions and a scoring matrix.
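The abstract does not describe the scoring procedure in detail. As a minimal, hypothetical sketch of how such a benchmark run could be organised (the question list, the rubric in score_answer, and the query_model stub below are illustrative placeholders, not the authors' actual reference set or scoring matrix):

# Minimal sketch of a benchmark loop for scoring LLM answers to self-care questions.
# All data (questions, rubric points) and query_model() are hypothetical placeholders;
# the study's actual reference question set and scoring matrix are not reproduced here.
from statistics import mean

CRITERIA = ("correctness", "language_independence", "context_sensitivity", "reproducibility")

def query_model(model: str, question: str, language: str, context: str) -> str:
    # Placeholder: a real run would query the model's API or web interface.
    return f"[{model}] answer to '{question}' ({language}, {context})"

def score_answer(answer: str) -> dict:
    # Placeholder rubric: a human rater would assign points per criterion
    # according to the scoring matrix described in the paper.
    return {criterion: 2 for criterion in CRITERIA}

def run_benchmark(models, questions):
    results = {}
    for model in models:
        per_question = []
        for question in questions:
            answer = query_model(model, question, language="en", context="adult, no comorbidities")
            per_question.append(sum(score_answer(answer).values()))
        results[model] = mean(per_question)
    return results

if __name__ == "__main__":
    # Five of the six evaluated models are named in the abstract; the sixth is not listed there.
    models = ["GPT 4.0", "GPT 3.5", "Gemini", "Gemini Advanced", "Perplexity"]
    questions = ["Can I take ibuprofen together with a vitamin D supplement?"]
    print(run_benchmark(models, questions))

In such a setup, repeating the same questions across languages, user contexts, and points in time yields the language-independence, context-sensitivity, and reproducibility comparisons reported in the results.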

RESULTS

The investigated large language models are capable of answering a clear majority of self-care questions accurately, providing relevant health information. However, substantial variability in the responses, including potentially unsafe advice, was observed, influenced by language, question structure, user context and time. GPT 4.0 scored highest on average, while GPT 3.5, Gemini, and Gemini Advanced had varied scores. Responses were context- and language-sensitive. In terms of consistency over time, Perplexity had the worst performance.

CONCLUSION

Given the high-quality output of large language models, their potential in self-care applications is undeniable. The newly created benchmark can facilitate further validation and guide the establishment of strict safeguards to combat the sizable risk of misinformation in order to reach a more favourable risk/benefit ratio when this cutting-edge technology is used by patients.

