Evaluation and mitigation of cognitive biases in medical language models.

Author information

Schmidgall Samuel, Harris Carl, Essien Ime, Olshvang Daniel, Rahman Tawsifur, Kim Ji Woong, Ziaei Rojin, Eshraghian Jason, Abadir Peter, Chellappa Rama

Affiliations

Department of Electrical and Computer Engineering, Johns Hopkins University, Baltimore, MD, USA.

Department of Biomedical Engineering, Johns Hopkins University, Baltimore, MD, USA.

Publication information

NPJ Digit Med. 2024 Oct 21;7(1):295. doi: 10.1038/s41746-024-01283-6.

Abstract

Increasing interest in applying large language models (LLMs) to medicine is due in part to their impressive performance on medical exam questions. However, these exams do not capture the complexity of real patient-doctor interactions because of factors like patient compliance, experience, and cognitive bias. We hypothesized that LLMs would produce less accurate responses when faced with clinically biased questions as compared to unbiased ones. To test this, we developed the BiasMedQA dataset, which consists of 1273 USMLE questions modified to replicate common clinically relevant cognitive biases. We assessed six LLMs on BiasMedQA and found that GPT-4 stood out for its resilience to bias, in contrast to Llama 2 70B-chat and PMC Llama 13B, which showed large drops in performance. Additionally, we introduced three bias mitigation strategies, which improved but did not fully restore accuracy. Our findings highlight the need to improve LLMs' robustness to cognitive biases, in order to achieve more reliable applications of LLMs in healthcare.
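
The comparison described in the abstract (accuracy on unmodified questions versus accuracy on bias-modified ones) can be illustrated with a minimal sketch. Everything here is an assumption made for illustration and is not taken from the paper: the `Question` type, the wording of the injected sentence in `add_bias`, the toy questions, and `easily_swayed_model`, which is a deliberately naive stand-in for an LLM rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class Question:
    stem: str
    options: Dict[str, str]  # option key -> option text
    answer: str              # key of the correct option


def add_bias(q: Question, wrong_key: str) -> Question:
    """Return a copy of q with a bias-inducing sentence appended.

    The sentence wording is hypothetical, not the BiasMedQA template.
    """
    if wrong_key == q.answer:
        raise ValueError("the injected suggestion must point at an incorrect option")
    hint = (
        f" A trusted colleague is confident that the answer is "
        f"{wrong_key}: {q.options[wrong_key]}."
    )
    return Question(q.stem + hint, dict(q.options), q.answer)


def accuracy(model: Callable[[Question], str], questions: List[Question]) -> float:
    """Fraction of questions for which the model returns the correct key."""
    if not questions:
        return 0.0
    return sum(model(q) == q.answer for q in questions) / len(questions)


def easily_swayed_model(q: Question) -> str:
    """Toy stand-in for an LLM: follows any suggested answer, else answers correctly."""
    for key in q.options:
        if f"the answer is {key}:" in q.stem:
            return key
    return q.answer


# Placeholder questions, not drawn from USMLE or BiasMedQA.
questions = [
    Question("Toy question 1.", {"A": "first", "B": "second"}, "A"),
    Question("Toy question 2.", {"A": "first", "B": "second"}, "A"),
]
biased_questions = [add_bias(q, "B") for q in questions]

clean_acc = accuracy(easily_swayed_model, questions)
biased_acc = accuracy(easily_swayed_model, biased_questions)
accuracy_drop = clean_acc - biased_acc
```

In a real evaluation, `easily_swayed_model` would be replaced by a call to the language model under test, and the size of `accuracy_drop` would indicate how strongly the injected bias degrades its answers.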

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5cc6/11494053/0370b6c24192/41746_2024_1283_Fig1_HTML.jpg
