
A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study.

Authors

King Ryan C, Samaan Jamil S, Yeo Yee Hui, Peng Yuxin, Kunkel David C, Habib Ali A, Ghashghaei Roxana

Affiliations

Division of Cardiology, Department of Medicine, University of California, Irvine Medical Center, Orange, CA, United States.

Karsh Division of Gastroenterology and Hepatology, Department of Medicine, Cedars-Sinai Medical Center, Los Angeles, CA, United States.

Publication

JMIR Cardio. 2024 Apr 19;8:e53421. doi: 10.2196/53421.

DOI: 10.2196/53421
PMID: 38640472
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11069089/
Abstract

BACKGROUND

Amyloidosis, a rare multisystem condition, often requires complex, multidisciplinary care. Its low prevalence underscores the importance of efforts to ensure the availability of high-quality patient education materials for better outcomes. ChatGPT (OpenAI) is a large language model powered by artificial intelligence that offers a potential avenue for disseminating accurate, reliable, and accessible educational resources for both patients and providers. Its user-friendly interface, engaging conversational responses, and the capability for users to ask follow-up questions make it a promising future tool in delivering accurate and tailored information to patients.

OBJECTIVE

We performed a multidisciplinary assessment of the accuracy, reproducibility, and readability of ChatGPT in answering questions related to amyloidosis.

METHODS

In total, 98 amyloidosis questions related to cardiology, gastroenterology, and neurology were curated from medical societies, institutions, and amyloidosis Facebook support groups and inputted into ChatGPT-3.5 and ChatGPT-4. Cardiology- and gastroenterology-related responses were independently graded by a board-certified cardiologist and gastroenterologist, respectively, who specialize in amyloidosis. These 2 reviewers (RG and DCK) also graded general questions for which disagreements were resolved with discussion. Neurology-related responses were graded by a board-certified neurologist (AAH) who specializes in amyloidosis. Reviewers used the following grading scale: (1) comprehensive, (2) correct but inadequate, (3) some correct and some incorrect, and (4) completely incorrect. Questions were stratified by categories for further analysis. Reproducibility was assessed by inputting each question twice into each model. The readability of ChatGPT-4 responses was also evaluated using the Textstat library in Python (Python Software Foundation) and the Textstat readability package in R software (R Foundation for Statistical Computing).
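The readability scoring above relied on off-the-shelf packages (Textstat in Python and R). As an illustrative sketch only — not the authors' actual pipeline — the Flesch-Kincaid grade level that such readability tools report can be computed from sentence, word, and syllable counts (the syllable counter here is a crude vowel-group heuristic; real packages use more careful rules):

```python
import re

def count_syllables(word: str) -> int:
    # Rough heuristic: count contiguous vowel groups;
    # every word contributes at least one syllable.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / len(sentences))
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Longer sentences and more polysyllabic words push the score toward the graduate-level averages (around 15.5) reported for ChatGPT-4's responses in this study.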

RESULTS

ChatGPT-4 (n=98) provided 93 (95%) responses with accurate information, and 82 (84%) were comprehensive. ChatGPT-3.5 (n=83) provided 74 (89%) responses with accurate information, and 66 (79%) were comprehensive. When examined by question category, ChatGPT-4 and ChatGPT-3.5 provided 53 (95%) and 48 (86%) comprehensive responses, respectively, to "general questions" (n=56). When examined by subject, ChatGPT-4 and ChatGPT-3.5 performed best in response to cardiology questions (n=12), with both models producing 10 (83%) comprehensive responses. For gastroenterology (n=15), ChatGPT-4 received comprehensive grades for 9 (60%) responses, and ChatGPT-3.5 for 8 (53%). Overall, 96 of 98 (98%) responses for ChatGPT-4 and 73 of 83 (88%) for ChatGPT-3.5 were reproducible. The readability of ChatGPT-4's responses ranged from 10th-grade to beyond graduate US reading levels, with an average grade level of 15.5 (SD 1.9).

CONCLUSIONS

Large language models are a promising tool for accurate and reliable health information for patients living with amyloidosis. However, ChatGPT's responses exceeded the American Medical Association's recommended fifth- to sixth-grade reading level. Future studies focusing on improving response accuracy and readability are warranted. Prior to widespread implementation, the technology's limitations and ethical implications must be further explored to ensure patient safety and equitable implementation.


Similar Articles

1. A Multidisciplinary Assessment of ChatGPT's Knowledge of Amyloidosis: Observational Study.
JMIR Cardio. 2024 Apr 19;8:e53421. doi: 10.2196/53421.
2. Performance of ChatGPT on the Chinese Postgraduate Examination for Clinical Medicine: Survey Study.
JMIR Med Educ. 2024 Feb 9;10:e48514. doi: 10.2196/48514.
3. ChatGPT's performance in German OB/GYN exams - paving the way for AI-enhanced medical education and clinical practice.
Front Med (Lausanne). 2023 Dec 13;10:1296615. doi: 10.3389/fmed.2023.1296615. eCollection 2023.
4. Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.
J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.
5. Evaluating the accuracy and readability of ChatGPT in providing parental guidance for adenoidectomy, tonsillectomy, and ventilation tube insertion surgery.
Int J Pediatr Otorhinolaryngol. 2024 Jun;181:111998. doi: 10.1016/j.ijporl.2024.111998. Epub 2024 May 31.
6. Assessing the Accuracy of Responses by the Language Model ChatGPT to Questions Regarding Bariatric Surgery.
Obes Surg. 2023 Jun;33(6):1790-1796. doi: 10.1007/s11695-023-06603-5. Epub 2023 Apr 27.
7. Appropriateness of ChatGPT in Answering Heart Failure Related Questions.
Heart Lung Circ. 2024 Sep;33(9):1314-1318. doi: 10.1016/j.hlc.2024.03.005. Epub 2024 May 31.
8. Is ChatGPT accurate and reliable in answering questions regarding head and neck cancer?
Front Oncol. 2023 Dec 1;13:1256459. doi: 10.3389/fonc.2023.1256459. eCollection 2023.
9. Assessing ChatGPT's ability to answer questions pertaining to erectile dysfunction: can our patients trust it?
Int J Impot Res. 2024 Nov;36(7):734-740. doi: 10.1038/s41443-023-00797-z. Epub 2023 Nov 20.
10. How Does ChatGPT Perform on the United States Medical Licensing Examination (USMLE)? The Implications of Large Language Models for Medical Education and Knowledge Assessment.
JMIR Med Educ. 2023 Feb 8;9:e45312. doi: 10.2196/45312.

Cited By

1. Improving the Readability of Institutional Heart Failure-Related Patient Education Materials Using GPT-4: Observational Study.
JMIR Cardio. 2025 Jul 8;9:e68817. doi: 10.2196/68817.
2. Accuracy of Large Language Models When Answering Clinical Research Questions: Systematic Review and Network Meta-Analysis.
J Med Internet Res. 2025 Apr 30;27:e64486. doi: 10.2196/64486.
3. Examining the Accuracy and Reproducibility of Responses to Nutrition Questions Related to Inflammatory Bowel Disease by Generative Pre-trained Transformer-4.
Crohns Colitis 360. 2025 Feb 19;7(1):otae077. doi: 10.1093/crocol/otae077. eCollection 2025 Jan.
4. Large language models in patient education: a scoping review of applications in medicine.
Front Med (Lausanne). 2024 Oct 29;11:1477898. doi: 10.3389/fmed.2024.1477898. eCollection 2024.
5. Evaluation of the quality and readability of ChatGPT responses to frequently asked questions about myopia in traditional Chinese language.
Digit Health. 2024 Sep 2;10:20552076241277021. doi: 10.1177/20552076241277021. eCollection 2024 Jan-Dec.
6. Assessing ChatGPT's Competency in Addressing Interdisciplinary Inquiries on Chatbot Uses in Sports Rehabilitation: Simulation Study.
JMIR Med Educ. 2024 Aug 7;10:e51157. doi: 10.2196/51157.

References

1. Evaluation of the reliability and readability of ChatGPT-4 responses regarding hypothyroidism during pregnancy.
Sci Rep. 2024 Jan 2;14(1):243. doi: 10.1038/s41598-023-50884-w.
2. GPT-4V passes the BLS and ACLS examinations: An analysis of GPT-4V's image recognition capabilities.
Resuscitation. 2024 Feb;195:110106. doi: 10.1016/j.resuscitation.2023.110106. Epub 2023 Dec 29.
3. Assessing the potential of GPT-4 to perpetuate racial and gender biases in health care: a model evaluation study.
Lancet Digit Health. 2024 Jan;6(1):e12-e22. doi: 10.1016/S2589-7500(23)00225-X.
4. Evaluating the Efficacy of ChatGPT in Navigating the Spanish Medical Residency Entrance Examination (MIR): Promising Horizons for AI in Clinical Medicine.
Clin Pract. 2023 Nov 20;13(6):1460-1487. doi: 10.3390/clinpract13060130.
5. Large language models propagate race-based medicine.
NPJ Digit Med. 2023 Oct 20;6(1):195. doi: 10.1038/s41746-023-00939-z.
6. ChatGPT's ability to comprehend and answer cirrhosis related questions in Arabic.
Arab J Gastroenterol. 2023 Aug;24(3):145-148. doi: 10.1016/j.ajg.2023.08.001. Epub 2023 Sep 4.
7. Success of ChatGPT, an AI language model, in taking the French language version of the European Board of Ophthalmology examination: A novel approach to medical knowledge assessment.
J Fr Ophtalmol. 2023 Sep;46(7):706-711. doi: 10.1016/j.jfo.2023.05.006. Epub 2023 Aug 1.
8. Appropriateness and Readability of ChatGPT-4-Generated Responses for Surgical Treatment of Retinal Diseases.
Ophthalmol Retina. 2023 Oct;7(10):862-868. doi: 10.1016/j.oret.2023.05.022. Epub 2023 Jun 3.
9. Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.
JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838.
10. Assessing the Accuracy of Responses by the Language Model ChatGPT to Questions Regarding Bariatric Surgery.
Obes Surg. 2023 Jun;33(6):1790-1796. doi: 10.1007/s11695-023-06603-5. Epub 2023 Apr 27.