
Accuracy of natural language processors for patients seeking inguinal hernia information.

Author Information

Lois Alex, Yates Robert, Ivy Megan, Inaba Colette, Tatum Roger, Cetrulo Lawrence, Parr Zoe, Chen Judy, Khandelwal Saurabh, Wright Andrew

Affiliations

Department of Surgery, University of Chicago, 5841 S. Maryland, MC 5095, Chicago, IL, 60637, USA.

Department of Surgery, University of Washington Medical Center, University of Washington, 1959 NE Pacific St, Box 356410, Seattle, WA, 98195, USA.

Publication Information

Surg Endosc. 2024 Dec;38(12):7409-7415. doi: 10.1007/s00464-024-11221-y. Epub 2024 Oct 23.

DOI: 10.1007/s00464-024-11221-y
PMID: 39443381
Abstract

BACKGROUND

Natural language processors (NLPs) such as ChatGPT are novel sources of online healthcare information that are readily accessible and integrated into internet search tools. The accuracy of NLP-generated responses to health information questions is unknown.

METHODS

We queried four NLPs (ChatGPT 3.5 and 4, Bard, and Claude 2.0) for responses to simulated patient questions about inguinal hernias and their management. Responses were graded on a Likert scale (1 poor to 5 excellent) for relevance, completeness, and accuracy. Compiled responses were scored collectively for readability using the Flesch-Kincaid score and for educational quality using the DISCERN instrument, a validated tool for evaluating patient information materials. Responses were also compared to two gold-standard educational materials provided by the Society of American Gastrointestinal and Endoscopic Surgeons (SAGES) and the American College of Surgeons (ACS). Evaluations were performed by six hernia surgeons.
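The Flesch-Kincaid metrics named above are simple formulas over word, sentence, and syllable counts. A minimal sketch of how such scores are computed (this is not the authors' scoring pipeline, and the `count_syllables` heuristic is a rough stand-in for the pronunciation-dictionary lookups real readability tools use):

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels; every word
    # has at least one syllable. Real tools use pronunciation data.
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def flesch_scores(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid Grade Level)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    wps = len(words) / len(sentences)   # words per sentence
    spw = syllables / len(words)        # syllables per word
    reading_ease = 206.835 - 1.015 * wps - 84.6 * spw
    grade_level = 0.39 * wps + 11.8 * spw - 15.59
    return reading_ease, grade_level
```

Higher reading-ease values and lower grade levels indicate easier text; the American Medical Association's commonly cited recommendation for patient materials is roughly a sixth-grade reading level.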

RESULTS

The average NLP response scores for relevance, completeness, and accuracy were 4.76 (95% CI 4.70-4.80), 4.11 (95% CI 4.02-4.20), and 4.14 (95% CI 4.03-4.24), respectively. ChatGPT 4 received higher accuracy scores (mean 4.43 [95% CI 4.37-4.50]) than Bard (mean 4.06 [95% CI 3.88-4.26]) and Claude 2.0 (mean 3.85 [95% CI 3.63-4.08]). The ACS document received the best scores for reading ease (55.2) and grade level (9.2); however, none of the documents achieved the readability thresholds recommended by the American Medical Association. The ACS document also received the highest DISCERN score of 63.5 (95% CI 57.0-70.1), which was significantly higher than ChatGPT 4 (50.8 [95% CI 46.2-55.4]) and Claude 2.0 (48.0 [95% CI 41.6-54.4]).
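The means with 95% confidence intervals reported above can be reproduced from per-response Likert scores. A sketch assuming a normal-approximation interval (the abstract does not state which CI method the authors used):

```python
import math
import statistics

def mean_ci95(scores: list[float]) -> tuple[float, float, float]:
    """Return (mean, lower, upper) for a normal-approximation 95% CI."""
    n = len(scores)
    m = statistics.mean(scores)
    se = statistics.stdev(scores) / math.sqrt(n)  # standard error
    return m, m - 1.96 * se, m + 1.96 * se

# Example with hypothetical Likert grades (1-5) from several raters:
m, lo, hi = mean_ci95([5, 4, 5, 4, 4, 5])
```

With only six raters per item, a t-based interval (wider than 1.96 standard errors) would be the more defensible choice; the sketch uses the normal multiplier only for simplicity.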

CONCLUSIONS

The evaluated NLPs provided relevant responses of reasonable accuracy to questions about inguinal hernia. Compiled NLP responses received relatively low readability and DISCERN scores, although results may improve as NLPs evolve or with adjustments in question wording. As surgical patients expand their use of NLPs for healthcare information, surgeons should be aware of the benefits and limitations of NLPs as patient education tools.


Similar Articles

1
Accuracy of natural language processors for patients seeking inguinal hernia information.
Surg Endosc. 2024 Dec;38(12):7409-7415. doi: 10.1007/s00464-024-11221-y. Epub 2024 Oct 23.
2
Dr. Google to Dr. ChatGPT: assessing the content and quality of artificial intelligence-generated medical information on appendicitis.
Surg Endosc. 2024 May;38(5):2887-2893. doi: 10.1007/s00464-024-10739-5. Epub 2024 Mar 5.
3
Reliability and readability analysis of ChatGPT-4 and Google Bard as a patient information source for the most commonly applied radionuclide treatments in cancer patients.
Rev Esp Med Nucl Imagen Mol (Engl Ed). 2024 Jul-Aug;43(4):500021. doi: 10.1016/j.remnie.2024.500021. Epub 2024 May 29.
4
Generative artificial intelligence chatbots may provide appropriate informational responses to common vascular surgery questions by patients.
Vascular. 2025 Feb;33(1):229-237. doi: 10.1177/17085381241240550. Epub 2024 Mar 18.
5
Is Information About Musculoskeletal Malignancies From Large Language Models or Web Resources at a Suitable Reading Level for Patients?
Clin Orthop Relat Res. 2025 Feb 1;483(2):306-315. doi: 10.1097/CORR.0000000000003263. Epub 2024 Sep 25.
6
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
7
The impact of internet resources and artificial intelligence on information on myringotomy tubes.
Eur Arch Otorhinolaryngol. 2025 Apr;282(4):2149-2153. doi: 10.1007/s00405-024-09148-0. Epub 2024 Dec 12.
8
Appropriateness and readability of Google Bard and ChatGPT-3.5 generated responses for surgical treatment of glaucoma.
Rom J Ophthalmol. 2024 Jul-Sep;68(3):243-248. doi: 10.22336/rjo.2024.45.
9
Evaluating the Efficacy of ChatGPT as a Patient Education Tool in Prostate Cancer: Multimetric Assessment.
J Med Internet Res. 2024 Aug 14;26:e55939. doi: 10.2196/55939.
10
American academy of Orthopedic Surgeons' OrthoInfo provides more readable information regarding meniscus injury than ChatGPT-4 while information accuracy is comparable.
J ISAKOS. 2025 Apr;11:100843. doi: 10.1016/j.jisako.2025.100843. Epub 2025 Feb 21.

References

1
ChatGPT Can Offer Satisfactory Responses to Common Patient Questions Regarding Elbow Ulnar Collateral Ligament Reconstruction.
Arthrosc Sports Med Rehabil. 2024 Feb 13;6(2):100893. doi: 10.1016/j.asmr.2024.100893. eCollection 2024 Apr.
2
Use of Artificial Intelligence Chatbots for Cancer Treatment Information.
JAMA Oncol. 2023 Oct 1;9(10):1459-1462. doi: 10.1001/jamaoncol.2023.2954.
3
Assessing ChatGPT Responses to Common Patient Questions Regarding Total Hip Arthroplasty.
J Bone Joint Surg Am. 2023 Oct 4;105(19):1519-1526. doi: 10.2106/JBJS.23.00209. Epub 2023 Jul 17.
4
Comparing Physician and Artificial Intelligence Chatbot Responses to Patient Questions Posted to a Public Social Media Forum.
JAMA Intern Med. 2023 Jun 1;183(6):589-596. doi: 10.1001/jamainternmed.2023.1838.
5
Artificial Hallucinations in ChatGPT: Implications in Scientific Writing.
Cureus. 2023 Feb 19;15(2):e35179. doi: 10.7759/cureus.35179. eCollection 2023 Feb.
6
Social Media and Medical Misinformation: Confronting New Variants of an Old Problem.
JAMA. 2022 Oct 11;328(14):1393-1394. doi: 10.1001/jama.2022.17191.
7
AAPOR Reporting Guidelines for Survey Studies.
JAMA Surg. 2021 Aug 1;156(8):785-786. doi: 10.1001/jamasurg.2021.0543.
8
Prevalence of Health Misinformation on Social Media: Systematic Review.
J Med Internet Res. 2021 Jan 20;23(1):e17187. doi: 10.2196/17187.
9
Online Health Information Seeking Among US Adults: Measuring Progress Toward a Healthy People 2020 Objective.
Public Health Rep. 2019 Nov/Dec;134(6):617-625. doi: 10.1177/0033354919874074. Epub 2019 Sep 12.
10
Access to care and use of the Internet to search for health information: results from the US National Health Interview Survey.
J Med Internet Res. 2015 Apr 29;17(4):e106. doi: 10.2196/jmir.4126.