


Assessing the Capability of Large Language Model Chatbots in Generating Plain Language Summaries.

Authors

Mondal Himel, Gupta Gaurav, Sarangi Pradosh Kumar, Sharma Shreya, Choudhary Pritam K, Juhi Ayesha, Kumari Anita, Mondal Shaikat

Affiliations

Physiology, All India Institute of Medical Sciences, Deoghar, IND.

Pediatrics, All India Institute of Medical Sciences, Guwahati, IND.

Publication

Cureus. 2025 Mar 21;17(3):e80976. doi: 10.7759/cureus.80976. eCollection 2025 Mar.

DOI: 10.7759/cureus.80976
PMID: 40260353
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12010112/
Abstract

Background: Plain language summaries (PLSs) make scientific research accessible to a broad non-expert audience. However, crafting an effective PLS can be challenging, particularly for non-native English-speaking researchers. Large language model (LLM) chatbots have the potential to assist in generating summaries, but their effectiveness compared with human-written PLSs remains underexplored.

Methods: This cross-sectional study compared 30 human-written PLSs with PLSs generated by six LLM chatbots: ChatGPT (OpenAI, San Francisco, CA), Claude (Anthropic, San Francisco, CA), Copilot (Microsoft Corp., Washington, DC), Gemini (Google, Mountain View, CA), Meta AI (Meta, Menlo Park, CA), and Perplexity (Perplexity AI, Inc., San Francisco, CA). Readability was assessed with the Flesch Reading Ease (FRE) score and understandability with the Flesch-Kincaid (FK) grade level. Three authors rated each text against seven predefined criteria, and their average score was used to compare PLS quality.

Results: Compared with human-written PLSs, the chatbots generated PLSs with lower FK grade levels (p < 0.0001), and all except Copilot achieved higher FRE scores. The overall score of human-written PLSs was 8.89 ± 0.26. Although the scores varied significantly overall (F = 7.16, p = 0.0012), the post-hoc test found no difference between human-written PLSs and those of any individual chatbot (ChatGPT 8.8 ± 0.34, Claude 8.89 ± 0.33, Copilot 8.69 ± 0.4, Gemini 8.56 ± 0.56, Meta AI 8.98 ± 0.23, and Perplexity 8.8 ± 0.3).

Conclusion: LLM chatbots can generate PLSs that are more readable and understandable to readers with less formal education, and of quality comparable to human-written PLSs. Authors can therefore use LLM chatbots to generate PLSs, which is particularly beneficial for researchers in developing countries. However, because LLM chatbots may introduce minor inaccuracies, LLM-generated PLSs should always be checked for accuracy and relevance.
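The two metrics used in the study are the standard Flesch formulas, which depend only on average sentence length and average syllables per word. A minimal Python sketch is shown below; the vowel-group syllable counter is a rough approximation (real readability tools use exception lists and dictionaries), so treat the output as illustrative:

```python
import re

def count_syllables(word: str) -> int:
    """Rough syllable estimate: count vowel groups, minimum 1."""
    groups = re.findall(r"[aeiouy]+", word.lower())
    return max(1, len(groups))

def readability(text: str) -> tuple[float, float]:
    """Return (Flesch Reading Ease, Flesch-Kincaid grade level)."""
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    n_words = max(1, len(words))
    n_syllables = sum(count_syllables(w) for w in words)
    wps = n_words / sentences        # average words per sentence
    spw = n_syllables / n_words      # average syllables per word
    fre = 206.835 - 1.015 * wps - 84.6 * spw
    fk = 0.39 * wps + 11.8 * spw - 15.59
    return fre, fk
```

Higher FRE means easier text, while FK approximates the US school grade needed to understand it, which is why the chatbots' lower FK grade levels indicate more accessible summaries.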


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21a4/12010112/763f19eedbe9/cureus-0017-00000080976-i01.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/21a4/12010112/93c4d9cb614d/cureus-0017-00000080976-i02.jpg

Similar Articles

1
Assessing the Capability of Large Language Model Chatbots in Generating Plain Language Summaries.
Cureus. 2025 Mar 21;17(3):e80976. doi: 10.7759/cureus.80976. eCollection 2025 Mar.
2
Accuracy and Readability of Artificial Intelligence Chatbot Responses to Vasectomy-Related Questions: Public Beware.
Cureus. 2024 Aug 28;16(8):e67996. doi: 10.7759/cureus.67996. eCollection 2024 Aug.
3
Evaluating Accuracy and Readability of Responses to Midlife Health Questions: A Comparative Analysis of Six Large Language Model Chatbots.
J Midlife Health. 2025 Jan-Mar;16(1):45-50. doi: 10.4103/jmh.jmh_182_24. Epub 2025 Apr 5.
4
Assessing the quality and readability of patient education materials on chemotherapy cardiotoxicity from artificial intelligence chatbots: An observational cross-sectional study.
Medicine (Baltimore). 2025 Apr 11;104(15):e42135. doi: 10.1097/MD.0000000000042135.
5
Assessing the Readability of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
Cureus. 2024 Jul 4;16(7):e63865. doi: 10.7759/cureus.63865. eCollection 2024 Jul.
6
Assessing the Quality of Patient Education Materials on Cardiac Catheterization From Artificial Intelligence Chatbots: An Observational Cross-Sectional Study.
Cureus. 2024 Sep 23;16(9):e69996. doi: 10.7759/cureus.69996. eCollection 2024 Sep.
7
Assessment of readability, reliability, and quality of ChatGPT®, BARD®, Gemini®, Copilot®, Perplexity® responses on palliative care.
Medicine (Baltimore). 2024 Aug 16;103(33):e39305. doi: 10.1097/MD.0000000000039305.
8
Evaluating the Quality and Readability of Generative Artificial Intelligence (AI) Chatbot Responses in the Management of Achilles Tendon Rupture.
Cureus. 2025 Jan 31;17(1):e78313. doi: 10.7759/cureus.78313. eCollection 2025 Jan.
9
Accuracy of Prospective Assessments of 4 Large Language Model Chatbot Responses to Patient Questions About Emergency Care: Experimental Comparative Study.
J Med Internet Res. 2024 Nov 4;26:e60291. doi: 10.2196/60291.
10
Readability, accuracy and appropriateness and quality of AI chatbot responses as a patient information source on root canal retreatment: A comparative assessment.
Int J Med Inform. 2025 Sep;201:105948. doi: 10.1016/j.ijmedinf.2025.105948. Epub 2025 Apr 25.

Cited By

1
Assessing Information Provided by ChatGPT: Heart Failure Versus Patent Ductus Arteriosus.
Cureus. 2025 Jun 19;17(6):e86365. doi: 10.7759/cureus.86365. eCollection 2025 Jun.

References

1
The Worst-Case Scenario After AI Use in Academic Writing: A Clever User Wins?
Aust N Z J Obstet Gynaecol. 2024 Dec 19. doi: 10.1111/ajo.13928.
2
Responsible Use of Generative Artificial Intelligence for Research and Writing: Summarizing ICMJE Guideline.
Indian J Orthop. 2024 Aug 28;58(10):1504-1505. doi: 10.1007/s43465-024-01258-5. eCollection 2024 Oct.
3
Practices and Barriers in Developing and Disseminating Plain-Language Resources Reporting Medical Research Information: A Scoping Review.
Patient. 2024 Sep;17(5):493-518. doi: 10.1007/s40271-024-00700-y. Epub 2024 Jun 15.
4
Adapted large language models can outperform medical experts in clinical text summarization.
Nat Med. 2024 Apr;30(4):1134-1142. doi: 10.1038/s41591-024-02855-5. Epub 2024 Feb 27.
5
Assessment of Quality and Readability of Information Provided by ChatGPT in Relation to Anterior Cruciate Ligament Injury.
J Pers Med. 2024 Jan 18;14(1):104. doi: 10.3390/jpm14010104.
6
Can Artificial Intelligence Improve the Readability of Patient Education Materials on Aortic Stenosis? A Pilot Study.
Cardiol Ther. 2024 Mar;13(1):137-147. doi: 10.1007/s40119-023-00347-0. Epub 2024 Jan 9.
7
ChatGPT in academic writing: Maximizing its benefits and minimizing the risks.
Indian J Ophthalmol. 2023 Dec 1;71(12):3600-3606. doi: 10.4103/IJO.IJO_718_23. Epub 2023 Nov 20.
8
ChatGPT Surpasses 1000 Publications on PubMed: Envisioning the Road Ahead.
Cureus. 2023 Sep 6;15(9):e44769. doi: 10.7759/cureus.44769. eCollection 2023 Sep.
9
Artificial Intelligence is Irreversibly Bound to Academic Publishing - ChatGPT is Cleared for Scientific Writing and Peer Review.
Braz J Cardiovasc Surg. 2023 Oct 5;38(4):e20230963. doi: 10.21470/1678-9741-2023-0963.
10
Clinical Research With Large Language Models Generated Writing-Clinical Research with AI-assisted Writing (CRAW) Study.
Crit Care Explor. 2023 Oct 2;5(10):e0975. doi: 10.1097/CCE.0000000000000975. eCollection 2023 Oct.