Similar Articles

1. Evaluation and Comparison of Ophthalmic Scientific Abstracts and References by Current Artificial Intelligence Chatbots.
JAMA Ophthalmol. 2023 Sep 1;141(9):819-824. doi: 10.1001/jamaophthalmol.2023.3119.
2. Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models.
JAMA Netw Open. 2024 Aug 1;7(8):e2425373. doi: 10.1001/jamanetworkopen.2024.25373.
3. Reference Hallucination Score for Medical Artificial Intelligence Chatbots: Development and Usability Study.
JMIR Med Inform. 2024 Jul 31;12:e54345. doi: 10.2196/54345.
4. Comparison of Ophthalmologist and Large Language Model Chatbot Responses to Online Patient Eye Care Questions.
JAMA Netw Open. 2023 Aug 1;6(8):e2330320. doi: 10.1001/jamanetworkopen.2023.30320.
5. Assessment of Artificial Intelligence Chatbot Responses to Top Searched Queries About Cancer.
JAMA Oncol. 2023 Oct 1;9(10):1437-1440. doi: 10.1001/jamaoncol.2023.2947.
6. Physician and Artificial Intelligence Chatbot Responses to Cancer Questions From Social Media.
JAMA Oncol. 2024 Jul 1;10(7):956-960. doi: 10.1001/jamaoncol.2024.0836.
7. Artificial Intelligence Chatbot Behavior Change Model for Designing Artificial Intelligence Chatbots to Promote Physical Activity and a Healthy Diet: Viewpoint.
J Med Internet Res. 2020 Sep 30;22(9):e22845. doi: 10.2196/22845.
8. Use of Artificial Intelligence Chatbots in Interpretation of Pathology Reports.
JAMA Netw Open. 2024 May 1;7(5):e2412767. doi: 10.1001/jamanetworkopen.2024.12767.
9. Accuracy of an Artificial Intelligence Chatbot's Interpretation of Clinical Ophthalmic Images.
JAMA Ophthalmol. 2024 Apr 1;142(4):321-326. doi: 10.1001/jamaophthalmol.2024.0017.
10. Exploring the Boundaries of Reality: Investigating the Phenomenon of Artificial Intelligence Hallucination in Scientific Writing Through ChatGPT References.
Cureus. 2023 Apr 11;15(4):e37432. doi: 10.7759/cureus.37432. eCollection 2023 Apr.

Articles Citing This Work

1. Can we trust academic AI detective? Accuracy and limitations of AI-output detectors.
Acta Neurochir (Wien). 2025 Aug 7;167(1):214. doi: 10.1007/s00701-025-06622-4.
2. A scoping review of natural language processing in addressing medically inaccurate information: Errors, misinformation, and hallucination.
J Biomed Inform. 2025 Jul 22:104866. doi: 10.1016/j.jbi.2025.104866.
3. A guide to evade hallucinations and maintain reliability when using large language models for medical research: a narrative review.
Ann Pediatr Endocrinol Metab. 2025 Jun;30(3):115-118. doi: 10.6065/apem.2448278.139. Epub 2025 Jun 30.
4. Large Language Models: Pioneering New Educational Frontiers in Childhood Myopia.
Ophthalmol Ther. 2025 Jun;14(6):1281-1295. doi: 10.1007/s40123-025-01142-x. Epub 2025 Apr 21.
5. Large Language Models in Ophthalmology: A Review of Publications from Top Ophthalmology Journals.
Ophthalmol Sci. 2024 Dec 17;5(3):100681. doi: 10.1016/j.xops.2024.100681. eCollection 2025 May-Jun.
6. Embracing Generative Artificial Intelligence in Clinical Research and Beyond: Opportunities, Challenges, and Solutions.
JACC Adv. 2025 Mar;4(3):101593. doi: 10.1016/j.jacadv.2025.101593. Epub 2025 Feb 8.
7. Assessing the Efficacy of ChatGPT Prompting Strategies in Enhancing Thyroid Cancer Patient Education: A Prospective Study.
J Med Syst. 2025 Jan 17;49(1):11. doi: 10.1007/s10916-024-02129-0.
8. Revolutionizing Health Care: The Transformative Impact of Large Language Models in Medicine.
J Med Internet Res. 2025 Jan 7;27:e59069. doi: 10.2196/59069.
9. Analyzing the Effectiveness of AI-Generated Patient Education Materials: A Comparative Study of ChatGPT and Google Gemini.
Cureus. 2024 Nov 25;16(11):e74398. doi: 10.7759/cureus.74398. eCollection 2024 Nov.
10. Analysis of ChatGPT Responses to Ophthalmic Cases: Can ChatGPT Think like an Ophthalmologist?
Ophthalmol Sci. 2024 Aug 23;5(1):100600. doi: 10.1016/j.xops.2024.100600. eCollection 2025 Jan-Feb.

Evaluation and Comparison of Ophthalmic Scientific Abstracts and References by Current Artificial Intelligence Chatbots.

Affiliations

Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio.

Publication Information

JAMA Ophthalmol. 2023 Sep 1;141(9):819-824. doi: 10.1001/jamaophthalmol.2023.3119.

DOI: 10.1001/jamaophthalmol.2023.3119
PMID: 37498609
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10375387/
Abstract

IMPORTANCE

Language-learning model-based artificial intelligence (AI) chatbots are growing in popularity and have significant implications for both patient education and academia. Drawbacks of using AI chatbots in generating scientific abstracts and reference lists, including inaccurate content coming from hallucinations (ie, AI-generated output that deviates from its training data), have not been fully explored.

OBJECTIVE

To evaluate and compare the quality of ophthalmic scientific abstracts and references generated by earlier and updated versions of a popular AI chatbot.

DESIGN, SETTING, AND PARTICIPANTS

This cross-sectional comparative study used 2 versions of an AI chatbot to generate scientific abstracts and 10 references for clinical research questions across 7 ophthalmology subspecialties. The abstracts were graded by 2 authors using modified DISCERN criteria and performance evaluation scores.

MAIN OUTCOMES AND MEASURES

Scores for the chatbot-generated abstracts were compared using the t test. Abstracts were also evaluated by 2 AI output detectors. A hallucination rate for unverifiable references generated by the earlier and updated versions of the chatbot was calculated and compared.
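The two quantitative measures above can be sketched in a few lines. This is a minimal illustration, not the study's actual analysis code: the reference lists, scores, and the helper names `hallucination_rate` and `welch_t` are all invented for the example, and the study's exact t-test variant is not specified in the abstract (Welch's version is assumed here).

```python
# Illustrative sketch of the two outcome measures: a hallucination rate
# (share of generated references that cannot be verified) and a
# two-sample t statistic comparing graded abstract scores.
import math
from statistics import mean, variance

def hallucination_rate(references, verifiable):
    """Fraction of generated references not found in a verification set."""
    unverifiable = [r for r in references if r not in verifiable]
    return len(unverifiable) / len(references)

def welch_t(a, b):
    """Welch's two-sample t statistic for independent samples."""
    na, nb = len(a), len(b)
    va, vb = variance(a), variance(b)  # sample variances
    return (mean(a) - mean(b)) / math.sqrt(va / na + vb / nb)

# Invented example: 10 generated references, 3 of which cannot be verified.
refs = [f"ref{i}" for i in range(10)]
known = {f"ref{i}" for i in range(7)}
print(hallucination_rate(refs, known))  # prints 0.3

# Invented DISCERN-style scores for two chatbot versions.
print(welch_t([38.0, 40.0, 36.5], [35.0, 37.0, 35.5]))
```

In practice the rate would be computed per research question and then averaged, and the t statistic would be converted to a P value against the appropriate degrees of freedom.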

RESULTS

The mean modified AI-DISCERN scores for the chatbot-generated abstracts were 35.9 and 38.1 (maximum of 50) for the earlier and updated versions, respectively (P = .30). Using the 2 AI output detectors, the mean fake scores (with a score of 100% meaning generated by AI) for the earlier and updated chatbot-generated abstracts were 65.4% and 10.8%, respectively (P = .01), for one detector and were 69.5% and 42.7% (P = .17) for the second detector. The mean hallucination rates for nonverifiable references generated by the earlier and updated versions were 33% and 29% (P = .74).

CONCLUSIONS AND RELEVANCE

Both versions of the chatbot generated average-quality abstracts. There was a high hallucination rate of generating fake references, and caution should be used when using these AI resources for health education or academic purposes.
