Cole Eye Institute, Cleveland Clinic Foundation, Cleveland, Ohio.
JAMA Ophthalmol. 2023 Sep 1;141(9):819-824. doi: 10.1001/jamaophthalmol.2023.3119.
Language-learning model-based artificial intelligence (AI) chatbots are growing in popularity and have significant implications for both patient education and academia. Drawbacks of using AI chatbots to generate scientific abstracts and reference lists, including inaccurate content arising from hallucinations (ie, AI-generated output that deviates from its training data), have not been fully explored.
To evaluate and compare the quality of ophthalmic scientific abstracts and references generated by earlier and updated versions of a popular AI chatbot.
DESIGN, SETTING, AND PARTICIPANTS: This cross-sectional comparative study used 2 versions of an AI chatbot to generate scientific abstracts and 10 references for clinical research questions across 7 ophthalmology subspecialties. The abstracts were graded by 2 authors using modified DISCERN criteria and performance evaluation scores.
Scores for the chatbot-generated abstracts were compared using the t test. Abstracts were also evaluated by 2 AI output detectors. A hallucination rate for unverifiable references generated by the earlier and updated versions of the chatbot was calculated and compared.
The mean modified AI-DISCERN scores for the chatbot-generated abstracts were 35.9 and 38.1 (maximum of 50) for the earlier and updated versions, respectively (P = .30). Using the 2 AI output detectors, the mean fake scores (with a score of 100% meaning generated by AI) for the earlier and updated chatbot-generated abstracts were 65.4% and 10.8%, respectively (P = .01), for one detector and were 69.5% and 42.7% (P = .17) for the second detector. The mean hallucination rates for nonverifiable references generated by the earlier and updated versions were 33% and 29% (P = .74).
Both versions of the chatbot generated average-quality abstracts. There was a high hallucination rate in the generation of fake references, and caution is warranted when relying on these AI resources for health education or academic purposes.