Evaluating Literature Reviews Conducted by Humans Versus ChatGPT: Comparative Study.

Author information

Mostafapour Mehrnaz, Fortier Jacqueline H, Pacheco Karen, Murray Heather, Garber Gary

Affiliations

Canadian Medical Protective Association, Ottawa, ON, Canada.

Department of Emergency Medicine, Queen's University, Kingston, ON, Canada.

Publication information

JMIR AI. 2024 Aug 19;3:e56537. doi: 10.2196/56537.

Abstract

BACKGROUND

With the rapid evolution of artificial intelligence (AI), particularly large language models (LLMs) such as ChatGPT-4 (OpenAI), there is an increasing interest in their potential to assist in scholarly tasks, including conducting literature reviews. However, the efficacy of AI-generated reviews compared with traditional human-led approaches remains underexplored.

OBJECTIVE

This study aims to compare the quality of literature reviews conducted by the ChatGPT-4 model with those conducted by human researchers, focusing on the relational dynamics between physicians and patients.

METHODS

We included 2 literature reviews in the study on the same topic, namely, exploring factors affecting relational dynamics between physicians and patients in medicolegal contexts. One review used GPT-4, last updated in September 2021, and the other was conducted by human researchers. The human review involved a comprehensive literature search using medical subject headings and keywords in Ovid MEDLINE, followed by a thematic analysis of the literature to synthesize information from selected articles. The AI-generated review used a new prompt engineering approach, using iterative and sequential prompts to generate results. Comparative analysis was based on qualitative measures such as accuracy, response time, consistency, breadth and depth of knowledge, contextual understanding, and transparency.

RESULTS

GPT-4 rapidly produced an extensive list of relational factors. The AI model demonstrated an impressive breadth of knowledge but exhibited limitations in depth and contextual understanding, occasionally producing irrelevant or incorrect information. In comparison, human researchers provided a more nuanced and contextually relevant review. The comparative analysis assessed the reviews based on criteria including accuracy, response time, consistency, breadth and depth of knowledge, contextual understanding, and transparency. While GPT-4 showed advantages in response time and breadth of knowledge, human-led reviews excelled in accuracy, depth of knowledge, and contextual understanding.

CONCLUSIONS

The study suggests that GPT-4, with structured prompt engineering, can be a valuable tool for conducting preliminary literature reviews by providing a broad overview of topics quickly. However, its limitations necessitate careful expert evaluation and refinement, making it an assistant rather than a substitute for human expertise in comprehensive literature reviews. Moreover, this research highlights the potential and limitations of using AI tools like GPT-4 in academic research, particularly in the fields of health services and medical research. It underscores the necessity of combining AI's rapid information retrieval capabilities with human expertise for more accurate and contextually rich scholarly outputs.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/2929/11369534/a2f8b1f956de/ai_v3i1e56537_fig1.jpg
