

[Human vs. ChatGPT. Is it possible to obtain comparable results in the analysis of a scientific systematic review?].

Author information

Esposito Chiara, Dell'Omo Giulia, Di Ianni Daniele, Di Procolo Paolo

Affiliations

Politecnico di Milano.

Roche SpA, Monza.

Publication information

Recenti Prog Med. 2024 Sep;115(9):420-425. doi: 10.1701/4334.43184.

Abstract

INTRODUCTION

There is growing interest in the use of ChatGPT for writing and reviewing scientific articles. In line with the nature of ChatGPT, we tested its effectiveness in the scientific article review process.

METHODS

We compared the findings of a systematic review of the published literature, produced by researchers in the traditional way, with a version created by ChatGPT, obtained by providing the same inputs as the original paper together with a set of instructions (prompts) optimized to obtain the same type of result; we also identified the process that led to a comparable result. To assess ChatGPT's effectiveness in analyzing the systematic review, we selected an existing, replicable study on the experience of healthcare professionals with digital tools in clinical practice, from which we extracted and downloaded the 17 related publications in PDF format. We then uploaded these references into ChatGPT, setting specific prompts that detailed the required professional profile, the application context, and the expected outputs, and setting the level of creative freedom (temperature) to a minimum to limit the possibility of "hallucinations". After verifying ChatGPT's understanding of the task, we performed several iterations of the prompt until we obtained a result comparable to the original review. Finally, we systematically compared the results obtained by ChatGPT with those of the reference review.
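
The abstract describes this setup only in prose. For illustration, here is a minimal sketch of how such a configuration could look when scripted against the OpenAI chat API, assuming the 17 references are first converted from PDF to plain text; the directory name, model string, and prompt wording are hypothetical assumptions, not the authors' actual configuration (the study worked within ChatGPT itself).

# Minimal illustrative sketch, NOT the authors' code. Assumes the OpenAI
# Python SDK and pypdf are installed; directory name, model string, and
# prompt wording are invented for illustration.
from pathlib import Path

from openai import OpenAI
from pypdf import PdfReader

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def extract_pdf_text(path: Path) -> str:
    """Concatenate the extractable text of every page of one reference PDF."""
    reader = PdfReader(path)
    return "\n".join(page.extract_text() or "" for page in reader.pages)


# Load the 17 downloaded references (hypothetical "refs/" directory).
references = [extract_pdf_text(p) for p in sorted(Path("refs").glob("*.pdf"))]

# System prompt covering the elements listed in the abstract: professional
# profile, application context, and expected output.
system_prompt = (
    "You are a researcher experienced in systematic literature reviews. "
    "Context: healthcare professionals' experience with digital tools in "
    "clinical practice. Task: synthesize the attached studies into the "
    "macro-themes of a systematic review. Do not invent findings."
)

response = client.chat.completions.create(
    model="gpt-4o",  # model choice is an assumption
    temperature=0,   # minimal 'creative freedom' to limit hallucinations
    messages=[
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": "\n\n---\n\n".join(references)},
    ],
)
print(response.choices[0].message.content)

In a scripted version of the authors' workflow, this call would sit inside a loop that refines system_prompt after comparing each output against the reference review, mirroring the prompt iterations reported in the abstract.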

RESULTS

The analysis showed that ChatGPT's results are comparable to human results, although four iterations of the prompt were required to approach the human benchmark.

DISCUSSION

Although ChatGPT showed comparable capabilities in text review, the human authors exhibited greater analytical depth in interpretation. Thanks to their greater creative freedom, the authors offered more detail on the benefits of digital tools in the hospital setting. ChatGPT, however, enriched the analysis by including elements not initially contemplated. The final comparison revealed comparable macro-themes between the two approaches, underscoring the need for careful human validation to ensure the full integrity and depth of the analysis.

CONCLUSIONS

Generative artificial intelligence (AI), represented here by ChatGPT, showed significant potential to revolutionize the production of scientific literature by supporting healthcare professionals. Although there are challenges that require careful evaluation, ChatGPT's results are comparable to human results. The key element is not so much the superiority of AI over humans as the human ability to configure and direct AI toward results that are optimal, or even potentially superior to those produced by humans.

