Digital Ink and Surgical Dreams: Perceptions of Artificial Intelligence-Generated Essays in Residency Applications.

Author affiliations

Department of Biomedical Engineering, University of Rochester, Rochester, New York.

School of Medicine and Dentistry, University of Rochester, Rochester, New York.

Publication information

J Surg Res. 2024 Sep;301:504-511. doi: 10.1016/j.jss.2024.06.020. Epub 2024 Jul 22.

DOI: 10.1016/j.jss.2024.06.020
PMID: 39042979
Abstract

INTRODUCTION

Large language models like Chat Generative Pre-Trained Transformer (ChatGPT) are increasingly used in academic writing. Faculty may consider the use of artificial intelligence (AI)-generated responses a form of cheating. We sought to determine whether general surgery residency faculty could distinguish AI-generated from human-written responses to a text prompt, hypothesizing that faculty would not be able to differentiate them reliably.

METHODS

Ten essays were generated from the text prompt "Tell us in 1-2 paragraphs why you are considering the University of Rochester for General Surgery residency" (current trainees: n = 5, ChatGPT: n = 5). Ten blinded faculty reviewers rated the essays on a ten-point Likert scale across the following criteria: desire to interview, relevance to the general surgery residency, overall impression, and whether the essay was AI- or human-generated. Scores and identification error rates were compared between the groups.

RESULTS

There were no differences between groups for %total points (ChatGPT 66.0 ± 13.5%, human 70.0 ± 23.0%, P = 0.508) or identification error rates (ChatGPT 40.0 ± 35.0%, human 20.0 ± 30.0%, P = 0.175). Except for one, all essays were identified incorrectly by at least two reviewers. Essays identified as human-generated received higher overall impression scores (area under the curve: 0.82 ± 0.04, P < 0.01).

CONCLUSIONS

Whether the use of AI tools for academic purposes should constitute academic dishonesty is controversial. We demonstrate that human- and AI-generated essays are similar in quality, but that there is bias against presumed AI-generated essays. Because faculty are unable to reliably differentiate human from AI-generated essays, this bias may be misdirected. AI tools are becoming ubiquitous and their use is not easily detected. Faculty should expect these tools to play an increasing role in medical education.
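The area-under-the-curve statistic in the results is a rank-based measure: the probability that an essay a reviewer judged human-written receives a higher overall-impression score than one judged AI-generated (ties counting half). A minimal sketch of that computation, using hypothetical reviewer ratings (not the study's actual data):

```python
def auc(pos_scores, neg_scores):
    """Rank-based AUC: probability that a randomly chosen score from
    pos_scores outranks one from neg_scores, with ties counting 0.5."""
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos_scores
        for n in neg_scores
    )
    return wins / (len(pos_scores) * len(neg_scores))

# Hypothetical overall-impression ratings (ten-point Likert scale) for
# essays a reviewer judged human-written vs. AI-generated.
judged_human = [8, 9, 7, 8, 6]
judged_ai = [5, 6, 7, 4, 6]

print(f"AUC = {auc(judged_human, judged_ai):.2f}")  # → AUC = 0.90
```

This pairwise formulation is equivalent to the Mann-Whitney U statistic divided by the number of pairs; an AUC near 0.5 would indicate no association between perceived authorship and impression score, while the study's reported 0.82 indicates a substantial bias toward essays presumed human-written.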


Similar articles

1
Digital Ink and Surgical Dreams: Perceptions of Artificial Intelligence-Generated Essays in Residency Applications.
J Surg Res. 2024 Sep;301:504-511. doi: 10.1016/j.jss.2024.06.020. Epub 2024 Jul 22.
2
Distinguishing Authentic Voices in the Age of ChatGPT: Comparing AI-Generated and Applicant-Written Personal Statements for Plastic Surgery Residency Application.
Ann Plast Surg. 2023 Sep 1;91(3):324-325. doi: 10.1097/SAP.0000000000003653.
3
Residency Application Selection Committee Discriminatory Ability in Identifying Artificial Intelligence-Generated Personal Statements.
J Surg Educ. 2024 Jun;81(6):780-785. doi: 10.1016/j.jsurg.2024.02.009. Epub 2024 Apr 27.
4
Comparison of Medical Research Abstracts Written by Surgical Trainees and Senior Surgeons or Generated by Large Language Models.
JAMA Netw Open. 2024 Aug 1;7(8):e2425373. doi: 10.1001/jamanetworkopen.2024.25373.
5
A large-scale comparison of human-written versus ChatGPT-generated essays.
Sci Rep. 2023 Oct 30;13(1):18617. doi: 10.1038/s41598-023-45644-9.
6
Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology.
Am J Obstet Gynecol. 2024 Aug;231(2):276.e1-276.e10. doi: 10.1016/j.ajog.2024.04.045. Epub 2024 May 6.
7
Reviewer Experience Detecting and Judging Human Versus Artificial Intelligence Content: The Journal Essay Contest.
Stroke. 2024 Oct;55(10):2573-2578. doi: 10.1161/STROKEAHA.124.045012. Epub 2024 Sep 3.
8
Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis.
J Med Internet Res. 2024 Jun 26;26:e52001. doi: 10.2196/52001.
9
AI language models in human reproduction research: exploring ChatGPT's potential to assist academic writing.
Hum Reprod. 2023 Dec 4;38(12):2281-2288. doi: 10.1093/humrep/dead207.
10
The Revival of Essay-Type Questions in Medical Education: Harnessing Artificial Intelligence and Machine Learning.
J Coll Physicians Surg Pak. 2024 May;34(5):595-599. doi: 10.29271/jcpsp.2024.05.595.

Cited by

1
Artificial Intelligence vs Human Authorship in Spine Surgery Fellowship Personal Statements: Can ChatGPT Outperform Applicants?
Global Spine J. 2025 May 20:21925682251344248. doi: 10.1177/21925682251344248.
2
Comment on "Artificial Intelligence-Generated Writing in the ERAS Personal Statement: An Emerging Quandary for Post-Graduate Medical Education".
Acad Psychiatry. 2025 Apr;49(2):200-201. doi: 10.1007/s40596-025-02123-9. Epub 2025 Feb 18.
3
Artificial Intelligence in Medical Writing: Addressing Untouched Threats.
JMA J. 2025 Jan 15;8(1):273-275. doi: 10.31662/jmaj.2024-0268. Epub 2024 Dec 6.