Nabata Kylie J, AlShehri Yasir, Mashat Abdullah, Wiseman Sam M
Department of Surgery, St. Paul's Hospital, 1081 Burrard St., Vancouver, BC, V6Z 1Y6, Canada.
University of British Columbia, Vancouver, BC, V6T 1Z4, Canada.
Updates Surg. 2025 Jan 24. doi: 10.1007/s13304-025-02106-3.
This study aimed to assess how accurately human reviewers could distinguish scientific abstracts generated by ChatGPT from the original abstracts. Participants completed an online survey presenting two research abstracts: one generated by ChatGPT and one original. They were asked to identify which abstract was AI-generated and to provide feedback on their preference and their perceptions of AI technology in academic writing. This observational cross-sectional study involved surgical trainees and faculty at the University of British Columbia. The survey was distributed to all surgeons and trainees affiliated with the University of British Columbia across general surgery, orthopedic surgery, thoracic surgery, plastic surgery, cardiovascular surgery, vascular surgery, neurosurgery, urology, otolaryngology, pediatric surgery, and obstetrics and gynecology. A total of 41 participants completed the survey, including 10 (23.3%) surgeons. Eighteen (40.0%) participants correctly identified the original abstract, and 26 (63.4%) preferred the ChatGPT abstract (p = 0.0001). On multivariate analysis, preferring the original abstract was associated with correctly identifying it [OR 7.46, 95% CI (1.78, 31.4), p = 0.006]. These results suggest that human reviewers cannot accurately distinguish between human-written and AI-generated abstracts, and that, overall, there was a trend toward preferring AI-generated abstracts. These findings contribute to understanding the implications of AI in manuscript production, including its benefits and ethical considerations.