Katz Gabriel, Zloto Ofira, Hostovsky Avner, Huna-Baron Ruth, Ben-Bassat Mizrachi Iris, Burgansky Zvia, Skaat Alon, Vishnevskia-Dai Vicktoria, Fabian Ido Didi, Sagiv Oded, Priel Ayelet, Glicksberg Benjamin S, Klang Eyal
Faculty of Medical & Health Sciences, Tel Aviv University, Tel Aviv, Israel.
Goldschleger Eye Institute, Sheba Medical Center, Tel Hashomer, Israel.
Eye (Lond). 2025 Apr 1. doi: 10.1038/s41433-025-03779-1.
To examine the ability of ChatGPT to write scientific ophthalmology introductions and to compare its performance with that of experienced ophthalmologists.
The OpenAI web interface was used to prompt ChatGPT-4 to generate introductions for the selected papers, so each paper had two introductions: one drafted by ChatGPT and one by the original authors. Ten ophthalmology specialists, each with more than 15 years of experience and representing distinct subspecialties (retina, neuro-ophthalmology, oculoplastics, glaucoma, and ocular oncology), were given both sets of introductions, blinded to their origin (ChatGPT or human author), and asked to evaluate them.
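The study itself used the OpenAI web interface; for readers who want to reproduce this generation step programmatically, a minimal sketch using the OpenAI Python SDK is shown below. The prompt wording, the `draft_introduction` helper, and the `gpt-4` model name are illustrative assumptions, not the authors' exact setup.

```python
# Minimal sketch of generating a paper introduction with the OpenAI Python SDK.
# The study used the OpenAI web interface with ChatGPT-4; the prompt wording
# and parameters here are illustrative assumptions, not the authors' protocol.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def draft_introduction(title: str, abstract: str) -> str:
    """Ask the model to draft an Introduction section for a paper,
    given its title and abstract (hypothetical prompt, for illustration)."""
    response = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "You are an experienced ophthalmology researcher."},
            {"role": "user",
             "content": (f"Write the Introduction section of a scientific "
                         f"ophthalmology paper titled '{title}'. "
                         f"Abstract for context: {abstract}")},
        ],
    )
    return response.choices[0].message.content
```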
Out of 45 instances for each type of introduction, specialists correctly identified the source 26 times (57.7%) and erred 19 times (42.2%). The misclassification rate was 25% when experts evaluated introductions from their own subspecialty versus 44.4% when they assessed introductions outside their subspecialty. In the comparative evaluation of introductions written by ChatGPT and by human authors, no significant difference was identified across the assessed metrics (language, data arrangement, factual accuracy, originality, and data currency). The misclassification rate (the frequency at which reviewers incorrectly identified the authorship) was highest in oculoplastics (66.7%) and lowest in retina (11.1%).
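The headline rates can be recomputed directly from the reported counts, as in the sketch below, which also runs a two-sided binomial test of the specialists' identification accuracy against chance (50%). The choice of test is an assumption for illustration; the abstract does not state which statistics the authors used.

```python
# Recompute the reported identification rates and test accuracy against
# chance. The binomial test is an illustrative choice; the abstract does
# not specify the paper's statistical methods.
from scipy.stats import binomtest

correct, total = 26, 45
print(f"correct: {correct / total:.1%}")                  # ~57.8% (reported as 57.7%)
print(f"misclassified: {(total - correct) / total:.1%}")  # ~42.2%

# Two-sided test of H0: specialists identify the source at chance (p = 0.5)
result = binomtest(correct, total, p=0.5)
print(f"p-value vs. chance: {result.pvalue:.3f}")
```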
ChatGPT represents a significant advance in facilitating the creation of original scientific papers in ophthalmology. The introductions it generated showed no statistically significant difference from those written by experts in language, data organization, factual accuracy, originality, or currency of information, and nearly half of them were indistinguishable from the originals. Future research endeavours should explore ChatGPT-4's utility in composing other sections of research papers and delve into the associated ethical considerations.