Alencar-Palha Caio, Ocampo Thais, Silva Thaisa Pinheiro, Neves Frederico Sampaio, Oliveira Matheus L
Division of Oral Radiology, Department of Oral Diagnosis, Piracicaba Dental School, University of Campinas, Piracicaba, São Paulo, Brazil.
Division of Oral Radiology, Department of Propedeutics and Integrated Clinic, School of Dentistry, Federal University of Bahia, Salvador, Bahia, Brazil.
Eur J Dent Educ. 2025 Feb;29(1):149-154. doi: 10.1111/eje.13057. Epub 2024 Nov 19.
To evaluate the performance of a Generative Pre-trained Transformer (GPT) in generating scientific abstracts in dentistry.
Ten scientific articles in dental radiology had their original abstracts collected, while another 10 articles had their methodology and results added to a ChatGPT prompt to generate an abstract. All abstracts were randomised and compiled into a single file for subsequent assessment. Five evaluators classified whether each abstract was written by a human using a 5-point scale and provided justifications across seven aspects: formatting, information accuracy, orthography, punctuation, terminology, text fluency, and writing style. Furthermore, an online GPT detector provided "Human Score" values, and a plagiarism detector assessed similarity with existing literature.
Sensitivity values for detecting human writing ranged from 0.20 to 0.70, with a mean of 0.58; specificity values ranged from 0.40 to 0.90, with a mean of 0.62; and accuracy values ranged from 0.50 to 0.80, with a mean of 0.60. Orthography and punctuation were the aspects most frequently cited for the abstracts generated by ChatGPT. The GPT detector yielded a mean "Human Score" of 16.9% for the AI-generated texts, and plagiarism levels averaged 35%.
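The sensitivity, specificity, and accuracy figures above follow the standard confusion-matrix definitions, with "human-written" treated as the positive class. A minimal sketch of that arithmetic, using hypothetical per-evaluator counts (the study does not report individual evaluators' confusion matrices, so the numbers below are illustrative only):

```python
# Hypothetical counts for one evaluator judging 20 abstracts
# (10 human-written, 10 ChatGPT-generated); illustrative values only,
# not the study's actual per-evaluator data.
tp = 6  # human abstracts correctly judged as human
fn = 4  # human abstracts misjudged as AI-generated
tn = 6  # AI abstracts correctly judged as AI-generated
fp = 4  # AI abstracts misjudged as human

sensitivity = tp / (tp + fn)               # ability to recognise human writing
specificity = tn / (tn + fp)               # ability to recognise AI writing
accuracy = (tp + tn) / (tp + tn + fp + fn)

print(sensitivity, specificity, accuracy)  # 0.6 0.6 0.6
```

With balanced groups of 10, as in the study design, each metric moves in steps of 0.1, which matches the ranges reported (e.g. sensitivity of 0.20 means only 2 of 10 human abstracts were recognised as human).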
The GPT exhibited commendable performance in generating scientific abstracts when evaluated by humans, as the generated abstracts were largely indistinguishable from those written by humans. When evaluated by an online GPT detector, however, the use of GPT became apparent.