Performance of a Generative Pre-Trained Transformer in Generating Scientific Abstracts in Dentistry: A Comparative Observational Study.

Author information

Alencar-Palha Caio, Ocampo Thais, Silva Thaisa Pinheiro, Neves Frederico Sampaio, Oliveira Matheus L

Affiliations

Division of Oral Radiology, Department of Oral Diagnosis, Piracicaba Dental School, University of Campinas, Piracicaba, São Paulo, Brazil.

Division of Oral Radiology, Department of Propedeutics and Integrated Clinic, School of Dentistry, Federal University of Bahia, Salvador, Bahia, Brazil.

Publication information

Eur J Dent Educ. 2025 Feb;29(1):149-154. doi: 10.1111/eje.13057. Epub 2024 Nov 19.

DOI:10.1111/eje.13057

PMID:39562504

Abstract

OBJECTIVES

To evaluate the performance of a Generative Pre-trained Transformer (GPT) in generating scientific abstracts in dentistry.

METHODS

The original abstracts of 10 scientific articles in dental radiology were collected, and the methodology and results of another 10 articles were entered into a ChatGPT prompt to generate an abstract for each. All abstracts were randomised and compiled into a single file for subsequent assessment. Five evaluators rated on a 5-point scale whether each abstract had been written by a human and justified their ratings across seven aspects: formatting, information accuracy, orthography, punctuation, terminology, text fluency, and writing style. Furthermore, an online GPT detector provided "Human Score" values, and a plagiarism detector assessed similarity with existing literature.

RESULTS

Sensitivity values for detecting human writing ranged from 0.20 to 0.70, with a mean of 0.58; specificity values ranged from 0.40 to 0.90, with a mean of 0.62; and accuracy values ranged from 0.50 to 0.80, with a mean of 0.60. Orthography and punctuation were the aspects most often cited for the abstracts generated by ChatGPT. The GPT detector assigned the AI-generated texts a "Human Score" of 16.9%, and plagiarism levels averaged 35%.
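
As an illustration of how the three reported metrics relate, the following is a minimal sketch that assumes each evaluator's 5-point rating is reduced to a binary "human" or "AI" call and that human writing is the positive class, as the phrase "detecting human writing" suggests. The function name `detection_metrics` and the example counts are hypothetical and are not data from the study; only the class sizes (10 human-written and 10 ChatGPT-generated abstracts) and the reported means come from the abstract.

```python
def detection_metrics(human_judged_human, ai_judged_ai, n_human=10, n_ai=10):
    """Return (sensitivity, specificity, accuracy) for one evaluator.

    Assumption: human writing is the positive class, so sensitivity is the
    share of human-written abstracts judged human and specificity is the
    share of ChatGPT-generated abstracts judged AI.
    """
    sensitivity = human_judged_human / n_human
    specificity = ai_judged_ai / n_ai
    accuracy = (human_judged_human + ai_judged_ai) / (n_human + n_ai)
    return sensitivity, specificity, accuracy


# Hypothetical evaluator (illustrative counts, not taken from the study):
# 7 of 10 human abstracts judged human, 9 of 10 ChatGPT abstracts judged AI.
sens, spec, acc = detection_metrics(human_judged_human=7, ai_judged_ai=9)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} accuracy={acc:.2f}")

# With equal class sizes, accuracy is the average of sensitivity and
# specificity, so the reported means can be cross-checked against each other.
reported_mean_sens, reported_mean_spec = 0.58, 0.62
implied_mean_acc = (reported_mean_sens + reported_mean_spec) / 2
print(f"implied mean accuracy={implied_mean_acc:.2f}")
```

Because the two groups are the same size, the implied mean accuracy of 0.60 agrees with the reported mean accuracy.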

CONCLUSION

The GPT performed well in generating scientific abstracts when evaluated by humans, as the generated abstracts were indistinguishable from those written by humans. When evaluated by an online GPT detector, however, the use of GPT became apparent.
