Al-Rawas Matheel, Qader Omar Abdul Jabbar Abdul, Othman Nurul Hanim, Ismail Noor Huda, Mamat Rosnani, Halim Mohamad Syahrizal, Abdullah Johari Yap, Noorani Tahir Yusuf
Prosthodontic Unit, School of Dental Sciences, Universiti Sains Malaysia, Health Campus, Kubang Kerian, Kota Bharu, Kelantan, Malaysia.
Hospital Pakar Universiti Sains Malaysia, Kubang Kerian, Kota Bharu, Kelantan, Malaysia.
Sci Rep. 2025 Apr 2;15(1):11275. doi: 10.1038/s41598-025-95387-y.
Several researchers have investigated the consequences of using ChatGPT in education, and their findings have raised concerns about the effects ChatGPT may have on academia. The present study therefore aimed to assess the ability of three methods to differentiate between human-written and ChatGPT-written abstracts: (1) academicians (senior and young), (2) three AI detectors (GPT-2 output detector, Writefull GPT detector, and GPTZero), and (3) one plagiarism detector. A total of 160 abstracts were assessed by these three methods. Two senior and two young academicians used a newly developed rubric to assess the type and quality of 80 human-written and 80 ChatGPT-written abstracts. The results were analysed statistically using crosstabulation and chi-square analysis, and the bivariate correlation and accuracy of each method were assessed. The findings demonstrated that all three methods made different kinds of incorrect assumptions. The academicians' level of experience may play a role in detection ability, with senior academician 1 demonstrating superior accuracy. The GPTZero AI detector and the similarity (plagiarism) detector were very good at accurately identifying the origin of the abstracts. In terms of abstract type, every variable was positively correlated (p < 0.05) except the similarity detector. Human-AI collaboration may significantly benefit the identification of abstract origins.
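The abstract names crosstabulation, chi-square analysis, bivariate correlation, and accuracy but does not give the exact procedures or software. The sketch below shows, under stated assumptions, how these quantities can be computed for one assessor or detector from a 2x2 table of true origin versus predicted origin. Assumptions (not from the study): labels are binary ("human" or "ai"); the chi-square is Pearson's test without continuity correction; the phi coefficient stands in for the bivariate correlation, although the study may have used a different coefficient; the function names and the counts in the usage example are hypothetical and illustrative only, not the study's data.

```python
import math

CATEGORIES = ("human", "ai")  # hypothetical label set: true/predicted origin


def crosstab(true_labels, predicted_labels):
    """Build a 2x2 table: rows = true origin, columns = predicted origin."""
    if len(true_labels) != len(predicted_labels):
        raise ValueError("label lists must have the same length")
    table = [[0, 0], [0, 0]]
    for t, p in zip(true_labels, predicted_labels):
        table[CATEGORIES.index(t)][CATEGORIES.index(p)] += 1
    return table


def accuracy(table):
    """Share of abstracts whose origin was identified correctly (the diagonal)."""
    total = sum(sum(row) for row in table)
    return (table[0][0] + table[1][1]) / total


def chi_square_2x2(table):
    """Pearson chi-square (no Yates correction) with its p-value at df = 1.

    Assumes no row or column total is zero (expected counts must be > 0).
    """
    total = sum(sum(row) for row in table)
    row_sums = [sum(row) for row in table]
    col_sums = [table[0][j] + table[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_sums[i] * col_sums[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    # For 1 degree of freedom, P(X > chi2) = erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


def phi_coefficient(table):
    """Correlation between two binary variables (equals Pearson r on 0/1 codes)."""
    (a, b), (c, d) = table
    denom = math.sqrt((a + b) * (c + d) * (a + c) * (b + d))
    return (a * d - b * c) / denom


if __name__ == "__main__":
    # Hypothetical detector output for 80 human and 80 ChatGPT abstracts.
    true = ["human"] * 80 + ["ai"] * 80
    pred = ["human"] * 70 + ["ai"] * 10 + ["human"] * 20 + ["ai"] * 60
    table = crosstab(true, pred)
    chi2, p = chi_square_2x2(table)
    print("table:", table)
    print(f"accuracy = {accuracy(table):.4f}")
    print(f"chi-square = {chi2:.3f}, p = {p:.2e}")
    print(f"phi = {phi_coefficient(table):.3f}")
```

For a 2x2 table the three statistics are linked: the Pearson chi-square equals n multiplied by phi squared, so a significant chi-square and a non-zero correlation describe the same association between true and predicted origin, while accuracy summarises how often the predictions were right.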