Can ChatGPT Recognize Its Own Writing in Scientific Abstracts?

Author Information

Sebo Paul

Affiliation

Internal Medicine, University Institute for Primary Care, Geneva University Hospital, Geneva, CHE.

Publication Information

Cureus. 2025 Jul 25;17(7):e88774. doi: 10.7759/cureus.88774. eCollection 2025 Jul.

Abstract

BACKGROUND

With the growing use of generative AI in scientific writing, distinguishing between AI-generated and human-authored content has become a pressing challenge. It remains unclear whether ChatGPT (OpenAI, San Francisco, CA) can accurately and consistently recognize its own output.

METHODS

We randomly selected 100 research articles published in 2000, before the advent of generative AI, from 10 high-impact internal medicine journals. For each article, a structured abstract was generated using ChatGPT-4.0 based on the full PDF. The original and AI-generated abstracts (n = 200) were then evaluated twice by ChatGPT-4.0, which was asked to rate the likelihood of authorship on a 0-10 scale (0 = definitely human, 10 = definitely ChatGPT, 5 = undetermined). Classifications of 0-4 were considered human, and 6-10 were considered AI generated.
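To make the classification rule concrete, here is a minimal Python sketch of the score-to-label mapping described above. The function name and label strings are illustrative, not taken from the study's materials:

```python
def classify_score(score: int) -> str:
    """Map a 0-10 authorship score to a label.

    Per the abstract's rule: 0-4 -> human, 6-10 -> AI-generated,
    and 5 -> undetermined (no abstract actually received a 5).
    """
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score <= 4:
        return "human"
    if score >= 6:
        return "ai-generated"
    return "undetermined"  # score == 5
```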

RESULTS

Misclassification rates were high in both rounds (49% and 47.5%). No abstract received a score of 5. Score distributions overlapped substantially between groups, with no statistically significant difference (Wilcoxon p-values = 0.93 and 0.21). Cohen's kappa for binary classification was 0.33 (95% CI: 0.19-0.46), and weighted kappa on the 0-10 scale was 0.24 (95% CI: 0.15-0.34), both reflecting poor agreement.
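As a rough illustration of the reported analyses, the sketch below shows how the Wilcoxon test and the two kappa statistics could be computed with SciPy and scikit-learn. The score arrays are hypothetical stand-ins (the per-abstract scores are not given in the abstract), and the choice of the paired signed-rank test and of linear kappa weighting are assumptions, since the abstract specifies neither:

```python
import numpy as np
from scipy.stats import wilcoxon
from sklearn.metrics import cohen_kappa_score

rng = np.random.default_rng(0)

# Hypothetical 0-10 authorship scores; stand-ins for the study's data.
scores_original = rng.integers(0, 11, size=100)   # the 100 human-written abstracts
scores_generated = rng.integers(0, 11, size=100)  # their 100 AI-generated counterparts

# Compare score distributions between groups (abstract reports p = 0.93 and 0.21).
# Assumption: paired signed-rank test, since each article yields one abstract of each kind.
stat, p = wilcoxon(scores_original, scores_generated)

# Agreement between the two evaluation rounds over all 200 abstracts.
round1 = rng.integers(0, 11, size=200)
round2 = rng.integers(0, 11, size=200)

# Binary labels: 0-4 -> human (0), 6-10 -> AI (1); a score of 5 never occurred.
binary1 = (round1 >= 6).astype(int)
binary2 = (round2 >= 6).astype(int)
kappa_binary = cohen_kappa_score(binary1, binary2)  # reported: 0.33

# Weighted kappa on the full 0-10 scale (linear weighting is an assumption).
kappa_weighted = cohen_kappa_score(round1, round2, weights="linear")  # reported: 0.24

print(f"Wilcoxon p = {p:.2f}, binary kappa = {kappa_binary:.2f}, "
      f"weighted kappa = {kappa_weighted:.2f}")
```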

CONCLUSION

ChatGPT-4.0 cannot reliably identify whether a scientific abstract was written by itself or by humans. More robust external tools are needed to ensure transparency in academic authorship.

Article figure: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f936/12375800/6a6c605975ab/cureus-0017-00000088774-i01.jpg
