
Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology.

Affiliations

Division of Urogynecology and Reconstructive Pelvic Surgery, Department of Obstetrics and Gynecology, University of Texas Southwestern Medical Center, Dallas, TX.

Publication Information

Am J Obstet Gynecol. 2024 Aug;231(2):276.e1-276.e10. doi: 10.1016/j.ajog.2024.04.045. Epub 2024 May 6.


DOI: 10.1016/j.ajog.2024.04.045
PMID: 38710267
Abstract

BACKGROUND: ChatGPT, a publicly available artificial intelligence large language model, has allowed for sophisticated artificial intelligence technology on demand. Indeed, use of ChatGPT has already begun to make its way into medical research. However, the medical community has yet to understand the capabilities and ethical considerations of artificial intelligence within this context, and unknowns exist regarding ChatGPT's writing abilities, accuracy, and implications for authorship.

OBJECTIVE: We hypothesize that human reviewers and artificial intelligence detection software differ in their ability to correctly identify original published abstracts and artificial intelligence-written abstracts in the subjects of Gynecology and Urogynecology. We also suspect that concrete differences in writing errors, readability, and perceived writing quality exist between original and artificial intelligence-generated text.

STUDY DESIGN: Twenty-five articles published in high-impact medical journals and a collection of Gynecology and Urogynecology journals were selected. ChatGPT was prompted to write 25 corresponding artificial intelligence-generated abstracts, providing the abstract title, journal-dictated abstract requirements, and select original results. The original and artificial intelligence-generated abstracts were reviewed by blinded Gynecology and Urogynecology faculty and fellows to identify the writing as original or artificial intelligence-generated. All abstracts were analyzed by publicly available artificial intelligence detection software GPTZero, Originality, and Copyleaks, and were assessed for writing errors and quality by artificial intelligence writing assistant Grammarly.

RESULTS: A total of 157 reviews of 25 original and 25 artificial intelligence-generated abstracts were conducted by 26 faculty and 4 fellows; 57% of original abstracts and 42.3% of artificial intelligence-generated abstracts were correctly identified, yielding an average accuracy of 49.7% across all abstracts. All 3 artificial intelligence detectors rated the original abstracts as less likely to be artificial intelligence-written than the ChatGPT-generated abstracts (GPTZero, 5.8% vs 73.3%; P<.001; Originality, 10.9% vs 98.1%; P<.001; Copyleaks, 18.6% vs 58.2%; P<.001). The performance of the 3 artificial intelligence detection software differed when analyzing all abstracts (P=.03), original abstracts (P<.001), and artificial intelligence-generated abstracts (P<.001). Grammarly text analysis identified more writing issues and correctness errors in original than in artificial intelligence abstracts, including lower Grammarly score reflective of poorer writing quality (82.3 vs 88.1; P=.006), more total writing issues (19.2 vs 12.8; P<.001), critical issues (5.4 vs 1.3; P<.001), confusing words (0.8 vs 0.1; P=.006), misspelled words (1.7 vs 0.6; P=.02), incorrect determiner use (1.2 vs 0.2; P=.002), and comma misuse (0.3 vs 0.0; P=.005).

CONCLUSION: Human reviewers are unable to detect the subtle differences between human and ChatGPT-generated scientific writing because of artificial intelligence's ability to generate tremendously realistic text. Artificial intelligence detection software improves the identification of artificial intelligence-generated writing, but still lacks complete accuracy and requires programmatic improvements to achieve optimal detection. Given that reviewers and editors may be unable to reliably detect artificial intelligence-generated texts, clear guidelines for reporting artificial intelligence use by authors and implementing artificial intelligence detection software in the review process will need to be established as artificial intelligence chatbots gain more widespread use.
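To make the generation step in the STUDY DESIGN concrete, here is a minimal sketch of how a corresponding abstract could be produced programmatically. It assumes the OpenAI chat-completions API as a stand-in for the ChatGPT interface the authors used; the helper name `generate_abstract`, the prompt wording, and the model choice are hypothetical and not taken from the paper.

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment


def generate_abstract(title: str, journal_requirements: str, results: str) -> str:
    """Ask the model for an abstract given the three inputs the study supplied:
    the abstract title, the journal-dictated requirements, and select results."""
    prompt = (
        f"Write a structured scientific abstract titled: {title}\n"
        f"Follow these journal abstract requirements: {journal_requirements}\n"
        f"Incorporate these results: {results}"
    )
    response = client.chat.completions.create(
        model="gpt-3.5-turbo",  # hypothetical; the paper does not pin a model version here
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content
```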
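For readers who want to retrace the headline arithmetic in the RESULTS, the sketch below computes the per-group identification accuracies and their average, and compares detector scores between groups. The per-review verdicts and per-abstract detector scores are illustrative placeholders chosen only to mirror the reported rates (the raw data are not in the abstract), and the Mann-Whitney U test is an assumption, since the abstract reports P values without naming the test used.

```python
from statistics import mean
from scipy.stats import mannwhitneyu

# Illustrative per-review verdicts (1 = reviewer identified the abstract's
# true source, 0 = missed), scaled to mirror the reported 57% / 42.3% rates.
original_verdicts = [1] * 57 + [0] * 43
ai_verdicts = [1] * 42 + [0] * 58

acc_original = mean(original_verdicts)
acc_ai = mean(ai_verdicts)
print(f"original: {acc_original:.1%}  AI-generated: {acc_ai:.1%}  "
      f"average accuracy: {(acc_original + acc_ai) / 2:.1%}")

# Illustrative "probability AI-written" scores from one detector for five
# original and five AI-generated abstracts (the abstract reports only group
# means, e.g. GPTZero 5.8% vs 73.3%).
scores_original = [0.02, 0.10, 0.05, 0.01, 0.11]
scores_ai = [0.70, 0.81, 0.64, 0.77, 0.75]

# A rank-based Mann-Whitney U test is one defensible choice for comparing
# bounded detector scores between the two groups.
stat, p = mannwhitneyu(scores_original, scores_ai, alternative="two-sided")
print(f"Mann-Whitney U = {stat}, P = {p:.4f}")
```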

Similar Articles

[1]
Human vs machine: identifying ChatGPT-generated abstracts in Gynecology and Urogynecology.

Am J Obstet Gynecol. 2024-8

[2]
It takes one to know one-Machine learning for identifying OBGYN abstracts written by ChatGPT.

Int J Gynaecol Obstet. 2024-6

[3]
Assessing the Reproducibility of the Structured Abstracts Generated by ChatGPT and Bard Compared to Human-Written Abstracts in the Field of Spine Surgery: Comparative Analysis.

J Med Internet Res. 2024-6-26

[4]
Comparisons of Quality, Correctness, and Similarity Between ChatGPT-Generated and Human-Written Abstracts for Basic Research: Cross-Sectional Study.

J Med Internet Res. 2023-12-25

[5]
Bridging the Gap Between Urological Research and Patient Understanding: The Role of Large Language Models in Automated Generation of Layperson's Summaries.

Urol Pract. 2023-9

[6]
Quality and correctness of AI-generated versus human-written abstracts in psychiatric research papers.

Psychiatry Res. 2024-11

[7]
Identification of ChatGPT-Generated Abstracts Within Shoulder and Elbow Surgery Poses a Challenge for Reviewers.

Arthroscopy. 2025-4

[8]
GPTZero Performance in Identifying Artificial Intelligence-Generated Medical Texts: A Preliminary Study.

J Korean Med Sci. 2023-9-25

[9]
Human versus artificial intelligence-generated arthroplasty literature: A single-blinded analysis of perceived communication, quality, and authorship source.

Int J Med Robot. 2024-2

[10]
Association of reviewer experience with discriminating human-written versus ChatGPT-written abstracts.

Int J Gynecol Cancer. 2024-5-6

Cited By

[1]
ChatGPT-4o Compared With Human Researchers in Writing Plain-Language Summaries for Cochrane Reviews: A Blinded, Randomized Non-Inferiority Controlled Trial.

Cochrane Evid Synth Methods. 2025-7-28

[2]
ChatGPT in Academic Writing: A Scientometric Analysis of Literature Published Between 2022 and 2023.

J Empir Res Hum Res Ethics. 2025-7

[3]
Addressing Commonly Asked Questions in Urogynecology: Accuracy and Limitations of ChatGPT.

Int Urogynecol J. 2025-6-18

[4]
Artificial intelligence-driven circRNA vaccine development: multimodal collaborative optimization and a new paradigm for biomedical applications.

Brief Bioinform. 2025-5-1

[5]
AI-Assisted Hypothesis Generation to Address Challenges in Cardiotoxicity Research: Simulation Study Using ChatGPT With GPT-4o.

J Med Internet Res. 2025-5-15

[6]
An Evaluation of Current Trends in AI-Generated Text in Otolaryngology Publications.

Laryngoscope. 2025-4-25

[7]
AI detectors are poor western blot classifiers: a study of accuracy and predictive values.

PeerJ. 2025-2-20
