Delving into LLM-assisted writing in biomedical publications through excess vocabulary.

Authors

Dmitry Kobak, Rita González-Márquez, Emőke-Ágnes Horvát, Jan Lause

Affiliations

Hertie Institute for AI in Brain Health, University of Tübingen, 72076 Tübingen, Germany.

Northwestern University, Evanston, 60208 IL, USA.

Publication information

Sci Adv. 2025 Jul 4;11(27):eadt3813. doi: 10.1126/sciadv.adt3813. Epub 2025 Jul 2.

DOI: 10.1126/sciadv.adt3813

PMID: 40601754

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC12219543/

Abstract

Large language models (LLMs) like ChatGPT can generate and revise text with human-level performance. These models come with clear limitations, can produce inaccurate information, and reinforce existing biases. Yet, many scientists use them for their scholarly writing. But how widespread is such LLM usage in the academic literature? To answer this question for the field of biomedical research, we present an unbiased, large-scale approach: We study vocabulary changes in more than 15 million biomedical abstracts from 2010 to 2024 indexed by PubMed and show how the appearance of LLMs led to an abrupt increase in the frequency of certain style words. This excess word analysis suggests that at least 13.5% of 2024 abstracts were processed with LLMs. This lower bound differed across disciplines, countries, and journals, reaching 40% for some subcorpora. We show that LLMs have had an unprecedented impact on scientific writing in biomedical research, surpassing the effect of major world events such as the COVID pandemic.
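The excess word idea in the abstract can be illustrated with a small sketch. The abstract does not spell out how the expected (counterfactual) frequency is computed, so the linear extrapolation from two pre-LLM baseline years used here is an assumption, as are the function names, the baseline years, and the toy numbers. Taking the largest per-word gap as a lower bound is likewise a simplification for illustration, not necessarily the authors' procedure.

```python
from typing import Dict, Tuple


def excess_frequency(
    freq_by_year: Dict[int, float],
    target_year: int = 2024,
    baseline_years: Tuple[int, int] = (2021, 2022),
) -> Tuple[float, float]:
    """Return (gap, ratio) between observed and expected word frequency.

    freq_by_year maps a year to the fraction of abstracts containing the word.
    The expected frequency is a linear extrapolation from the two baseline
    years (an assumption made for this sketch).
    """
    y0, y1 = baseline_years
    slope = (freq_by_year[y1] - freq_by_year[y0]) / (y1 - y0)
    expected = max(freq_by_year[y1] + slope * (target_year - y1), 0.0)
    observed = freq_by_year[target_year]
    gap = observed - expected
    ratio = observed / expected if expected > 0 else float("inf")
    return gap, ratio


def lower_bound_llm_share(gaps: Dict[str, float]) -> float:
    """Largest per-word excess gap, read as a lower bound on affected abstracts.

    If a word appears in a fraction `gap` more abstracts than expected, at
    least that fraction of abstracts must contain the excess usage.
    """
    return max(0.0, max(gaps.values(), default=0.0))


if __name__ == "__main__":
    # Hypothetical frequencies, not taken from the paper.
    toy = {
        "word_a": {2021: 0.010, 2022: 0.012, 2024: 0.050},
        "word_b": {2021: 0.200, 2022: 0.205, 2024: 0.215},
    }
    gaps = {word: excess_frequency(freqs)[0] for word, freqs in toy.items()}
    print(gaps)
    print(lower_bound_llm_share(gaps))
```

A single-word bound like this is conservative: abstracts processed with an LLM that happen not to contain the chosen word are not counted, which is consistent with the abstract describing its estimate as a lower bound.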

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7507/12219543/91f341795db5/sciadv.adt3813-f1.jpg

Cited by

AI content is tainting preprints: how moderators are fighting back.

Nature. 2025 Aug;644(8077):590-591. doi: 10.1038/d41586-025-02469-y.

Evolutionary Trajectories of Consciousness: From Biological Foundations to Technological Horizons.

Brain Sci. 2025 Jul 9;15(7):734. doi: 10.3390/brainsci15070734.
