Liang Weixin, Zhang Yaohui, Wu Zhengxuan, Lepp Haley, Ji Wenlong, Zhao Xuandong, Cao Hancheng, Liu Sheng, He Siyu, Huang Zhi, Yang Diyi, Potts Christopher, Manning Christopher D, Zou James
Department of Computer Science, Stanford University, Stanford, CA, USA.
Department of Electrical Engineering, Stanford University, Stanford, CA, USA.
Nat Hum Behav. 2025 Aug 4. doi: 10.1038/s41562-025-02273-8.
Scientific publishing is the primary means of disseminating research findings. There has been speculation about how extensively large language models (LLMs) are being used in academic writing. Here we conduct a systematic analysis across 1,121,912 preprints and published papers from January 2020 to September 2024 on arXiv, bioRxiv and Nature portfolio journals, using a population-level framework based on word frequency shifts to estimate the prevalence of LLM-modified content over time. Our findings suggest a steady increase in LLM usage, with the largest and fastest growth estimated for computer science papers (up to 22%). By comparison, mathematics papers and the Nature portfolio showed lower evidence of LLM modification (up to 9%). LLM modification estimates were higher among papers from first authors who post preprints more frequently, papers in more crowded research areas and papers of shorter lengths. Our findings suggest that LLMs are being broadly used in scientific writing.