Citron Daniel T, Ginsparg Paul
Departments of Physics and Information Science, Cornell University, Ithaca, NY 14853
Proc Natl Acad Sci U S A. 2015 Jan 6;112(1):25-30. doi: 10.1073/pnas.1415135111. Epub 2014 Dec 8.
We consider the incidence of text "reuse" by researchers via a systematic pairwise comparison of the text content of all articles deposited to arXiv.org from 1991 to 2012. We measure the global frequencies of three classes of text reuse and measure how chronic text reuse is distributed among authors in the dataset. We infer a baseline for accepted practice, perhaps surprisingly permissive compared with other societal contexts, and a clearly delineated set of aberrant authors. We find a negative correlation between the amount of reused text in an article and its influence, as measured by subsequent citations. Finally, we consider the distribution of countries of origin of articles containing large amounts of reused text.
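The abstract does not say how the pairwise comparison is carried out. As a rough illustration only, the sketch below measures overlap between documents by comparing sets of overlapping n-word sequences ("shingles"). The shingle length `n=7`, the function names, and the brute-force loop over all pairs are assumptions made for this example, not details taken from the record above; a corpus the size of arXiv would need hashing or indexing rather than a quadratic scan.

```python
from itertools import combinations


def shingles(text, n=7):
    """Return the set of overlapping n-word sequences in text."""
    words = text.lower().split()
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def shared_fraction(a, b, n=7):
    """Fraction of a's n-word shingles that also occur in b.

    Returns 0.0 when a has fewer than n words.
    """
    sa, sb = shingles(a, n), shingles(b, n)
    return len(sa & sb) / len(sa) if sa else 0.0


def pairwise_overlap(docs, n=7):
    """Map each ordered pair (id_a, id_b) to the fraction of a's shingles found in b.

    docs is a dict of {document_id: text}. Every unordered pair is visited
    once and scored in both directions, since the measure is asymmetric.
    """
    result = {}
    for (ia, ta), (ib, tb) in combinations(docs.items(), 2):
        result[(ia, ib)] = shared_fraction(ta, tb, n)
        result[(ib, ia)] = shared_fraction(tb, ta, n)
    return result
```

Calling `pairwise_overlap({"x": text_x, "y": text_y})` returns a score in each direction for the pair. The measure is deliberately asymmetric: a short article copied wholesale into a longer one scores high in one direction and low in the other.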