Porter Simon J, McIntosh Leslie D
Digital Science, London, GB, UK.
Sci Rep. 2024 Nov 28;14(1):29569. doi: 10.1038/s41598-024-71230-8.
It is estimated that 2% of all journal submissions across all disciplines originate from paper mills. This creates a significant risk that the body of research we rely on to make progress becomes corrupted, and it places an undue burden on the submission process, which must reject these articles. By understanding how the business of paper mills works (the technological approaches they adopt, as well as the social structures they require to operate), the research community can be empowered to develop strategies that make it harder, or ideally impossible, for them to operate. Most contemporary work in paper-mill detection has focused on identifying the signals left behind in the text or structure of fabricated papers by the technological approaches that paper mills employ. As these technologies advance, such signals will become harder to detect. However, fabricated papers need more than text, images, and data; they also require a fabricated or partially fabricated network of authors. Most 'authors' on a fabricated paper have not been associated with the research, but rather are added through a transaction. This lack of deeper connection means that co-authors on fabricated papers are unlikely ever to appear together on the same paper more than once. This paper constructs a model that encodes some of the key characteristics of this activity in an 'authorship-for-sale' network, with the aim of creating a robust method to detect it. A characteristic network fingerprint arises from this model and provides a robust statistical approach to the detection of paper-mill networks. The networks detected by the proposed model have a statistically significant overlap with those flagged by other approaches that rely principally on textual analysis to detect fraudulent papers.
Researchers connected to networks identified using the methodology outlined in this paper are shown to be connected with 37% of the papers identified through the tortured-phrase and clay-feet methods deployed on the Problematic Paper Screener website. Finally, methods to limit the expansion and propagation of these networks are discussed in both technological and social terms.
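The abstract's central intuition, that purchased co-authorships rarely recur, can be illustrated with a toy heuristic. The Python sketch below is not the authors' model. It assumes a simple input (a hypothetical mapping from paper IDs to author lists), counts how many papers each co-author pair shares, flags authors who have many co-authors but almost never work with the same person twice, and then computes the share of papers flagged by a separate screener that include at least one flagged author, which is one plausible reading of the 37% overlap measure. The function names, toy data, and the cut-offs `min_coauthors=4` and `threshold=0.9` are illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Toy input (hypothetical): paper id -> list of author ids.
papers = {
    "P1": ["A", "B", "C"],
    "P2": ["A", "D", "E"],
    "P3": ["A", "F", "G"],
    "P4": ["H", "I"],
    "P5": ["H", "I"],
    "P6": ["H", "I", "J"],
}


def pair_counts(papers):
    """Count how many papers each unordered co-author pair appears on together."""
    counts = Counter()
    for authors in papers.values():
        for pair in combinations(sorted(set(authors)), 2):
            counts[pair] += 1
    return counts


def author_stats(counts):
    """Map author -> (number of distinct co-authors, fraction of those pairs seen only once)."""
    pairs = Counter()
    one_off = Counter()
    for (a, b), n in counts.items():
        for author in (a, b):
            pairs[author] += 1
            if n == 1:
                one_off[author] += 1
    return {author: (pairs[author], one_off[author] / pairs[author]) for author in pairs}


def flag_authors(stats, min_coauthors=4, threshold=0.9):
    """Flag authors with many co-authors whose collaborations almost never repeat.

    Both cut-offs are illustrative assumptions, not values taken from the paper.
    """
    return {
        author
        for author, (n, frac) in stats.items()
        if n >= min_coauthors and frac >= threshold
    }


def overlap_share(papers, screened_ids, flagged):
    """Share of externally screened papers that have at least one flagged author."""
    if not screened_ids:
        return 0.0
    hits = sum(1 for pid in screened_ids if flagged & set(papers.get(pid, ())))
    return hits / len(screened_ids)


flagged = flag_authors(author_stats(pair_counts(papers)))
screened = {"P1", "P3", "P5"}  # hypothetical output of a text-based screener
print(sorted(flagged))                                     # ['A']
print(round(overlap_share(papers, screened, flagged), 3))  # 0.667
```

A fixed cut-off like this ignores field-specific collaboration norms (for example, large consortia or one-off conference papers). The abstract describes a statistical fingerprint, so a realistic detector would more likely compare observed repeat-collaboration rates against an expected baseline; this sketch does not attempt to reproduce that.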