Sajid Muhammad, Sanaullah Muhammad, Fuzail Muhammad, Malik Tauqeer Safdar, Shuhidan Shuhaida Mohamed
Department of Computer Science, Air University, Islamabad, Pakistan.
Computer Science Department, NFC Institute of Engineering and Technology, Multan, Punjab, Pakistan.
PLoS One. 2025 Apr 8;20(4):e0319551. doi: 10.1371/journal.pone.0319551. eCollection 2025.
In text analysis, plagiarism identification is a crucial area of study that seeks copied information in a document and determines whether portions of the text were written by the same author. With the emergence of publicly available content-generation tools based on large language models, the problem of inherent plagiarism has grown in importance across various industries. Plagiarism among students is increasing because computers are widely available and used in the classroom and electronic information on the internet is broadly accessible. As a result, there is a rising need for reliable and precise detection techniques that can cope with this changing environment. This paper compares several plagiarism detection techniques and examines how well different detection systems distinguish between content written by humans and content generated by artificial intelligence (AI). This article systematically evaluates 189 research papers published between 2019 and 2024 to provide an overview of the research on computational approaches for plagiarism detection (PD). To organize the presentation of research contributions, we propose a new, technically oriented framework covering efforts to prevent and identify plagiarism, types of plagiarism, and computational techniques for detecting plagiarism. We show that plagiarism detection remains a highly active research field. Over the period we reviewed, significant progress has been made in automatically identifying plagiarism that is heavily obfuscated and therefore difficult to recognize. The key sources of these advancements are the exploration of nontextual content, the use of machine learning, and improved semantic text analysis techniques.
Based on our analysis, we conclude that combining several analytical methodologies for textual and nontextual content features is the most promising direction for future research to further improve plagiarism detection.