A two level learning model for authorship authentication.

Affiliations

Computer Science Department, Faculty of Computers & Artificial Intelligence, Benha University, Benha, Egypt.

Information System Department, Faculty of Computers & Artificial Intelligence, Benha University, Benha, Egypt.

Publication information

PLoS One. 2021 Aug 5;16(8):e0255661. doi: 10.1371/journal.pone.0255661. eCollection 2021.

Abstract

With internet use rising rapidly worldwide, forensic authorship authentication plays a vital role in identifying the growing number of unknown authors. This paper presents a two-level learning technique for authorship authentication. Rather than relying on learning alone, the technique is supplied with linguistic knowledge, statistical features, and vocabulary features to improve its efficiency. The linguistic knowledge is represented through lexical analysis features such as part of speech. A two-level classifier is presented to obtain the best predictive performance for identifying authorship. The first classifier is based on vocabulary features that capture how frequently each author uses certain words. Its results are fed to the second classifier, which is based on a learning technique and depends on lexical, statistical, and linguistic features. All three feature sets describe the author's writing style in numerical form. Many new features are proposed in this work for identifying an author's writing style. Although the proposed methodology is tested on Arabic writings, it is general and can be applied to any language. In the experiments, the trained two-level classifier achieves an accuracy ranging from 94% to 96.16%, depending on the machine learning model used.
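The two-level idea can be illustrated with a minimal Python sketch. Everything specific in it is an assumption made for illustration, not the paper's method: the abstract does not name the learning models or the exact features, so the sketch uses cosine similarity against per-author word-frequency profiles for level 1 and a nearest-centroid classifier for level 2. The three style ratios stand in for the statistical features, part-of-speech features are omitted, and the English toy corpus and all function names are hypothetical.

```python
import math
from collections import Counter

PUNCT = ".,;:!?\"'()"

def tokenize(text):
    # Lowercased whitespace tokens with surrounding punctuation stripped (a simplification).
    stripped = (w.strip(PUNCT).lower() for w in text.split())
    return [w for w in stripped if w]

def vocab_profiles(train):
    # Level 1 training: relative word frequencies for each author.
    profiles = {}
    for author, texts in train.items():
        counts = Counter(w for t in texts for w in tokenize(t))
        total = sum(counts.values())
        profiles[author] = {w: c / total for w, c in counts.items()}
    return profiles

def level1_scores(text, profiles):
    # Cosine similarity between the text's word counts and each author's profile,
    # returned in sorted author order.
    counts = Counter(tokenize(text))
    norm_t = math.sqrt(sum(c * c for c in counts.values()))
    scores = []
    for author in sorted(profiles):
        prof = profiles[author]
        dot = sum(c * prof.get(w, 0.0) for w, c in counts.items())
        norm_p = math.sqrt(sum(v * v for v in prof.values()))
        scores.append(dot / (norm_t * norm_p) if norm_t and norm_p else 0.0)
    return scores

def style_features(text):
    # Assumed stand-ins for statistical features; all are ratios in [0, 1],
    # so no feature scaling is applied in this sketch.
    words = tokenize(text)
    n = max(len(words), 1)
    return [
        len(set(words)) / n,                                  # type-token ratio
        sum(len(w) > 5 for w in words) / n,                   # share of long words
        sum(ch in ".,;:!?" for ch in text) / max(len(text), 1),  # punctuation density
    ]

def level2_vector(text, profiles):
    # Level 1 outputs are fed to level 2 alongside the style features.
    return level1_scores(text, profiles) + style_features(text)

def train_two_level(train):
    profiles = vocab_profiles(train)
    centroids = {}
    for author, texts in train.items():
        vecs = [level2_vector(t, profiles) for t in texts]
        centroids[author] = [sum(col) / len(vecs) for col in zip(*vecs)]
    return profiles, centroids

def predict(text, profiles, centroids):
    # Level 2: assign the author whose centroid is nearest (Euclidean distance).
    vec = level2_vector(text, profiles)
    return min(centroids, key=lambda a: math.dist(vec, centroids[a]))

# Hypothetical toy corpus, invented for this example.
TRAIN = {
    "author_a": ["the sea is calm. the sea is blue.",
                 "the sea was cold. the wind was calm."],
    "author_b": ["markets rallied sharply today as investors regained confidence",
                 "investors sold shares as markets tumbled on weak earnings"],
}

profiles, centroids = train_two_level(TRAIN)
print(predict("the sea is cold.", profiles, centroids))
print(predict("investors sold shares today", profiles, centroids))
```

One simplification to keep in mind: the level-1 scores for the training texts are computed against profiles that already include those texts, so a real pipeline would likely use held-out or cross-validated level-1 outputs when training the second level.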

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b335/8341647/6a40f898d5ba/pone.0255661.g001.jpg
