Pontifical Catholic University of Parana (PUCPR), R. Imaculada Conceição, 1155 Curitiba, PR, Brazil.
Forensic Sci Int. 2013 May 10;228(1-3):100-4. doi: 10.1016/j.forsciint.2013.02.025. Epub 2013 Mar 24.
In this paper we compare different compression models for authorship attribution. To this end, our experiments considered three types of compressors, Lempel-Ziv (GZip), block sorting (BZip) and statistical (PPM), along with two similarity measures. We also analyze two attribution methods. Through a series of experiments on two databases, we show that all the compressors behave similarly, but that the similarity measures can vary considerably depending on the strategy used for authorship attribution. Our results corroborate the literature in that compression models are a good alternative for authorship attribution, surpassing traditional pattern recognition systems based on classifiers and feature extraction.
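As an illustration of the general idea, the following is a minimal sketch of compression-based attribution, not the authors' implementation. The abstract does not name its two similarity measures or its two attribution methods, so the sketch assumes the Normalized Compression Distance (NCD), a commonly used compression-based measure, and a simple nearest-profile rule in which each candidate author is represented by one concatenated text. Only the GZip and BZip compressor families are covered, because the Python standard library has no PPM compressor. The names `compressed_size`, `ncd` and `attribute` are hypothetical.

```python
import bz2
import gzip
from typing import Mapping


def compressed_size(data: bytes, compressor: str = "gzip") -> int:
    """Return the length in bytes of `data` after compression."""
    if compressor == "gzip":      # Lempel-Ziv type (DEFLATE)
        return len(gzip.compress(data, compresslevel=9))
    if compressor == "bzip2":     # block sorting type (Burrows-Wheeler)
        return len(bz2.compress(data, compresslevel=9))
    raise ValueError(f"unsupported compressor: {compressor!r}")


def ncd(x: bytes, y: bytes, compressor: str = "gzip") -> float:
    """Normalized Compression Distance (assumed measure, see lead-in).

    NCD(x, y) = (C(xy) - min(C(x), C(y))) / max(C(x), C(y)),
    where C(.) is the compressed size. Smaller means more similar.
    """
    cx = compressed_size(x, compressor)
    cy = compressed_size(y, compressor)
    cxy = compressed_size(x + y, compressor)
    return (cxy - min(cx, cy)) / max(cx, cy)


def attribute(unknown: str, profiles: Mapping[str, str],
              compressor: str = "gzip") -> str:
    """Return the author whose profile text is closest to `unknown`.

    `profiles` maps an author name to the concatenation of that
    author's known texts (an assumed attribution strategy).
    """
    target = unknown.encode("utf-8")
    return min(
        profiles,
        key=lambda author: ncd(profiles[author].encode("utf-8"),
                               target, compressor),
    )


if __name__ == "__main__":
    # Toy data for illustration only; real experiments need far more text.
    profiles = {
        "author_a": "the ship sailed across the stormy sea " * 5,
        "author_b": "the compiler parses the source code " * 5,
    }
    print(attribute("the ship sailed across the sea", profiles))
```

Compression distances are unreliable on very short texts, since fixed header overhead dominates the compressed size, so any real use of this sketch would need substantially longer samples per author.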