Tan Gloria Jennis, Sulong Ghazali, Rahim Mohd Shafry Mohd
Faculty of Computing, Universiti Teknologi Malaysia, Johor, Malaysia; School of Informatics and Applied Mathematics, Universiti Malaysia Terengganu, Terengganu, Malaysia.
Forensic Sci Int. 2017 Oct;279:41-52. doi: 10.1016/j.forsciint.2017.07.034. Epub 2017 Aug 4.
This paper presents a review of the state of the art in offline text-independent writer identification methods for three major languages, namely English, Chinese and Arabic, published in the literature from 2011 to 2016. For ease of discussion, we group the techniques into three categories: texture-, structure-, and allograph-based. Results are analysed, compared and tabulated together with the datasets used, to allow fair comparison. We observe that during this period significant progress was achieved on English and Arabic; however, progress on Chinese has been rather slow and far from satisfactory given its wide usage. This is due to its complex writing structure. Issues with the datasets used in previous studies are also highlighted, because size matters: the accuracy of writer identification deteriorates as the database size increases.