Apoorva K A, Sangeetha S
Department of Computer Applications, National Institute of Technology, Tiruchirapalli, Tamil Nadu, India.
SN Appl Sci. 2021;3(3):348. doi: 10.1007/s42452-020-04127-6. Epub 2021 Feb 18.
Electronic mail is the primary source of various cyber scams, so identifying the author of an email is essential: authorship forms significant documentary evidence in digital forensics. This paper presents a model for email author identification (or attribution) using deep neural networks and model-based clustering techniques. Stylometric features have gained considerable importance in authorship identification because they improve the accuracy of the attribution task. The experiments were performed on the publicly available benchmark Enron dataset with varying numbers of authors. With the deep neural network technique, the proposed model achieves an accuracy of 94% on five authors, 90% on ten authors, 86% on 25 authors and 75% on the entire dataset, which is a good level of accuracy for highly imbalanced data. The second, cluster-based technique achieved 86% accuracy on the entire dataset, with the number of authors chosen according to each author's contribution to the aggregate data.
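The abstract does not list which stylometric features the model uses. As a rough illustration only, the sketch below computes a few features commonly used in stylometry (average word length, average sentence length, type-token ratio, punctuation and uppercase rates, and relative frequencies of a handful of function words) for a single email body. The function name `stylometric_features`, the feature names, and the `FUNCTION_WORDS` list are hypothetical and are not taken from the paper.

```python
import re
from collections import Counter

# Hypothetical short list of English function words; NOT the paper's feature set.
FUNCTION_WORDS = ("the", "and", "of", "to", "a", "in", "that", "is", "for", "it")


def stylometric_features(text: str) -> dict:
    """Compute a few generic stylometric features for one email body (illustrative sketch)."""
    words = re.findall(r"[A-Za-z']+", text)
    lowered = [w.lower() for w in words]
    # Split on runs of sentence-ending punctuation and drop empty fragments.
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    n_words = len(words)
    n_chars = len(text)

    features = {
        # Mean number of letters per word.
        "avg_word_len": sum(len(w) for w in words) / n_words if n_words else 0.0,
        # Mean number of words per sentence.
        "avg_sentence_len": n_words / len(sentences) if sentences else 0.0,
        # Vocabulary richness: distinct words over total words (case-insensitive).
        "type_token_ratio": len(set(lowered)) / n_words if n_words else 0.0,
        # Selected punctuation marks per character.
        "punct_per_char": sum(text.count(c) for c in ",;:!?") / n_chars if n_chars else 0.0,
        # Share of uppercase characters.
        "upper_ratio": sum(c.isupper() for c in text) / n_chars if n_chars else 0.0,
    }

    # Relative frequency of each function word among all words.
    counts = Counter(lowered)
    for fw in FUNCTION_WORDS:
        features[f"fw_{fw}"] = counts[fw] / n_words if n_words else 0.0
    return features


if __name__ == "__main__":
    email = "The deal is done. Call me tomorrow, and bring the file!"
    for name, value in stylometric_features(email).items():
        print(f"{name}: {value:.3f}")
```

Per-email vectors like these could then be fed to a neural classifier or grouped by a model-based clustering method (for example, a Gaussian mixture); the abstract does not specify the exact pipeline, so this pairing is an assumption.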