Jahanirad Mehdi, Anuar Nor Badrul, Wahab Ainuddin Wahid Abdul
Center of HELP CAT Information Technology Programmes, HELP College of Arts and Technology, Level 5, Kompleks Metro Pudu, Fraser Business Park, 55100 Kuala Lumpur, Malaysia.
Department of Computer System and Technology, Faculty of Computer Science and Information Technology, University of Malaya, 50603 Kuala Lumpur, Malaysia.
Forensic Sci Int. 2017 Mar;272:111-126. doi: 10.1016/j.forsciint.2017.01.010. Epub 2017 Jan 17.
VoIP services provide fertile ground for criminal activity, so identifying the transmitting computer device from a recorded VoIP call may help forensic investigators reveal useful information. It can also help establish the authenticity of a call recording submitted to a court as evidence. This paper extends a previous study on the use of recorded VoIP calls for blind source computer device identification. Although the initial results of that study were promising, a theoretical explanation for them had yet to be found. The study proposes computing the entropy of mel-frequency cepstrum coefficients (entropy-MFCC) from near-silent segments as an intrinsic feature set that captures the device response function, which varies between individual computer devices because of tolerances in their electronic components. Applying the supervised learning techniques of naïve Bayes, linear logistic regression, neural networks, and support vector machines to the entropy-MFCC features achieved state-of-the-art identification accuracy of nearly 99.9% on different sets of computer devices, for both call recording and microphone recording scenarios. Furthermore, unsupervised learning techniques, including simple k-means, expectation-maximization, and density-based spatial clustering of applications with noise (DBSCAN), provided promising results on the call recording dataset by assigning the majority of instances to their correct clusters.
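The entropy-MFCC idea described in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact feature definition, so everything below is an assumption made for illustration: near-silent frames are selected with a simple energy threshold, the MFCCs are taken as already computed by some front end, and the entropy is a histogram-based Shannon entropy of each cepstral coefficient across frames. The function names, the `threshold` parameter, and the `bins` parameter are hypothetical and not taken from the paper.

```python
import math
from collections import Counter


def frame_energy(frame):
    """Mean squared amplitude of one frame of audio samples."""
    return sum(x * x for x in frame) / len(frame)


def near_silent_indices(frames, threshold):
    """Indices of frames whose energy falls below an assumed threshold."""
    return [i for i, f in enumerate(frames) if frame_energy(f) < threshold]


def shannon_entropy(values, bins=10):
    """Histogram-based Shannon entropy (in bits) of a list of numbers."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return 0.0  # a constant sequence carries no uncertainty
    width = (hi - lo) / bins
    # Assign each value to a bin; the maximum value goes in the last bin.
    counts = Counter(min(int((v - lo) / width), bins - 1) for v in values)
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def entropy_mfcc(mfcc_frames, bins=10):
    """One entropy value per cepstral coefficient, taken across frames.

    mfcc_frames is a list of per-frame coefficient lists, assumed to come
    from the near-silent frames of a recording.
    """
    n_coeffs = len(mfcc_frames[0])
    return [
        shannon_entropy([frame[k] for frame in mfcc_frames], bins)
        for k in range(n_coeffs)
    ]


if __name__ == "__main__":
    # Toy data only: three short frames and four made-up MFCC vectors.
    frames = [[0.0, 0.01], [0.5, -0.5], [0.0, 0.0]]
    print(near_silent_indices(frames, threshold=0.001))
    toy_mfcc = [[0.0, 1.0], [0.0, 2.0], [0.0, 3.0], [0.0, 4.0]]
    print(entropy_mfcc(toy_mfcc, bins=4))
```

The resulting per-recording feature vector could then be passed to any of the classifiers or clustering methods named in the abstract; that step is omitted here because the abstract does not specify the configuration used.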