Son Yeongmin, Park Jae Wan
Department of Digital Media, Soongsil University, Seoul 07027, Republic of Korea.
Global School of Media, Soongsil University, Seoul 07027, Republic of Korea.
Sensors (Basel). 2024 Mar 14;24(6):1872. doi: 10.3390/s24061872.
The ubiquity of smartphones has made voice recording widely accessible for diverse purposes. Consequently, voice recordings are increasingly submitted as digital evidence in legal proceedings, accompanied by a rise in allegations that recording files have been forged. This trend underscores the growing importance of audio file authentication. This study develops a deep learning method capable of identifying forged files, particularly those altered with the "Mixed Paste" command, a manipulation technique not previously addressed. The proposed framework is a hybrid model that integrates a convolutional neural network (CNN) and a long short-term memory (LSTM) network, designed around features extracted from spectrograms and from sequences of Korean consonant types. The model is trained on a purpose-built dataset of forged audio recordings created on an iPhone, modified via "Mixed Paste", and then encoded. The hybrid model achieves a high accuracy of 97.5%. To validate its efficacy, tests were conducted on audio files manipulated in various ways. The findings show that the model's effectiveness does not depend on the smartphone model or the audio editing software used. We anticipate that this research will advance the field of audio forensics through this novel hybrid model approach.
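The abstract describes spectrograms as one of the two feature inputs to the hybrid CNN-LSTM model. The following is a minimal NumPy sketch of how a magnitude spectrogram can be computed from a raw waveform; the frame size, hop length, and sampling rate here are illustrative assumptions, not parameters reported by the paper.

```python
import numpy as np

def magnitude_spectrogram(signal, n_fft=512, hop=256):
    """Frame the signal, apply a Hann window, and take |FFT| per frame.

    Returns an array of shape (n_frames, n_fft // 2 + 1), the kind of
    time-frequency representation typically fed to a CNN front end.
    """
    window = np.hanning(n_fft)
    n_frames = 1 + (len(signal) - n_fft) // hop
    frames = np.stack(
        [signal[i * hop : i * hop + n_fft] * window for i in range(n_frames)]
    )
    return np.abs(np.fft.rfft(frames, axis=1))

# Toy 1-second "recording" at 16 kHz: a pure 440 Hz tone (hypothetical input).
sr = 16000
t = np.arange(sr) / sr
spec = magnitude_spectrogram(np.sin(2 * np.pi * 440 * t))
print(spec.shape)  # (n_frames, frequency_bins)
```

In a forgery-detection pipeline such as the one the paper outlines, the resulting 2-D array would be treated as an image-like input to the CNN, whose feature maps are then passed as a sequence to the LSTM; splice points introduced by "Mixed Paste" edits tend to leave discontinuities in exactly this representation.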