Rawf Karwan M Hama, Abdulrahman Ayub O, Kamel Hana O, Hassan Lawen M, Ali Ahmad O
Department of Computer Science, College of Science, University of Halabja, Halabja, Kurdistan Region, F.R. Iraq.
Data Brief. 2024 Sep 19;57:110949. doi: 10.1016/j.dib.2024.110949. eCollection 2024 Dec.
Keyboard acoustic recognition is a pivotal area within cybersecurity and human-computer interaction, in which keyboard sounds are identified and analysed to enhance security measures. The performance of acoustic-based security systems can be influenced by factors such as the platform used, typing style, and environmental noise. To address these variations and provide a comprehensive resource, we present the Multi-Keyboard Acoustic (MKA) Datasets. These extensive datasets, gathered by a team in the Computer Science Department at the University of Halabja, include recordings from six widely used platforms: HP, Lenovo, MSI, Mac, Messenger, and Zoom. For each platform, the MKA Datasets provide structured data comprising raw recordings, segmented sound files, and matrices derived from these sounds. They can be used by researchers in keylogging detection, cybersecurity, and other fields related to acoustic emanation attacks on keyboards. Moreover, the datasets capture the intricacies of typing behaviour with both hands and all ten fingers, and the data were carefully segmented and pre-processed using the Praat tool to ensure high quality and dependability. This approach allows researchers to explore various aspects of keyboard sound recognition, contributing to the development of robust recognition algorithms and enhanced security measures. The MKA Datasets are among the largest and most detailed datasets in this domain, offering significant potential for advancing research and improving defences against acoustic-based threats.