Candidate of Economic Sciences, Department of Economics and Management, Kazan Federal University, Elabuga Institute of KFU, Elabuga, 423604, Russia.
Moscow Aviation Institute (National Research University), Moscow, 125080, Russia.
Sci Rep. 2024 Oct 3;14(1):22963. doi: 10.1038/s41598-024-74327-2.
Human-Computer Interaction (HCI) is a multidisciplinary field concerned with the design and use of computer technology, with an emphasis on the interface between humans and computers. HCI aims to build systems that let users interact with computers effectively, efficiently, and pleasantly. Multiple Spoken Language Identification (SLI) for HCI (MSLI for HCI) denotes the ability of a computer system to recognize and distinguish between spoken languages, enabling more complete and convenient interaction between users and technology. SLI based on deep learning (DL) uses artificial neural networks (ANNs), the models underlying DL, to automatically detect and recognize the language spoken in an audio signal. DL techniques, particularly neural networks (NNs), have succeeded in many pattern recognition tasks, including speech and language processing. This paper develops a novel Coot Optimizer Algorithm with a DL-Driven Multiple SLI and Detection (COADL-MSLID) technique for HCI applications. The COADL-MSLID approach aims to detect multiple spoken languages from input audio regardless of the speaker's gender, speaking style, and age. As a first step, the audio files are transformed into spectrogram images. The SqueezeNet model then produces feature vectors from these images, and the COA is applied to tune the hyperparameters of SqueezeNet. For the SLID process itself, the technique employs a convolutional autoencoder (CAE) model. To demonstrate the performance of the COADL-MSLID technique, a series of experiments was conducted on a benchmark dataset. The experimental validation shows that the COADL-MSLID technique achieves an accuracy of 98.33%, higher than the other techniques compared.
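The first step of the pipeline, converting audio into a spectrogram, can be sketched as follows. This is a minimal, standard-library-only illustration and not the authors' implementation: the abstract does not state the frame length, hop size, window function, or scaling, so the values used here (`frame_len=128`, `hop=64`, a Hann window, a decibel scale), the function name `spectrogram`, and the synthetic test tone are all assumptions. A practical pipeline would replace the direct DFT loop with an FFT routine.

```python
import cmath
import math


def spectrogram(samples, frame_len=128, hop=64):
    """Return a log-magnitude spectrogram as a list of frames (time x frequency)."""
    # Hann window (assumed choice) reduces spectral leakage at the frame edges.
    window = [0.5 - 0.5 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    n_bins = frame_len // 2 + 1  # keep non-negative frequencies only
    # Precompute the DFT twiddle factors once; looked up as (k * n) % frame_len.
    twiddle = [cmath.exp(-2j * math.pi * i / frame_len) for i in range(frame_len)]
    frames = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = [samples[start + n] * window[n] for n in range(frame_len)]
        row = []
        for k in range(n_bins):
            acc = sum(frame[n] * twiddle[(k * n) % frame_len]
                      for n in range(frame_len))
            # Magnitude in dB; the small offset avoids log10(0).
            row.append(20 * math.log10(abs(acc) + 1e-10))
        frames.append(row)
    return frames


# Hypothetical input: a 1 kHz tone sampled at 8 kHz for 0.1 s (800 samples).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(800)]
spec = spectrogram(tone)

# Locate the strongest frequency bin in the first frame and convert it to Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
peak_hz = peak_bin * sample_rate / 128
```

Each row of `spec` is one time frame and each column one frequency bin, so the result can be rendered as a 2-D image of the kind a convolutional feature extractor such as SqueezeNet takes as input. For the test tone, the energy concentrates in the bin corresponding to 1 kHz.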