Enhancing human computer interaction with coot optimization and deep learning for multi language identification.

Affiliations

Candidate of Economic Sciences, Department of Economics and Management, Kazan Federal University, Elabuga Institute of KFU, Elabuga, 423604, Russia.

Moscow Aviation Institute (National Research University), Moscow, 125080, Russia.

Publication information

Sci Rep. 2024 Oct 3;14(1):22963. doi: 10.1038/s41598-024-74327-2.

DOI:10.1038/s41598-024-74327-2
PMID:39362948
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11450161/
Abstract

Human-Computer Interaction (HCI) is a multidisciplinary field focused on designing and using computer technology, with emphasis on the interface between computers and humans. HCI aims to produce systems that let users interact with computers effectively, efficiently, and pleasantly. Multiple Spoken Language Identification (SLI) for HCI (MSLI for HCI) denotes the ability of a computer system to recognize and distinguish various spoken languages, enabling more complete and convenient interaction between users and technology. SLI using deep learning (DL) relies on artificial neural networks (ANNs) to automatically detect and recognize the language spoken in an audio signal. DL techniques, particularly neural networks (NNs), have succeeded in various pattern detection tasks, including speech and language processing. This paper develops a novel Coot Optimizer Algorithm with a DL-Driven Multiple SLI and Detection (COADL-MSLID) technique for HCI applications. The COADL-MSLID approach aims to detect multiple spoken languages from the input audio regardless of gender, speaking style, and age. As a first step, the audio files are transformed into spectrogram images. The technique then employs the SqueezeNet model to produce feature vectors, with the Coot Optimizer Algorithm (COA) used to tune the SqueezeNet hyperparameters. A convolutional autoencoder (CAE) model carries out the spoken language identification and detection (SLID) step. To assess the COADL-MSLID technique, a series of experiments was conducted on a benchmark dataset. The experimental validation shows that the COADL-MSLID technique reaches an accuracy of 98.33%, higher than the other techniques compared.
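
The abstract's first processing step, turning audio into a spectrogram, can be illustrated with a minimal sketch. The abstract gives no spectrogram settings, so the frame length, hop size, Hann window, sample rate, and the helper names `hann`, `stft_magnitude`, and `make_tone` below are all assumptions chosen for illustration, not the authors' configuration. A naive DFT is used only to keep the example dependency-free.

```python
import cmath
import math


def hann(n):
    """Symmetric Hann window of length n (n >= 2)."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / (n - 1)) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a magnitude spectrogram as a list of frames.

    Each frame holds frame_len // 2 + 1 magnitudes (DC up to Nyquist).
    The O(N^2) DFT here is for clarity only; practical code would call
    an FFT routine instead.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    frames = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window the current frame to reduce spectral leakage.
        chunk = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(
                chunk[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        frames.append(mags)
    return frames


def make_tone(freq_hz, sample_rate, n_samples):
    """Synthetic sine tone standing in for a real audio clip."""
    return [
        math.sin(2 * math.pi * freq_hz * t / sample_rate)
        for t in range(n_samples)
    ]


SAMPLE_RATE = 8000  # assumed sample rate in Hz
FRAME_LEN = 64      # assumed frame length in samples

tone = make_tone(1000.0, SAMPLE_RATE, 512)
spec = stft_magnitude(tone, frame_len=FRAME_LEN, hop=32)

# Index of the strongest frequency bin in each frame.
peak_bins = [max(range(len(frame)), key=frame.__getitem__) for frame in spec]
print(peak_bins[0] * SAMPLE_RATE / FRAME_LEN)  # 1000.0
```

Each row of `spec` is one time slice and each column one frequency bin. Rendering that grid as an image would give the kind of input a convolutional feature extractor such as SqueezeNet consumes.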

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/26da004cfc17/41598_2024_74327_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/d15dfd053b4d/41598_2024_74327_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/cb732be2c8b5/41598_2024_74327_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/0dc27c9a222d/41598_2024_74327_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/b5dda7c48552/41598_2024_74327_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/4f8f5afc7038/41598_2024_74327_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/7bdccfafeaa6/41598_2024_74327_Fig7_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/e36e1d373b4c/41598_2024_74327_Fig8_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/4b718c29a098/41598_2024_74327_Fig9_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/9594d2e3ae6e/41598_2024_74327_Fig10_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/45e99d446ee4/41598_2024_74327_Fig11_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/c6ce0be9a8fd/41598_2024_74327_Fig12_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b75d/11450161/cf165a1d3b36/41598_2024_74327_Fig13_HTML.jpg

Similar articles

1. Enhancing human computer interaction with coot optimization and deep learning for multi language identification.
   Sci Rep. 2024 Oct 3;14(1):22963. doi: 10.1038/s41598-024-74327-2.
2. Spoken Language Identification Using Deep Learning.
   Comput Intell Neurosci. 2021 Sep 20;2021:5123671. doi: 10.1155/2021/5123671. eCollection 2021.
3. Deep Learning-Based Classification of Spoken English Digits.
   Comput Intell Neurosci. 2022 Sep 28;2022:3364141. doi: 10.1155/2022/3364141. eCollection 2022.
4. A Combined CNN Architecture for Speech Emotion Recognition.
   Sensors (Basel). 2024 Sep 6;24(17):5797. doi: 10.3390/s24175797.
5. Leukemia detection and classification using computer-aided diagnosis system with falcon optimization algorithm and deep learning.
   Sci Rep. 2024 Sep 18;14(1):21755. doi: 10.1038/s41598-024-72900-3.
6. Emotion recognition for human-computer interaction using high-level descriptors.
   Sci Rep. 2024 May 27;14(1):12122. doi: 10.1038/s41598-024-59294-y.
7. DESIGN AND DEVELOPMENT OF HUMAN COMPUTER INTERFACE USING ELECTROOCULOGRAM WITH DEEP LEARNING.
   Artif Intell Med. 2020 Jan;102:101765. doi: 10.1016/j.artmed.2019.101765. Epub 2019 Nov 21.
8. Towards laryngeal cancer diagnosis using Dandelion Optimizer Algorithm with ensemble learning on biomedical throat region images.
   Sci Rep. 2024 Aug 24;14(1):19713. doi: 10.1038/s41598-024-70525-0.
9. Voice Synthesis Improvement by Machine Learning of Natural Prosody.
   Sensors (Basel). 2024 Mar 1;24(5):1624. doi: 10.3390/s24051624.
10. Artificial intelligence based optimization with deep learning model for blockchain enabled intrusion detection in CPS environment.
    Sci Rep. 2022 Jul 28;12(1):12937. doi: 10.1038/s41598-022-17043-z.

Cited by

1. Multi-scale feature fusion of deep convolutional neural networks on cancerous tumor detection and classification using biomedical images.
   Sci Rep. 2025 Jan 7;15(1):1105. doi: 10.1038/s41598-024-84949-1.
2. Prediction of strata settlement in undersea metal mining based on deep forest.
   Sci Rep. 2024 Nov 18;14(1):28401. doi: 10.1038/s41598-024-80025-w.

References

1. Efhamni: A Deep Learning-Based Saudi Sign Language Recognition Application.
   Sensors (Basel). 2024 May 14;24(10):3112. doi: 10.3390/s24103112.
2. The use of deep learning integrating image recognition in language analysis technology in secondary school education.
   Sci Rep. 2024 Feb 5;14(1):2888. doi: 10.1038/s41598-024-52592-5.
3. Toward a Vision-Based Intelligent System: A Stacked Encoded Deep Learning Framework for Sign Language Recognition.
   Sensors (Basel). 2023 Nov 9;23(22):9068. doi: 10.3390/s23229068.
4. Design of network English autonomous learning education system based on human-computer interaction.
   Front Psychol. 2022 Sep 21;13:989884. doi: 10.3389/fpsyg.2022.989884. eCollection 2022.
5. Decoding lip language using triboelectric sensors with deep learning.
   Nat Commun. 2022 Mar 17;13(1):1401. doi: 10.1038/s41467-022-29083-0.
6. Spoken Language Identification Using Deep Learning.
   Comput Intell Neurosci. 2021 Sep 20;2021:5123671. doi: 10.1155/2021/5123671. eCollection 2021.
7. Unsupervised Pre-training of a Deep LSTM-based Stacked Autoencoder for Multivariate Time Series Forecasting Problems.
   Sci Rep. 2019 Dec 13;9(1):19038. doi: 10.1038/s41598-019-55320-6.