Suppr 超能文献


Assessment of Pepper Robot's Speech Recognition System through the Lens of Machine Learning

Authors

Pande Akshara, Mishra Deepti

Affiliation

Educational Technology Laboratory, Intelligent System and Analytics Group, Department of Computer Science (IDI), Norwegian University of Science and Technology, 2815 Gjøvik, Norway.

Publication

Biomimetics (Basel). 2024 Jun 27;9(7):391. doi: 10.3390/biomimetics9070391.

DOI: 10.3390/biomimetics9070391
PMID: 39056832
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11274617/
Abstract

Speech comprehension can be challenging due to multiple factors, causing inconvenience for both the speaker and the listener. In such situations, using a humanoid robot, Pepper, can be beneficial as it can display the corresponding text on its screen. However, prior to that, it is essential to carefully assess the accuracy of the audio recordings captured by Pepper. Therefore, in this study, an experiment is conducted with eight participants with the primary objective of examining Pepper's speech recognition system with the help of audio features such as Mel-Frequency Cepstral Coefficients, spectral centroid, spectral flatness, the Zero-Crossing Rate, pitch, and energy. Furthermore, the K-means algorithm was employed to create clusters based on these features with the aim of selecting the most suitable cluster with the help of the speech-to-text conversion tool Whisper. The selection of the best cluster is accomplished by finding the maximum accuracy data points lying in a cluster. A criterion of discarding data points with values of WER above 0.3 is imposed to achieve this. The findings of this study suggest that a distance of up to one meter from the humanoid robot Pepper is suitable for capturing the best speech recordings. In contrast, age and gender do not influence the accuracy of recorded speech. The proposed system will provide a significant strength in settings where subtitles are required to improve the comprehension of spoken statements.


Figures (PMC, g001–g014):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/7c5169f8e91b/biomimetics-09-00391-g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/dfea730da135/biomimetics-09-00391-g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/241567d647fa/biomimetics-09-00391-g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/73ce8625eeba/biomimetics-09-00391-g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/97f6c13b8fc7/biomimetics-09-00391-g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/39966c9156f6/biomimetics-09-00391-g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/ef4c90a2b385/biomimetics-09-00391-g007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/f4cab8165b09/biomimetics-09-00391-g008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/2e74ed1ed42e/biomimetics-09-00391-g009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/a390f23b0b28/biomimetics-09-00391-g010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/8f4bb1cfc096/biomimetics-09-00391-g011a.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/f938fcaecb8f/biomimetics-09-00391-g012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/82e8e280a191/biomimetics-09-00391-g013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7303/11274617/27516924a4dc/biomimetics-09-00391-g014.jpg

Similar Articles

1. Assessment of Pepper Robot's Speech Recognition System through the Lens of Machine Learning.
Biomimetics (Basel). 2024 Jun 27;9(7):391. doi: 10.3390/biomimetics9070391.
2. Folic acid supplementation and malaria susceptibility and severity among people taking antifolate antimalarial drugs in endemic areas.
Cochrane Database Syst Rev. 2022 Feb 1;2(2022):CD014217. doi: 10.1002/14651858.CD014217.
3. Beyond simulation: Unlocking the frontiers of humanoid robot capability and intelligence with Pepper's open-source digital twin.
Heliyon. 2024 Jul 10;10(14):e34456. doi: 10.1016/j.heliyon.2024.e34456. eCollection 2024 Jul 30.
4. The Role of Coherent Robot Behavior and Embodiment in Emotion Perception and Recognition During Human-Robot Interaction: Experimental Study.
JMIR Hum Factors. 2024 Jan 26;11:e45494. doi: 10.2196/45494.
5. 3D CNN-Based Speech Emotion Recognition Using K-Means Clustering and Spectrograms.
Entropy (Basel). 2019 May 8;21(5):479. doi: 10.3390/e21050479.
6. Speech emotion recognition using machine learning techniques: Feature extraction and comparison of convolutional neural network and random forest.
PLoS One. 2023 Nov 21;18(11):e0291500. doi: 10.1371/journal.pone.0291500. eCollection 2023.
7. Automatic Assessment of Aphasic Speech Sensed by Audio Sensors for Classification into Aphasia Severity Levels to Recommend Speech Therapies.
Sensors (Basel). 2022 Sep 14;22(18):6966. doi: 10.3390/s22186966.
8. Models and Approaches for Comprehension of Dysarthric Speech Using Natural Language Processing: Systematic Review.
JMIR Rehabil Assist Technol. 2023 Oct 27;10:e44489. doi: 10.2196/44489.
9. Hybrid machine learning classification scheme for speaker identification.
J Forensic Sci. 2022 May;67(3):1033-1048. doi: 10.1111/1556-4029.15006. Epub 2022 Feb 9.
10. Privacy-Preserving Deep Speaker Separation for Smartphone-Based Passive Speech Assessment.
IEEE Open J Eng Med Biol. 2021 Mar 4;2:304-313. doi: 10.1109/OJEMB.2021.3063994. eCollection 2021.
