

Computing nasalance with MFCCs and Convolutional Neural Networks.

Authors

Lozano Andrés, Nava Enrique, García Méndez María Dolores, Moreno-Torres Ignacio

Affiliations

Department of Communication Engineering, University of Málaga, Málaga, Spain.

Department of Spanish Philology, University of Málaga, Málaga, Spain.

Publication

PLoS One. 2024 Dec 31;19(12):e0315452. doi: 10.1371/journal.pone.0315452. eCollection 2024.

DOI:10.1371/journal.pone.0315452
PMID:39739659
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11687758/
Abstract

Nasalance is a valuable clinical biomarker for hypernasality. It is computed as the ratio of acoustic energy emitted through the nose to the total energy emitted through the mouth and nose (eNasalance). A new approach is proposed to compute nasalance using Convolutional Neural Networks (CNNs) trained with Mel-Frequency Cepstrum Coefficients (mfccNasalance). mfccNasalance is evaluated by examining its accuracy: 1) when the train and test data are from the same or from different dialects; 2) with test data that differs in dynamicity (e.g. rapidly produced diadochokinetic syllables versus short words); and 3) using multiple CNN configurations (i.e. kernel shape and use of 1 × 1 pointwise convolution). Dual-channel Nasometer speech data from healthy speakers from different dialects: Costa Rica, more(+) nasal, Spain and Chile, less(-) nasal, are recorded. The input to the CNN models were sequences of 39 MFCC vectors computed from 250 ms moving windows. The test data were recorded in Spain and included short words (-dynamic), sentences (+dynamic), and diadochokinetic syllables (+dynamic). The accuracy of a CNN model was defined as the Spearman correlation between the mfccNasalance for that model and the perceptual nasality scores of human experts. In the same-dialect condition, mfccNasalance was more accurate than eNasalance independently of the CNN configuration; using a 1 × 1 kernel resulted in increased accuracy for +dynamic utterances (p < .000), though not for -dynamic utterances. The kernel shape had a significant impact for -dynamic utterances (p < .000) exclusively. In the different-dialect condition, the scores were significantly less accurate than in the same-dialect condition, particularly for Costa Rica trained models. We conclude that mfccNasalance is a flexible and useful alternative to eNasalance. Future studies should explore how to optimize mfccNasalance by selecting the most adequate CNN model as a function of the dynamicity of the target speech data.
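The classic eNasalance described in the abstract is simply the frame-wise ratio of nasal acoustic energy to total (nasal + oral) energy from the dual-channel Nasometer signal. A minimal NumPy sketch of that ratio follows; the frame length and hop size are illustrative assumptions, not the paper's settings:

```python
import numpy as np

def e_nasalance(nasal, oral, frame_len=512, hop=256):
    """Energy-based nasalance: nasal energy / (nasal + oral energy), per frame.

    `nasal` and `oral` are the two channels of a dual-channel recording,
    assumed time-aligned and of equal length.
    """
    nasal = np.asarray(nasal, dtype=float)
    oral = np.asarray(oral, dtype=float)

    def frame_energy(x):
        # Sum of squared samples over each overlapping analysis frame.
        n_frames = 1 + (len(x) - frame_len) // hop
        return np.array([np.sum(x[i * hop : i * hop + frame_len] ** 2)
                         for i in range(n_frames)])

    e_nas = frame_energy(nasal)
    e_oral = frame_energy(oral)
    # Small epsilon guards against silent frames (division by zero).
    return e_nas / (e_nas + e_oral + 1e-12)

# Example: nasal channel at half the oral amplitude
# -> energy ratio 0.25 / (0.25 + 1) = 0.2 in every frame.
sr = 8000
t = np.arange(sr) / sr
oral = np.sin(2 * np.pi * 200 * t)
nasal = 0.5 * oral
scores = e_nasalance(nasal, oral)
```

The paper's mfccNasalance replaces this fixed energy ratio with a CNN that maps sequences of 39 MFCC vectors, computed from 250 ms moving windows, to a nasalance estimate; the sketch above only illustrates the eNasalance baseline it is compared against.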


Figures 1-7 (PMC full text):
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/3e0ff2a26f3e/pone.0315452.g001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/95ce64364ca5/pone.0315452.g002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/d5926bc65f8b/pone.0315452.g003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/dbbf4f393040/pone.0315452.g004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/379a87fe35a6/pone.0315452.g005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/461c4e3b20af/pone.0315452.g006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/1f0c/11687758/fa6f4d4c9a63/pone.0315452.g007.jpg

Similar Articles

1. Computing nasalance with MFCCs and Convolutional Neural Networks.
PLoS One. 2024 Dec 31;19(12):e0315452. doi: 10.1371/journal.pone.0315452. eCollection 2024.
2. Dialectical effects on nasalance: a multicenter, cross-continental study.
J Speech Lang Hear Res. 2015 Feb;58(1):69-77. doi: 10.1044/2014_JSLHR-S-14-0077.
3. Nasalance scores for typical Irish English-speaking adults.
Logoped Phoniatr Vocol. 2013 Dec;38(4):167-72. doi: 10.3109/14015439.2012.679965. Epub 2012 May 14.
4. Nasometric values for normal nasal resonance in the speech of young Flemish adults.
Cleft Palate Craniofac J. 2001 Mar;38(2):112-8. doi: 10.1597/1545-1569_2001_038_0112_nvfnnr_2.0.co_2.
5. Nasalance in Arabic-Speaking Jordanians: A Comparative Study.
Folia Phoniatr Logop. 2020;72(5):370-377. doi: 10.1159/000502171. Epub 2019 Sep 10.
6. Nasalance Scores for Normal Speakers of American English Obtained by the Nasometer II Using the MacKay-Kummer SNAP-R Test.
Cleft Palate Craniofac J. 2022 Jun;59(6):765-773. doi: 10.1177/10556656211025406. Epub 2021 Jun 29.
7. Comparison of nasal acceleration and nasalance across vowels.
J Speech Lang Hear Res. 2013 Oct;56(5):1476-84. doi: 10.1044/1092-4388(2013/12-0239). Epub 2013 Jul 9.
8. Normative nasalance scores for Estonian children.
Clin Linguist Phon. 2018;32(11):1054-1066. doi: 10.1080/02699206.2018.1495767. Epub 2018 Jul 9.
9. Voice low tone to high tone ratio, nasalance, and nasality ratings in connected speech of native Mandarin speakers: a pilot study.
Cleft Palate Craniofac J. 2012 Jul;49(4):437-46. doi: 10.1597/10-183. Epub 2011 Jul 8.
10. Normative nasalance scores in the production of words and syllables for Brazilian Portuguese speakers.
Clin Linguist Phon. 2019;33(12):1139-1148. doi: 10.1080/02699206.2019.1590733. Epub 2019 Mar 20.
