Quality prediction of synthesized speech based on tensor structured EEG signals.

Affiliation

Graduate School of Information Sciences, Nara Institute of Science and Technology, Ikoma, Nara, Japan.

Publication information

PLoS One. 2018 Jun 14;13(6):e0193521. doi: 10.1371/journal.pone.0193521. eCollection 2018.

DOI:10.1371/journal.pone.0193521
PMID:29902169
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC6002021/
Abstract

This study investigates methods for predicting the quality of synthesized speech from EEG. Training a predictive model on EEG is challenging because the number of training trials is small, the signal-to-noise ratio is low, and the independent variables are highly correlated. When a predictive model is trained with a machine learning algorithm, the features extracted from multi-channel EEG signals are usually organized as a vector, which ignores their structure even though the signals are highly structured. This study predicts subjective rating scores of synthesized speech, including overall impression, valence, and arousal, using tensor-structured features instead of vectorized ones in order to exploit the structure of the features. We extracted various features and assembled them into a tensor feature that preserves that structure. Both vectorized and tensorial features were used to predict the rating scores, and the experimental results showed that prediction with tensorial features achieved better predictive performance. Among the features, the alpha and beta bands were more effective for prediction than the others, which agrees with previous neurophysiological studies.
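
The contrast between vectorized and tensor-structured features can be illustrated with a small sketch. This is not the authors' pipeline: the data are synthetic, the feature layout (trials x channels x frequency bands) is an assumption made for illustration, and the rank-1 bilinear regression fitted by alternating least squares is a stand-in for whatever tensor regression method the paper actually uses. The function name `fit_rank1` and all sizes are hypothetical.

```python
import numpy as np


def fit_rank1(X, y, n_iter=20, seed=0):
    """Fit y[n] ~ u^T X[n] v + b by alternating least squares.

    X has shape (n_trials, n_channels, n_bands). Returns the channel
    weights u, the band weights v, the intercept b, and the training
    mean squared error recorded after each full iteration.
    """
    rng = np.random.default_rng(seed)
    n, _, n_bands = X.shape
    v = rng.standard_normal(n_bands)
    ones = np.ones((n, 1))
    losses = []
    for _ in range(n_iter):
        # Fix v: the model is linear in (u, b), so solve ordinary least squares.
        Z = np.hstack([X @ v, ones])  # (n, n_channels + 1)
        sol = np.linalg.lstsq(Z, y, rcond=None)[0]
        u, b = sol[:-1], sol[-1]
        # Fix u: the model is linear in (v, b).
        W = np.hstack([np.einsum("ncb,c->nb", X, u), ones])  # (n, n_bands + 1)
        sol = np.linalg.lstsq(W, y, rcond=None)[0]
        v, b = sol[:-1], sol[-1]
        pred = np.einsum("ncb,c,b->n", X, u, v) + b
        losses.append(float(np.mean((pred - y) ** 2)))
    return u, v, b, losses


# Synthetic stand-in for per-trial EEG features (assumed sizes).
rng = np.random.default_rng(0)
n_trials, n_channels, n_bands = 40, 8, 5
X = rng.standard_normal((n_trials, n_channels, n_bands))
u_true = rng.standard_normal(n_channels)
v_true = rng.standard_normal(n_bands)
y = np.einsum("ncb,c,b->n", X, u_true, v_true)
y = y + 0.1 * rng.standard_normal(n_trials)

# Vectorized view: one flat row per trial, channel/band layout discarded.
X_vec = X.reshape(n_trials, -1)
n_params_vec = X_vec.shape[1]            # one weight per (channel, band) pair
n_params_rank1 = n_channels + n_bands    # one weight per channel plus per band

u, v, b, losses = fit_rank1(X, y)
print("vectorized weights:", n_params_vec)
print("rank-1 tensor weights:", n_params_rank1)
print("training MSE, first and last iteration:", losses[0], losses[-1])
```

With these assumed sizes, a linear model on the vectorized features needs as many weights as there are trials, whereas the rank-1 model needs far fewer, which is one reason structured models can be attractive when trials are scarce and predictors are correlated.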
