Speaker-turn aware diarization for speech-based cognitive assessments.

Authors

Xu Sean Shensheng, Ke Xiaoquan, Mak Man-Wai, Wong Ka Ho, Meng Helen, Kwok Timothy C Y, Gu Jason, Zhang Jian, Tao Wei, Chang Chunqi

Affiliations

School of Biomedical Engineering, Shenzhen University Medical School, Shenzhen University, Shenzhen, China.

Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Kowloon, Hong Kong SAR, China.

Publication

Front Neurosci. 2024 Jan 16;17:1351848. doi: 10.3389/fnins.2023.1351848. eCollection 2023.

DOI:10.3389/fnins.2023.1351848
PMID:38292896
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10824834/
Abstract

INTRODUCTION

Speaker diarization is an essential preprocessing step for diagnosing cognitive impairments from speech-based Montreal Cognitive Assessments (MoCA).

METHODS

This paper proposes three enhancements to the conventional speaker diarization methods for such assessments. The enhancements tackle the challenges of diarizing MoCA recordings on two fronts. First, multi-scale channel interdependence speaker embedding is used as the front-end speaker representation for overcoming the acoustic mismatch caused by far-field microphones. Specifically, a squeeze-and-excitation (SE) unit and channel-dependent attention are added to Res2Net blocks for multi-scale feature aggregation. Second, a sequence comparison approach with a holistic view of the whole conversation is applied to measure the similarity of short speech segments in the conversation, which results in a speaker-turn aware scoring matrix for the subsequent clustering step. Third, to further enhance the diarization performance, we propose incorporating a pairwise similarity measure so that the speaker-turn aware scoring matrix contains both local and global information across the segments.
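The idea of a scoring matrix that holds both local (pairwise) and global (whole-conversation) information can be illustrated with a short NumPy sketch. This is not the authors' implementation. It assumes segment embeddings have already been extracted and uses cosine similarity as the pairwise measure. In place of the paper's sequence comparison method, it uses a simple stand-in: the correlation between two segments' similarity profiles across the entire conversation. The function names and the blending weight `alpha` are hypothetical.

```python
import numpy as np


def cosine_matrix(emb: np.ndarray) -> np.ndarray:
    """Local, pairwise cosine similarity between N segment embeddings (N x D)."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.maximum(norms, 1e-12)  # guard against zero-length vectors
    return unit @ unit.T


def global_profile_matrix(local: np.ndarray) -> np.ndarray:
    """Stand-in global measure (an assumption, not the paper's sequence comparison):
    Pearson correlation between rows of the local matrix, i.e. how similarly two
    segments relate to every other segment in the conversation."""
    centered = local - local.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    unit = centered / np.maximum(norms, 1e-12)
    return unit @ unit.T


def combined_scores(emb: np.ndarray, alpha: float = 0.5) -> np.ndarray:
    """Blend local and global information into one symmetric scoring matrix.
    `alpha` is a hypothetical weight on the local (pairwise) term."""
    local = cosine_matrix(emb)
    return alpha * local + (1.0 - alpha) * global_profile_matrix(local)


if __name__ == "__main__":
    # Toy embeddings: segments 0-1 come from one speaker, 2-3 from another.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    print(np.round(combined_scores(emb), 2))
```

In this sketch, two segments score highly on the global term when they resemble the rest of the conversation in the same way, even if their direct pairwise similarity is only moderate. The resulting matrix could then be passed to a standard clustering step, such as agglomerative or spectral clustering on a precomputed affinity.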

RESULTS

Evaluations on an interactive MoCA dataset show that the proposed enhancements lead to a diarization system that outperforms the conventional x-vector/PLDA systems under language-, age-, and microphone-mismatch scenarios.

DISCUSSION

The results also show that the proposed enhancements can help hypothesize the speaker-turn timestamps, making the diarization method amenable to datasets without timestamp information.
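Once each short segment carries a speaker label, speaker-turn timestamps can be hypothesized by merging consecutive same-speaker segments. The sketch below shows this post-processing step under stated assumptions. Segments are `(start, end, label)` tuples sorted by start time. A hypothetical gap tolerance `max_gap`, in seconds, decides whether two same-label segments belong to one turn. The speaker labels are illustrative, and the sketch is not the paper's procedure.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, speaker_label)


def segments_to_turns(segments: List[Segment], max_gap: float = 0.25) -> List[Segment]:
    """Merge consecutive same-speaker segments into speaker turns.

    Hypothetical helper; assumes `segments` is sorted by start time.
    """
    turns: List[Segment] = []
    for start, end, label in segments:
        if turns and turns[-1][2] == label and start - turns[-1][1] <= max_gap:
            # Same speaker continues after a short pause: extend the current turn.
            prev_start, prev_end, _ = turns[-1]
            turns[-1] = (prev_start, max(prev_end, end), label)
        else:
            # Speaker change (or a long pause): open a new turn.
            turns.append((start, end, label))
    return turns


if __name__ == "__main__":
    segs = [(0.0, 1.5, "assessor"), (1.5, 3.0, "assessor"),
            (3.0, 4.5, "participant"), (4.5, 6.0, "assessor")]
    for turn in segments_to_turns(segs):
        print(turn)
```

In the usage example, the hypothesized turn boundaries fall where the label changes, at 3.0 s and 4.5 s.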


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/f7b558a8930f/fnins-17-1351848-g0001.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/4dc8a8e59489/fnins-17-1351848-g0002.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/8a3e5c671d55/fnins-17-1351848-g0003.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/de65eb2080a3/fnins-17-1351848-g0004.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/011d26b5b14d/fnins-17-1351848-g0005.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/a82ca04bc934/fnins-17-1351848-g0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/60a2c3421d94/fnins-17-1351848-g0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/b15a/10824834/dc01cf1f34c0/fnins-17-1351848-g0008.jpg

Similar articles

1. Speaker-turn aware diarization for speech-based cognitive assessments.
   Front Neurosci. 2024 Jan 16;17:1351848. doi: 10.3389/fnins.2023.1351848. eCollection 2023.
2. Real-time multilingual speech recognition and speaker diarization system based on Whisper segmentation.
   PeerJ Comput Sci. 2024 Mar 29;10:e1973. doi: 10.7717/peerj-cs.1973. eCollection 2024.
3. Multimodal Speaker Diarization Using a Pre-Trained Audio-Visual Synchronization Model.
   Sensors (Basel). 2019 Nov 25;19(23):5163. doi: 10.3390/s19235163.
4. End-to-end neural speaker diarization with an iterative adaptive attractor estimation.
   Neural Netw. 2023 Sep;166:566-578. doi: 10.1016/j.neunet.2023.07.043. Epub 2023 Aug 1.
5. Evaluation of Deep Clustering for Diarization of Aphasic Speech.
   Stud Health Technol Inform. 2019;260:81-88.
6. Multisensory Fusion for Unsupervised Spatiotemporal Speaker Diarization.
   Sensors (Basel). 2024 Jun 29;24(13):4229. doi: 10.3390/s24134229.
7. Audio-Visual Speaker Diarization Based on Spatiotemporal Bayesian Fusion.
   IEEE Trans Pattern Anal Mach Intell. 2018 May;40(5):1086-1099. doi: 10.1109/TPAMI.2017.2648793. Epub 2017 Jan 5.
8. Meta-learning with Latent Space Clustering in Generative Adversarial Network for Speaker Diarization.
   IEEE/ACM Trans Audio Speech Lang Process. 2021;29:1204-1219. doi: 10.1109/taslp.2021.3061885. Epub 2021 Feb 26.
9. Development of Supervised Speaker Diarization System Based on the PyAnnote Audio Processing Library.
   Sensors (Basel). 2023 Feb 13;23(4):2082. doi: 10.3390/s23042082.
10. The Impact of Speaker Diarization on DNN-based Autism Severity Estimation.
    Annu Int Conf IEEE Eng Med Biol Soc. 2022 Jul;2022:3414-3417. doi: 10.1109/EMBC48229.2022.9871523.
