The effect of speech pathology on automatic speaker verification: a large-scale study.

Affiliations

Pattern Recognition Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91058, Erlangen, Germany.

Speech & Language Processing Lab, Friedrich-Alexander-Universität Erlangen-Nürnberg, 91054, Erlangen, Germany.

Publication information

Sci Rep. 2023 Nov 22;13(1):20476. doi: 10.1038/s41598-023-47711-7.

Abstract

Navigating the challenges of data-driven speech processing, one of the primary hurdles is accessing reliable pathological speech data. While public datasets appear to offer solutions, they come with inherent risks of potential unintended exposure of patient health information via re-identification attacks. Using a comprehensive real-world pathological speech corpus, with over 3800 test subjects spanning various age groups and speech disorders, we employed a deep-learning-driven automatic speaker verification (ASV) approach. This resulted in a notable mean equal error rate (EER) of [Formula: see text], outstripping traditional benchmarks. Our comprehensive assessments demonstrate that pathological speech overall faces heightened privacy breach risks compared to healthy speech. Specifically, adults with dysphonia are at heightened re-identification risks, whereas conditions like dysarthria yield results comparable to those of healthy speakers. Crucially, speech intelligibility does not influence the ASV system's performance metrics. In pediatric cases, particularly those with cleft lip and palate, the recording environment plays a decisive role in re-identification. Merging data across pathological types led to a marked EER decrease, suggesting the potential benefits of pathological diversity in ASV, accompanied by a logarithmic boost in ASV effectiveness. In essence, this research sheds light on the dynamics between pathological speech and speaker verification, emphasizing its crucial role in safeguarding patient confidentiality in our increasingly digitized healthcare era.
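For readers unfamiliar with the metric, the equal error rate is the operating point at which the false acceptance rate (impostor trials accepted) equals the false rejection rate (genuine trials rejected). The following is a minimal sketch of how an EER could be estimated from verification scores. It is not the authors' evaluation code: the function name, the toy scores, and the convention that a higher score means "same speaker" are all assumptions made for illustration.

```python
def equal_error_rate(genuine_scores, impostor_scores):
    """Estimate the EER by sweeping every observed score as a threshold.

    Assumption: a trial is accepted when its score >= threshold.
    Returns (eer, threshold) at the threshold where the false acceptance
    rate (FAR) and false rejection rate (FRR) are closest.
    """
    thresholds = sorted(set(genuine_scores) | set(impostor_scores))
    best = None  # (|FAR - FRR|, EER estimate, threshold)
    for t in thresholds:
        # FAR: fraction of impostor trials wrongly accepted
        far = sum(s >= t for s in impostor_scores) / len(impostor_scores)
        # FRR: fraction of genuine trials wrongly rejected
        frr = sum(s < t for s in genuine_scores) / len(genuine_scores)
        gap = abs(far - frr)
        if best is None or gap < best[0]:
            best = (gap, (far + frr) / 2, t)
    return best[1], best[2]


# Hypothetical toy scores, not data from the study
genuine = [0.9, 0.8, 0.7, 0.4]
impostor = [0.6, 0.3, 0.2, 0.1]
eer, threshold = equal_error_rate(genuine, impostor)
print(f"EER = {eer:.2%} at threshold {threshold}")
```

In a re-identification setting, a lower EER means speakers are easier to match across recordings, which is why the abstract treats a low EER as a higher privacy risk.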

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/32df/10665418/067ca0f13400/41598_2023_47711_Fig1_HTML.jpg
