Segmental and Suprasegmental Speech Foundation Models for Classifying Cognitive Risk Factors: Evaluating Out-of-the-Box Performance.

Author information

Ng Si-Ioi, Xu Lingfeng, Mueller Kimberly D, Liss Julie, Berisha Visar

Affiliations

Arizona State University, USA.

University of Wisconsin-Madison, USA.

Publication information

Interspeech. 2024 Sep;2024:917-921. doi: 10.21437/interspeech.2024-2063.

DOI: 10.21437/interspeech.2024-2063

PMID: 40051645

Original article: https://pmc.ncbi.nlm.nih.gov/articles/PMC11884505/

Abstract

Speech foundation models are remarkably successful in various consumer applications, prompting their extension to clinical use cases. This extension is challenged by small clinical datasets, which preclude effective fine-tuning. We tested the efficacy of two models for classifying participants using segmental (Wav2Vec2.0, W2V2) and suprasegmental (Trillsson) speech analysis windows. Analysis at both time scales has shown differences associated with cognitive decline. Speakers were classified as healthy controls (HC), amyloid-β positive (Aβ+), mild cognitive impairment (MCI), or dementia. A subset of W2V2 and Trillsson representations showed large effect sizes between HC and each risk factor. Cross-validation showed that W2V2 consistently outperforms Trillsson. Mean macro-F1 scores of 54.1%, 63.5%, and 72.0% were found for classifying Aβ+, MCI, and dementia, respectively, against HC. Repeatability analysis of Trillsson and W2V2 yielded intraclass correlations of 0.30 and 0.41, respectively. The reliability of such models must be enhanced for clinical speech analysis and longitudinal tracking.
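The abstract reports three quantities without defining them: an effect size between groups, macro-F1 for classification, and an intraclass correlation for repeatability. The sketch below shows one common way to compute each in plain Python. It assumes Cohen's d with a pooled standard deviation as the effect size, macro-F1 as the unweighted mean of per-class F1 scores, and the ICC(2,1) form (two-way random effects, absolute agreement, single measurement) applied to one feature measured across repeated sessions. The abstract does not say which variants the authors used, and the function names are hypothetical.

```python
import math
import statistics


def cohens_d(group_a, group_b):
    """Assumed effect size: Cohen's d with a pooled standard deviation."""
    n_a, n_b = len(group_a), len(group_b)
    var_a = statistics.variance(group_a)  # sample variance (n - 1 denominator)
    var_b = statistics.variance(group_b)
    pooled_sd = math.sqrt(((n_a - 1) * var_a + (n_b - 1) * var_b) / (n_a + n_b - 2))
    return (statistics.mean(group_a) - statistics.mean(group_b)) / pooled_sd


def macro_f1(y_true, y_pred):
    """Unweighted mean of per-class F1 over every label that appears."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(t == label and p == label for t, p in zip(y_true, y_pred))
        fp = sum(t != label and p == label for t, p in zip(y_true, y_pred))
        fn = sum(t == label and p != label for t, p in zip(y_true, y_pred))
        denom = 2 * tp + fp + fn  # F1 = 2TP / (2TP + FP + FN)
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores)


def icc_2_1(ratings):
    """Assumed ICC form: ICC(2,1), two-way random, absolute agreement.

    `ratings` holds one row per speaker; each row is the same feature
    (e.g. one hypothetical embedding dimension) measured in k sessions.
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares: speakers (rows), sessions (columns), residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    print(cohens_d([2, 4, 6], [1, 3, 5]))  # 0.5
    print(round(macro_f1(["HC", "HC", "MCI", "MCI"], ["HC", "MCI", "MCI", "MCI"]), 3))  # 0.733
    # Session 2 is session 1 shifted by +1 for every speaker.
    print(round(icc_2_1([[1, 2], [2, 3], [3, 4]]), 3))  # 0.667
```

ICC(2,1) counts a systematic offset between sessions as disagreement, so the last example scores below 1 even though the speaker ranking is preserved; a consistency form such as ICC(3,1) would score it as 1. Which form is appropriate for longitudinal tracking depends on whether session-level shifts should count as measurement error.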
