Pakhomov Serguei V S, Marino Susan E, Banks Sarah, Bernick Charles
Center for Clinical and Cognitive Neuropharmacology, University of Minnesota, Minneapolis, USA.
Lou Ruvo Center for Brain Health, Cleveland Clinic.
Speech Commun. 2015 Dec 1;75:14-26. doi: 10.1016/j.specom.2015.09.010. Epub 2015 Sep 28.
Cognitive tests of verbal fluency (VF) consist of verbalizing as many words as possible in one minute that either start with a specific letter of the alphabet or belong to a specific semantic category. These tests are widely used in neurological, psychiatric, mental health, and school settings, and their validity for clinical applications has been extensively demonstrated. However, VF tests are currently administered and scored manually, which makes them too cumbersome to use, particularly for longitudinal cognitive monitoring in large populations. The objective of the current study was to determine whether automatic speech recognition (ASR) could be used for computerized administration and scoring of VF tests. We examined established techniques for constraining language modeling to a predefined vocabulary from a specific semantic category (e.g., animals). We also experimented with post-processing ASR output with confidence scoring, as well as with using speaker adaptation to improve automated VF scoring. Audio responses to a VF task were collected from 38 novice and experienced professional fighters (boxing and mixed martial arts) participating in a longitudinal study of the effects of repetitive head trauma on brain function. Word error rate, correlation with manual word count, and distance from manual word count were used to compare ASR-based scoring approaches with each other and with the manually scored reference standard. Our results show that responses to the VF task contain a large number of extraneous utterances and noise that lead to relatively poor baseline ASR performance. However, we also found that speaker adaptation combined with confidence scoring significantly improves all three metrics and can enable the use of ASR for reliable estimates of the traditional manual VF scores.
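As an illustration only, the scoring and evaluation steps described in the abstract might be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Hypothesis` record, the 0.5 confidence threshold, the choice to count each in-category word once, Pearson's r as the correlation, and mean absolute difference as the "distance" are all hypothetical choices made here for clarity.

```python
from dataclasses import dataclass
from math import sqrt


@dataclass
class Hypothesis:
    """One recognized word with its ASR confidence (hypothetical record type)."""
    word: str
    confidence: float  # assumed to lie in [0, 1]


def vf_score(hyps, vocabulary, threshold=0.5):
    """Count distinct in-category words whose confidence meets the threshold.

    Counting each word once and the default threshold are assumptions.
    """
    seen = set()
    for h in hyps:
        w = h.word.lower()
        if w in vocabulary and h.confidence >= threshold:
            seen.add(w)
    return len(seen)


def word_error_rate(reference, hypothesis):
    """Word-level Levenshtein distance divided by the reference length."""
    if not reference:
        raise ValueError("reference must contain at least one word")
    prev = list(range(len(hypothesis) + 1))
    for i, r in enumerate(reference, start=1):
        curr = [i]
        for j, h in enumerate(hypothesis, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1] / len(reference)


def pearson(xs, ys):
    """Pearson correlation between automatic and manual word counts."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / sqrt(vx * vy)


def mean_abs_distance(auto, manual):
    """Mean absolute difference between automatic and manual counts."""
    return sum(abs(a - m) for a, m in zip(auto, manual)) / len(auto)


if __name__ == "__main__":
    animals = {"dog", "cat", "horse"}  # toy category vocabulary
    hyps = [Hypothesis("dog", 0.9), Hypothesis("um", 0.8),
            Hypothesis("cat", 0.3), Hypothesis("Dog", 0.7),
            Hypothesis("horse", 0.6)]
    print(vf_score(hyps, animals))
    print(word_error_rate(["dog", "cat", "horse"], ["dog", "um", "cat"]))
```

In this toy setup, the vocabulary filter discards the filler "um", the confidence threshold discards the low-confidence "cat", and the repeated "Dog" is counted once. A real system would tune the threshold on held-out data and take the confidence values from the recognizer.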