Gale Robert C, Fleegle Mikala, Fergadiotis Gerasimos, Bedrick Steven
Oregon Health and Science University, Portland, Oregon, USA.
Portland State University, Portland, Oregon, USA.
LREC Int Conf Lang Resour Eval. 2022 Jun;2022(RaPID4 Workshop):41-55.
We present the outcome of the Post-Stroke Speech Transcription (PSST) challenge. For the challenge, we prepared a new data resource of responses to two confrontation naming tests found in AphasiaBank, extracting audio and adding new phonemic transcripts for each response. The challenge consisted of two tasks. Task A asked challengers to build an automatic speech recognizer (ASR) for phonemic transcription of the PSST samples, evaluated in terms of phoneme error rate (PER) as well as a finer-grained metric derived from phonological feature theory, feature error rate (FER). The best model had a 9.9% FER / 20.0% PER, improving on our baseline by a relative 18% and 24%, respectively. Task B approximated a downstream assessment task, asking challengers to identify whether each recording contained a correctly pronounced target word. Challengers were unable to improve on the baseline algorithm; however, using this algorithm with the improved transcripts from Task A resulted in 92.8% accuracy / 0.921 F1, a relative improvement of 2.8% and 3.3%, respectively.
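As a rough illustration of the Task A metric, PER can be sketched as the phoneme-level edit distance between reference and hypothesis transcripts, divided by the total number of reference phonemes. The abstract does not specify the challenge's exact scoring procedure, so the corpus-level pooling, the function names, and the ARPAbet-style symbols below are assumptions rather than the organizers' implementation. FER is not sketched because it depends on a phonological feature inventory that the abstract does not give.

```python
from typing import List, Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance over phoneme tokens (unit cost for sub/ins/del)."""
    # prev[j] holds the distance between ref[:i-1] and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        curr = [i]  # distance between ref[:i] and the empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(
                prev[j] + 1,         # deletion of r
                curr[j - 1] + 1,     # insertion of h
                prev[j - 1] + cost,  # match or substitution
            ))
        prev = curr
    return prev[-1]


def phoneme_error_rate(refs: List[Sequence[str]],
                       hyps: List[Sequence[str]]) -> float:
    """Corpus-level PER: total edits / total reference phonemes.

    Assumption: errors are pooled across utterances rather than
    averaged per utterance; the challenge may have scored differently.
    """
    if len(refs) != len(hyps):
        raise ValueError("refs and hyps must have the same length")
    total_ref = sum(len(r) for r in refs)
    if total_ref == 0:
        raise ValueError("reference transcripts contain no phonemes")
    total_errors = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return total_errors / total_ref


if __name__ == "__main__":
    # Hypothetical transcripts, for illustration only.
    refs = [["HH", "AW", "S"], ["K", "AE", "T"]]
    hyps = [["HH", "AW", "S"], ["K", "AA", "T", "S"]]
    print(f"PER = {phoneme_error_rate(refs, hyps):.3f}")
```

Pooling errors over the whole corpus keeps short responses from dominating the score, which matters for single-word naming responses; a per-utterance average would be an equally plausible reading of the abstract.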