Making manual scoring of typed transcripts a thing of the past: a commentary on Herrmann (2025).

Author information

Bosker Hans Rutger

Affiliation

Donders Institute for Brain, Cognition and Behaviour, Radboud University, Nijmegen, the Netherlands.

Publication information

Speech Lang Hear. 2025 Jun 9;28(1):2514395. doi: 10.1080/2050571X.2025.2514395. eCollection 2025.

Abstract

Coding the accuracy of typed transcripts from experiments testing speech intelligibility is an arduous endeavour. A recent study in this journal [Herrmann, B. 2025. Leveraging natural language processing models to automate speech-intelligibility scoring. (1)] presents a novel approach for automating the scoring of such listener transcripts, leveraging Natural Language Processing (NLP) models. It involves the calculation of the semantic similarity between transcripts and target sentences using high-dimensional vectors, generated by such NLP models as ADA2, GPT2, BERT, and USE. This approach demonstrates exceptional accuracy, with negligible underestimation of intelligibility scores (by about 2-4%), numerically outperforming simpler computational tools like Autoscore and TSR. The method uniquely relies on semantic representations generated by large language models. At the same time, these models also form the Achilles heel of the technique: the transparency, accessibility, data security, ethical framework, and cost of the selected model directly impact the suitability of the NLP-based scoring method. Hence, working with such models can raise serious risks regarding the reproducibility of scientific findings. This in turn emphasises the need for fair, ethical, and evidence-based open source models. With such models, Herrmann's new tool represents a valuable addition to the speech scientist's toolbox.
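At its core, the scoring pipeline the abstract describes reduces to embedding both the listener's transcript and the target sentence as vectors and comparing them. A minimal sketch of that idea, using a toy bag-of-words vector as a stand-in for the dense embeddings that models such as ADA2, GPT2, BERT, or USE would produce (the `embed` function here is purely illustrative, not Herrmann's actual model):

```python
import math
from collections import Counter

def embed(sentence: str) -> Counter:
    # Toy bag-of-words "embedding": a sparse word-count vector.
    # A stand-in for the high-dimensional dense vectors that an
    # NLP model (e.g. BERT or ADA2) would generate.
    return Counter(sentence.lower().split())

def cosine_similarity(a: Counter, b: Counter) -> float:
    # Cosine of the angle between two sparse count vectors.
    dot = sum(a[w] * b[w] for w in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def score_transcript(transcript: str, target: str) -> float:
    # Intelligibility score as semantic similarity between the
    # listener's typed transcript and the target sentence.
    return cosine_similarity(embed(transcript), embed(target))
```

With a real embedding model, `embed` would return a dense vector and near-paraphrases would score high even with no word overlap; the bag-of-words version only captures lexical overlap, which is roughly what simpler tools like Autoscore exploit.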

Similar articles

Automated Scoring of the Speech Intelligibility Test Using Autoscore.
Am J Speech Lang Pathol. 2025 Jul 29;34(4S):2397-2408. doi: 10.1044/2024_AJSLP-24-00276. Epub 2024 Dec 12.
