Continuous speech recognition for clinicians.

Authors

Zafar A, Overhage J M, McDonald C J

Affiliation

Indiana University, Regenstrief Institute for Health Care, Indianapolis 46202-2859, USA.

Publication information

J Am Med Inform Assoc. 1999 May-Jun;6(3):195-204. doi: 10.1136/jamia.1999.0060195.

Abstract

The current generation of continuous speech recognition systems claims to offer high accuracy (greater than 95 percent) speech recognition at natural speech rates (150 words per minute) on low-cost (under $2000) platforms. This paper presents a state-of-the-technology summary, along with insights the authors have gained through testing one such product extensively and other products superficially. The authors have identified a number of issues that are important in managing accuracy and usability. First, for efficient recognition users must start with a dictionary containing the phonetic spellings of all words they anticipate using. The authors dictated 50 discharge summaries using one inexpensive internal medicine dictionary ($30) and found that they needed to add an additional 400 terms to get recognition rates of 98 percent. However, if they used either of two more expensive and extensive commercial medical vocabularies ($349 and $695), they did not need to add terms to get a 98 percent recognition rate. Second, users must speak clearly and continuously, distinctly pronouncing all syllables. Users must also correct errors as they occur, because accuracy improves with error correction by at least 5 percent over two weeks. Users may find it difficult to train the system to recognize certain terms, regardless of the amount of training, and appropriate substitutions must be created. For example, the authors had to substitute "twice a day" for "bid" when using the less expensive dictionary, but not when using the other two dictionaries. From trials they conducted in settings ranging from an emergency room to hospital wards and clinicians' offices, they learned that ambient noise has minimal effect. Finally, they found that a minimal "usable" hardware configuration (which keeps up with dictation) comprises a 300-MHz Pentium processor with 128 MB of RAM and a "speech quality" sound card (e.g., SoundBlaster, $99). Anything less powerful will result in the system lagging behind the speaking rate. The authors obtained 97 percent accuracy with just 30 minutes of training when using the latest edition of one of the speech recognition systems supplemented by a commercial medical dictionary. This technology has advanced considerably in recent years and is now a serious contender to replace some or all of the increasingly expensive alternative methods of dictation with human transcription.
