Shiffman S, Detmer WM, Lane CD, Fagan LM
Section on Medical Informatics, Stanford University, CA 94305-5479.
J Am Med Inform Assoc. 1995 Jan-Feb;2(1):36-45. doi: 10.1136/jamia.1995.95202546.
To develop a continuous-speech interface that allows flexible input of clinical findings into a medical diagnostic application.
The authors' program allows users to enter clinical findings using their own vernacular. It displays a list of the diagnostic program's controlled-vocabulary terms that most closely match the input and allows the user to select the single best term. The interface program includes two components: a speech-recognition component that converts utterances into text strings, and a language-processing component that matches recognized text strings with controlled-vocabulary terms. The speech-recognition component is composed of commercially available speech-recognition hardware and software, and developer-created grammars, which specify the language to be recognized. The language-processing component is composed of a translator, which extracts a canonical form from both recognized text strings and controlled-vocabulary terms, and a matcher, which measures the similarity between the two canonical forms.
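The abstract does not give implementation details for the translator or the matcher. The Python sketch below is only an illustration of how such a two-stage pipeline might be structured; the normalization rules, the toy synonym table, the stop-word list, and the Dice-coefficient similarity score are assumptions, not the authors' algorithms.

```python
"""Illustrative sketch (not the authors' code) of a translator/matcher pipeline.

The translator reduces free-text input and controlled-vocabulary terms to a
canonical form; the matcher ranks vocabulary terms by similarity to the input.
All specific rules and data here are hypothetical.
"""
import re

# Hypothetical synonym dictionary mapping vernacular words to canonical tokens.
SYNONYMS = {
    "belly": "abdomen",
    "tummy": "abdomen",
    "abdominal": "abdomen",
    "ache": "pain",
    "aching": "pain",
}

STOP_WORDS = {"the", "a", "of", "in", "on", "with", "and"}


def translate(text: str) -> frozenset[str]:
    """Translator: reduce an utterance or a vocabulary term to a canonical token set."""
    tokens = re.findall(r"[a-z]+", text.lower())
    return frozenset(SYNONYMS.get(t, t) for t in tokens if t not in STOP_WORDS)


def match(utterance: str, vocabulary: list[str], top_n: int = 5) -> list[tuple[str, float]]:
    """Matcher: rank controlled-vocabulary terms by similarity to the utterance."""
    u = translate(utterance)
    scored = []
    for term in vocabulary:
        v = translate(term)
        # Dice coefficient over the two canonical token sets.
        dice = 2 * len(u & v) / (len(u) + len(v)) if (u or v) else 0.0
        scored.append((term, dice))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


if __name__ == "__main__":
    vocab = ["abdominal pain", "chest pain", "abdominal distension"]
    # "belly ache" canonicalizes to {"abdomen", "pain"} and therefore ranks
    # "abdominal pain" first; a real system would need a far more complete
    # synonym dictionary, which is the dependency the authors report.
    print(match("belly ache", vocab))
```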
The authors discovered that grammars constructed by a physician, who could anticipate how users might speak findings, supported speech recognition better than did grammars constructed programmatically from the controlled vocabulary. However, this programmatic method of grammar construction was more time-efficient and better supported long-term maintenance of the grammars. The authors also found that language-processing techniques recovered some of the information lost due to speech misrecognition, but were dependent on the completeness of the supporting synonym dictionaries.
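The abstract does not describe the grammar formalism used by the commercial recognizer. The sketch below is a hypothetical illustration of programmatic grammar construction: it simply enumerates controlled-vocabulary terms as alternatives in a BNF-style rule, which is fast to regenerate but, unlike a physician-authored grammar, cannot anticipate vernacular phrasings.

```python
"""Illustrative sketch of programmatic grammar construction from a controlled vocabulary.

The BNF-style output format and the rule name are assumptions made only to
contrast automatic enumeration of vocabulary terms with a hand-authored grammar
that also lists vernacular variants and optional filler words.
"""


def build_grammar(vocabulary: list[str], rule_name: str = "finding") -> str:
    """Emit a single BNF-style rule whose alternatives are the vocabulary terms verbatim."""
    alternatives = " | ".join(term.lower() for term in sorted(set(vocabulary)))
    return f"<{rule_name}> ::= {alternatives};"


if __name__ == "__main__":
    vocab = ["abdominal pain", "jaundice", "splenomegaly"]
    # Regenerating this rule when the vocabulary changes is trivial, but it only
    # recognizes the exact controlled-vocabulary wording; a physician-written
    # grammar would also cover phrasings such as "belly pain", which is why it
    # supported recognition better in the authors' experience.
    print(build_grammar(vocab))
```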
The authors' program demonstrated the feasibility of using continuous speech to enter findings into a medical application. However, improvements in speech-recognition technology and language-processing techniques are needed before natural continuous speech becomes an acceptable input modality for clinical applications.