Assessing the feasibility of large-scale natural language processing in a corpus of ordinary medical records: a lexical analysis.

Authors

Hersh W R, Campbell E M, Malveau S E

Affiliation

Division of Medical Informatics and Outcomes Research, Oregon Health Sciences University, USA.

Publication

Proc AMIA Annu Fall Symp. 1997:580-4.

Abstract

OBJECTIVE

Identify the lexical content of a large corpus of ordinary medical records to assess the feasibility of large-scale natural language processing.

METHODS

A corpus of 560 megabytes of medical record text from an academic medical center was broken into individual words and compared with the words in six medical vocabularies, a common word list, and a database of patient names. Unrecognized words were assessed for algorithmic and contextual approaches to identifying more words, while the remainder were analyzed for spelling correctness.
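
The word-matching step described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the tokenizer, the function names (`tokenize`, `lexical_coverage`), and the toy word lists are all assumptions made for illustration, standing in for the six medical vocabularies, the common word list, and the names database.

```python
import re
from collections import Counter


def tokenize(text):
    """Split text into lowercase alphabetic tokens (an assumed, simplistic tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def lexical_coverage(text, word_lists):
    """Count tokens found in any word list versus tokens found in none.

    word_lists maps a list name to a set of lowercase words.
    Returns two Counters: (recognized, unrecognized).
    """
    known = set().union(*word_lists.values())
    recognized, unrecognized = Counter(), Counter()
    for token in tokenize(text):
        if token in known:
            recognized[token] += 1
        else:
            unrecognized[token] += 1
    return recognized, unrecognized


# Hypothetical toy data; the real study used a 560-megabyte corpus.
word_lists = {
    "medical": {"hypertension", "aspirin", "patient"},
    "common": {"the", "has", "and", "takes", "daily"},
    "names": {"smith"},
}
note = "The patient Smith has hypertension and takes asprin daily"

recognized, unrecognized = lexical_coverage(note, word_lists)
total = sum(recognized.values()) + sum(unrecognized.values())
coverage = sum(recognized.values()) / total
print(f"coverage: {coverage:.1%}")
print("unrecognized:", dict(unrecognized))
```

In this toy note, eight of the nine tokens are recognized and the misspelling "asprin" is left over, which is the kind of residue the study went on to analyze.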

RESULTS

About 60% of the words occurred in the medical vocabularies, common word list, or names database. Of the remainder, one-third were recognizable by other means. Of the remaining unrecognizable words, over three-fourths represented correctly spelled real words and the rest were misspellings.
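
The proportions reported above can be chained to estimate each category as a share of all words. The sketch below is only a worked calculation under two assumptions: that "about 60%" is treated as exactly 60%, and that "over three-fourths" is treated as a lower bound of exactly three-fourths. The abstract does not say whether the figures count word occurrences or distinct words, so the results are rough.

```python
from fractions import Fraction

recognized = Fraction(60, 100)                     # in vocabularies, common list, or names
remainder = 1 - recognized                         # 2/5 of all words
other_means = remainder * Fraction(1, 3)           # 2/15, about 13.3%
unrecognizable = remainder - other_means           # 4/15, about 26.7%
real_words_min = unrecognizable * Fraction(3, 4)   # 1/5, at least 20%
misspelled_max = unrecognizable - real_words_min   # 1/15, at most about 6.7%

for label, share in [
    ("recognizable by other means", other_means),
    ("still unrecognizable", unrecognizable),
    ("correctly spelled real words (lower bound)", real_words_min),
    ("misspellings (upper bound)", misspelled_max),
]:
    print(f"{label}: {float(share):.1%}")
```

Under these assumptions, roughly one word in five is a correctly spelled real word missing from every list, which is what motivates the call for vocabulary expansion.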

CONCLUSIONS

Large-scale generalized natural language processing methods for the medical record will require expansion of existing vocabularies, spelling error correction, and other algorithmic approaches to map words into those from clinical vocabularies.
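
As one illustration of the spelling-correction idea, the sketch below maps an unrecognized word to its closest vocabulary entry using Python's standard `difflib.get_close_matches`. The abstract does not specify a correction method, so this technique, the helper name `suggest_correction`, the 0.8 similarity cutoff, and the toy vocabulary are all assumptions.

```python
import difflib


def suggest_correction(word, vocabulary, cutoff=0.8):
    """Return the closest vocabulary word, or None if nothing is similar enough.

    The cutoff is an assumed similarity threshold between 0 and 1.
    """
    matches = difflib.get_close_matches(word.lower(), vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else None


# Hypothetical toy vocabulary standing in for a clinical vocabulary.
vocabulary = ["aspirin", "hypertension", "insulin"]

for word in ["asprin", "xyzzy"]:
    print(word, "->", suggest_correction(word, vocabulary))
```

A string-similarity lookup of this kind only addresses misspellings; correctly spelled words that are absent from the vocabularies would still require the vocabulary expansion the conclusions call for.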
