Study of effect of drug lexicons on medication extraction from electronic medical records.

Author information

Sirohi E, Peissig P

Affiliation

Marshfield Clinic Research Foundation, Marshfield, WI 54449, USA.

Publication information

Pac Symp Biocomput. 2005:308-18. doi: 10.1142/9789812702456_0029.

Abstract

Extraction of relevant information from free-text clinical notes is becoming increasingly important in healthcare as a means of providing personalized care to patients. The purpose of this dictionary-based NLP study was to determine the effects of using different drug lexicons to automatically extract medication information from electronic medical records. A convenience training sample of 52 documents, each containing at least one medication, and a randomized test sample of 100 documents were used. The training and test set documents contained a total of 681 and 641 medications, respectively. Three drug lexicons were used as sources for medication extraction: the first contained drug names and generic names; the second contained drug, generic, and short names; the third contained drug, generic, and short names, followed by filtering techniques. Extraction with the first lexicon resulted in 83.7% sensitivity and 96.2% specificity for the training set and 85.2% sensitivity and 96.9% specificity for the test set. Adding the list of short names used for drugs increased sensitivity to 95.0% but decreased specificity to 79.2% for the training set. Similar results were obtained for the test set: sensitivity increased to 96.4% and specificity fell to 80.1%. Combining a set of filtering techniques with the second lexicon increased specificity to 98.5% and 98.8% for the training and test sets, respectively, while slightly decreasing sensitivity to 94.1% (training) and 95.8% (test). Overall, the lexicon with filtering performed best: it extracted nearly as many medications as the unfiltered second lexicon while keeping the number of extracted non-medications low.
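
The trade-off described in the abstract can be illustrated with a minimal Python sketch. This is not the authors' system: the function names, the toy note, the lexicon entries, and the stop-list filter are all hypothetical, and the abstract does not say which filtering techniques were used or how non-medication candidates were counted for specificity. The sketch only shows how adding short names can raise recall while admitting false matches, and how a filter can remove them again.

```python
import re


def extract_medications(text, lexicon, stop_terms=frozenset()):
    """Return lexicon terms that appear as whole tokens in text.

    stop_terms is a hypothetical stand-in for the paper's unspecified
    filtering techniques: any matched term in it is discarded.
    """
    tokens = re.findall(r"[a-z]+", text.lower())
    return {t for t in tokens if t in lexicon and t not in stop_terms}


def sensitivity_specificity(tp, fn, tn, fp):
    """Standard definitions: TP / (TP + FN) and TN / (TN + FP)."""
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, invented for illustration only.
note = "Patient takes lisinopril and ASA daily; iron studies pending."
base_lexicon = {"lisinopril", "aspirin"}   # drug and generic names
short_names = {"asa", "iron"}              # short names; "iron" is ambiguous here

first = extract_medications(note, base_lexicon)
second = extract_medications(note, base_lexicon | short_names)
third = extract_medications(note, base_lexicon | short_names, stop_terms={"iron"})

print(sorted(first))   # misses "asa"
print(sorted(second))  # finds "asa" but also the lab-test mention "iron"
print(sorted(third))   # filter removes the false match
```

In this toy note, "iron" refers to a laboratory test rather than a prescribed drug, so matching it counts as a false positive. A real filter would need context rather than a fixed stop list, since the same term can be a genuine medication elsewhere.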
