Travers Debbie A, Haas Stephanie W
Department of Emergency Medicine, School of Medicine, CB 7594, University of North Carolina at Chapel Hill, Chapel Hill, NC 27599-7594, USA.
Acad Emerg Med. 2004 Nov;11(11):1170-6. doi: 10.1197/j.aem.2004.08.012.
Emergency Medical Text Processor (EMT-P) version 1, a natural language processing system that cleans emergency department text (e.g., chst pn, chest pai), was developed to maximize extraction of standard terms (e.g., chest pain). The authors compared the number of standard terms extracted from raw chief complaint (CC) data with that for CC data cleaned with EMT-P and evaluated the accuracy of EMT-P.
This cross-sectional observational study included CC text entries for all emergency department visits to three tertiary care centers in 2001. Terms were extracted from CC entries before and after cleaning with EMT-P. Descriptive statistics included the number and percentage of all entries (tokens) and all unique entries (types) that matched a standard term from the Unified Medical Language System (UMLS). An expert panel rated the accuracy of the CC-UMLS term matches; inter-rater reliability was measured with kappa.
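The token/type distinction above can be illustrated with a minimal sketch. It assumes exact string matching against a small set of standard terms and uses a toy lookup table as a stand-in for cleaning; the function name `match_rates`, the sample entries, and the `toy_fixes` table are all hypothetical and do not represent EMT-P's actual cleaning rules or the UMLS lookup used in the study.

```python
from collections import Counter

def match_rates(entries, standard_terms, clean=lambda s: s):
    """Return (token_rate, type_rate) for entries that match a standard term.

    Tokens are all entries, counted with repeats; types are unique entries.
    `clean` is an optional normalization step applied before matching.
    """
    counts = Counter(entries)  # maps each type to its number of tokens
    matched_types = [t for t in counts if clean(t) in standard_terms]
    matched_tokens = sum(counts[t] for t in matched_types)
    return matched_tokens / len(entries), len(matched_types) / len(counts)

# Toy data for illustration only (not taken from the study).
entries = ["chest pain", "chst pn", "chest pain", "chest pai", "fever"]
standard = {"chest pain", "fever"}

# Hypothetical stand-in for a cleaning step: a fixed misspelling table.
toy_fixes = {"chst pn": "chest pain", "chest pai": "chest pain"}

raw = match_rates(entries, standard)
cleaned = match_rates(entries, standard, clean=lambda s: toy_fixes.get(s, s))

print("raw     (token rate, type rate):", raw)
print("cleaned (token rate, type rate):", cleaned)
```

In this toy example the token rate is higher than the type rate before cleaning, because the correctly spelled entry repeats while each misspelling occurs once; the study's raw data show the same pattern.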
The authors collected 203,509 CC entry tokens, of which 63,946 were unique entry types. For the raw data, 89,337 tokens (44%) and 5,081 types (8%) matched a standard term. After EMT-P cleaning, 168,050 tokens (83%) and 44,430 types (69%) matched a standard term. The expert panel reached consensus on 201 of the 222 CC-UMLS term matches reviewed (kappa=0.69-0.72). Ninety-six percent of the 201 matches were rated equivalent or related. Thirty-eight percent of the nonmatches were found to match UMLS concepts.
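As a quick arithmetic check, the reported percentages can be recomputed from the counts given above. The sketch below uses only the figures stated in the abstract; the variable names and the choice to round to the nearest whole percent are assumptions.

```python
# Counts as reported in the abstract.
total_tokens = 203_509
total_types = 63_946

reported = {
    "raw tokens": (89_337, total_tokens),
    "raw types": (5_081, total_types),
    "cleaned tokens": (168_050, total_tokens),
    "cleaned types": (44_430, total_types),
    "panel consensus": (201, 222),
}

def pct(numerator, denominator):
    """Percentage rounded to the nearest whole number (an assumed convention)."""
    return round(100 * numerator / denominator)

percentages = {label: pct(n, d) for label, (n, d) in reported.items()}

for label, value in percentages.items():
    print(f"{label}: {value}%")
```

The recomputed values agree with the 44%, 8%, 83%, and 69% match rates in the text, and show that the panel reached consensus on roughly 91% of the 222 reviewed matches.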
EMT-P version 1 is relatively accurate, and cleaning with EMT-P improved the CC-UMLS term match rate over raw data. The authors identified areas for improvement in future EMT-P versions and issues to be resolved in developing a standard CC terminology.