Keselman Alla, Smith Catherine Arnott, Divita Guy, Kim Hyeoneui, Browne Allen C, Leroy Gondy, Zeng-Treitler Qing
Lister Hill National Center for Biomedical Communications, National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD, USA.
J Am Med Inform Assoc. 2008 Jul-Aug;15(4):496-505. doi: 10.1197/jamia.M2599. Epub 2008 Apr 24.
This study has two objectives: first, to identify and characterize consumer health terms not found in the Unified Medical Language System (UMLS) Metathesaurus (2007AB); second, to describe the procedure for creating new concepts in the course of building a consumer health vocabulary. Two questions guide the work: how do the unmapped consumer health concepts relate to existing UMLS concepts, and what is the place of these new concepts in professional medical discourse?
The consumer health terms were extracted from two large corpora assembled during the building of the Open Access Collaboratory Consumer Health Vocabulary (OAC CHV). Terms that could not be mapped to existing UMLS concepts by either machine or manual methods prompted the creation of new concepts, which were then assigned semantic types, related to existing UMLS concepts, and coded according to specified criteria.
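The machine-mapping step described above can be pictured with a minimal sketch. It assumes a simple normalized string lookup, which is a stand-in and not the study's actual mapping procedure. The function names `normalize` and `split_mapped`, the toy lexicon, the sample terms, and the concept identifiers are all hypothetical and are not real UMLS data.

```python
import re


def normalize(term: str) -> str:
    """Lowercase, replace punctuation with spaces, and collapse whitespace."""
    term = re.sub(r"[^\w\s]", " ", term.lower())
    return " ".join(term.split())


def split_mapped(terms, lexicon):
    """Separate terms that match a lexicon entry from those that do not.

    `lexicon` maps a concept name to a concept identifier. Terms left in
    `unmapped` are candidates for manual review and, if still unmatched,
    for new-concept creation.
    """
    index = {normalize(name): cui for name, cui in lexicon.items()}
    mapped, unmapped = {}, []
    for term in terms:
        cui = index.get(normalize(term))
        if cui is None:
            unmapped.append(term)
        else:
            mapped[term] = cui
    return mapped, unmapped


# Toy lexicon: the identifiers are placeholders, not real UMLS CUIs.
toy_lexicon = {
    "Myocardial Infarction": "C0000001",
    "Heart attack": "C0000001",
    "Hypertension": "C0000002",
}

# Invented sample terms for illustration only.
mapped, unmapped = split_mapped(
    ["heart attack", "Hypertension.", "brain fog"], toy_lexicon
)
```

In this sketch only exact matches after normalization count as mapped, so anything in `unmapped` would still need the manual review that the abstract describes before a new concept is created.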
This approach identified 64 unmapped concepts, 17 of which were labeled as uniquely "lay" and not feasible for inclusion in professional health terminologies. The remaining concepts were either potential candidates for inclusion in professional vocabularies or could be constructed by post-coordinating existing UMLS terms. The relationship between new and existing concepts differed depending on the corpus from which the terms were extracted.
Non-mapping concepts constitute a small proportion of consumer health terms, but one that is likely to affect the process of consumer health vocabulary building. We have described a novel approach for identifying such concepts.