Towards quality improvement of vaccine concept mappings in the OMOP vocabulary with a semi-automated method.

Affiliations

Department of Neurology, The University of Texas Health Science Center at Houston, Houston, TX, USA.

Odysseus Data Services, Cambridge, MA, USA.

Publication information

J Biomed Inform. 2022 Oct;134:104162. doi: 10.1016/j.jbi.2022.104162. Epub 2022 Aug 25.

Abstract

The Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM) provides a unified model to integrate disparate real-world data (RWD) sources. An integral part of the OMOP CDM is the Standardized Vocabularies (henceforth referred to as the OMOP vocabulary), which enable the organization and standardization of medical concepts across the clinical domains of the OMOP CDM. For concepts with the same meaning from different source vocabularies, one is designated as the standard concept, while the others are specified as non-standard or source concepts and mapped to the standard one. However, due to the heterogeneity of source vocabularies, the OMOP vocabulary may contain mapping issues such as erroneous and missing mappings, which could affect the results of downstream analyses with RWD. In this paper, we focus on quality assurance of vaccine concept mappings in the OMOP vocabulary, which is necessary to accurately harness the power of RWD on vaccines. We introduce a semi-automated lexical approach to audit vaccine mappings in the OMOP vocabulary. We generated two types of vaccine-pairs: mapped and unmapped, where mapped vaccine-pairs are pairs of vaccine concepts with a "Maps to" relationship, while unmapped vaccine-pairs are those without a "Maps to" relationship. We represented each vaccine concept name as a set of words, and derived term-difference pairs (i.e., name differences) for mapped and unmapped vaccine-pairs. If the same term-difference pair is obtained from both mapped and unmapped vaccine-pairs, it is considered a potential mapping inconsistency. Applying this approach to the vaccine mappings in OMOP yielded a total of 2087 potential mapping inconsistencies. A random sample of 200 was evaluated by domain experts to identify, validate, and categorize the inconsistencies. Experts identified 95 cases revealing valid mapping issues. The remaining 105 cases were found to be invalid because the mappings relied on external and/or contextual information that was not reflected in the concept names of the vaccines. This indicates that our semi-automated approach shows promise in identifying mapping inconsistencies among vaccine concepts in the OMOP vocabulary.
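
The lexical audit described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the whitespace tokenization, the treatment of a term-difference pair as unordered, and the exhaustive enumeration of all concept pairs are assumptions made for illustration, and the concept IDs and names are hypothetical rather than taken from the OMOP vocabulary.

```python
from collections import defaultdict
from itertools import combinations


def words(name):
    """Represent a concept name as a set of lowercase words (assumed tokenization)."""
    return frozenset(name.lower().split())


def term_difference(name_a, name_b):
    """Return the unordered pair of word sets that are unique to each name."""
    a, b = words(name_a), words(name_b)
    return frozenset({a - b, b - a})


def find_inconsistencies(concepts, maps_to):
    """Group concept pairs by term difference and keep mixed groups.

    concepts: dict of concept id -> concept name
    maps_to:  set of (source_id, target_id) tuples with a "Maps to" relationship
    Returns the term differences produced by at least one mapped pair
    and at least one unmapped pair.
    """
    mapped = {frozenset(pair) for pair in maps_to}
    by_diff = defaultdict(lambda: {"mapped": [], "unmapped": []})
    # Assumption: every pair of concepts is a candidate vaccine-pair.
    for id_a, id_b in combinations(sorted(concepts), 2):
        diff = term_difference(concepts[id_a], concepts[id_b])
        kind = "mapped" if frozenset((id_a, id_b)) in mapped else "unmapped"
        by_diff[diff][kind].append((id_a, id_b))
    return {d: g for d, g in by_diff.items() if g["mapped"] and g["unmapped"]}


# Hypothetical toy data: concept 1 maps to concept 2, but 3 does not map to 4.
concepts = {
    1: "influenza vaccine",
    2: "influenza virus vaccine",
    3: "rabies vaccine",
    4: "rabies virus vaccine",
}
maps_to = {(1, 2)}

result = find_inconsistencies(concepts, maps_to)
for diff, group in result.items():
    print(sorted(sorted(s) for s in diff), group)
```

In this toy data, pairs (1, 2) and (3, 4) differ only by the word "virus", yet only the first pair is mapped, so the shared term difference is flagged for expert review. As the abstract notes, a flag is only a candidate: 105 of the 200 reviewed cases turned out to be justified by information outside the concept names.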


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/f27a/9940475/507bb4737a53/nihms-1870816-f0001.jpg
