Human and Molecular Genetics Center, Medical College of Wisconsin, Milwaukee, WI, United States.
JMIR Med Inform. 2014 Mar 18;2(1):e5. doi: 10.2196/medinform.3172.
Structured information within patient medical records represents a largely untapped treasure trove of research data. In the United States, privacy issues notwithstanding, this information has recently become more accessible thanks to the increasing adoption of electronic health records (EHR) and health care data standards fueled by the Meaningful Use legislation. The other side of the coin is that it is becoming increasingly difficult to navigate the profusion of disparate clinical terminology standards, which often span millions of concepts.
The objective of our study was to develop a methodology for integrating large amounts of structured clinical information that is both terminology agnostic and able to capture heterogeneous clinical phenotypes including problems, procedures, medications, and clinical results (such as laboratory tests and clinical observations). In this context, we define phenotyping as the extraction of all clinically relevant features contained in the EHR.
The scope of the project was framed by the Common Meaningful Use (MU) Dataset terminology standards: the Systematized Nomenclature of Medicine Clinical Terms (SNOMED CT), RxNorm, the Logical Observation Identifiers Names and Codes (LOINC), the Current Procedural Terminology (CPT), the Healthcare Common Procedure Coding System (HCPCS), the International Classification of Diseases Ninth Revision Clinical Modification (ICD-9-CM), and the International Classification of Diseases Tenth Revision Clinical Modification (ICD-10-CM). The Unified Medical Language System (UMLS) was used as a mapping layer among the MU ontologies. An extract, load, and transform approach separated the original annotations in the EHR from the mapping process and allowed for continuous updates as the terminologies were updated. Additionally, we integrated all terminologies into a single UMLS-derived ontology and further optimized it to make the relatively large concept graph manageable.
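The separation between stored annotations and the mapping step can be illustrated with a minimal Python sketch. This is not the authors' implementation: the `RawAnnotation` type, the `transform` and `patients_with_concept` helpers, the terminology labels, and the concept identifiers (`CUI_AFIB`, `CUI_WARFARIN`) are all hypothetical placeholders, and the source codes are illustrative only and have not been checked against any UMLS release. The point is only that raw codes are loaded unchanged and resolved to a shared concept at query time, so the mapping table can be replaced when a terminology is updated.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Set, Tuple


@dataclass(frozen=True)
class RawAnnotation:
    """An EHR annotation stored exactly as received (the 'load' step)."""
    patient_id: str
    terminology: str  # hypothetical source labels, e.g. "ICD9CM"
    code: str


# (terminology, code) -> shared concept identifier.
# All entries are illustrative placeholders, not real UMLS content.
CodeKey = Tuple[str, str]
MAPPING_V1: Dict[CodeKey, str] = {
    ("ICD9CM", "427.31"): "CUI_AFIB",
    ("ICD10CM", "I48.91"): "CUI_AFIB",
    ("RXNORM", "11289"): "CUI_WARFARIN",
}

RAW: List[RawAnnotation] = [
    RawAnnotation("p1", "ICD9CM", "427.31"),
    RawAnnotation("p2", "ICD10CM", "I48.91"),
    RawAnnotation("p2", "RXNORM", "11289"),
    RawAnnotation("p3", "SNOMEDCT", "49436004"),  # not covered by MAPPING_V1
]


def transform(
    raw: List[RawAnnotation], mapping: Dict[CodeKey, str]
) -> List[Tuple[RawAnnotation, Optional[str]]]:
    """Pair each raw annotation with its concept, or None if unmapped.

    The raw records are never modified, so the same data can be
    re-transformed with a newer mapping table.
    """
    return [(a, mapping.get((a.terminology, a.code))) for a in raw]


def patients_with_concept(
    raw: List[RawAnnotation], mapping: Dict[CodeKey, str], concept: str
) -> Set[str]:
    """Terminology-agnostic query: which patients carry this concept?"""
    return {a.patient_id for a, cui in transform(raw, mapping) if cui == concept}


# A later mapping release adds a previously unmapped code; RAW is untouched.
MAPPING_V2: Dict[CodeKey, str] = {
    **MAPPING_V1,
    ("SNOMEDCT", "49436004"): "CUI_AFIB",
}
```

With the first mapping table, a query for `CUI_AFIB` finds patients `p1` and `p2` regardless of whether their diagnosis was recorded in ICD-9-CM or ICD-10-CM; after swapping in `MAPPING_V2`, the same query also finds `p3` without reloading any source data.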
The initial evaluation was performed with simulated data from the Clinical Avatars project using 100,000 virtual patients undergoing a 90-day, genotype-guided warfarin dosing protocol. This dataset was annotated with standard MU terminologies, loaded, and transformed using the UMLS. We have deployed this methodology at scale in our in-house analytics platform using structured EHR data for 7931 patients (12 million clinical observations) treated at Froedtert Hospital. A demonstration limited to Clinical Avatars data is available on the Internet using the credentials user "jmirdemo" and password "jmirdemo".
Despite its inherent complexity, the UMLS can serve as an effective interface terminology for many of the clinical data standards currently used in the health care domain.