Marsolo Keith, Curtis Lesley, Qualls Laura, Xu Jennifer, Zhang Yinghong, Phillips Thomas, Hill C Larry, Sanders Gretchen, Maro Judith C, Kiernan Daniel, Draper Christine, Coughlin Kevin, Dutcher Sarah K, Hernández-Muñoz José J, Falconer Monique
Department of Population Health Sciences, Duke University School of Medicine, Durham, North Carolina, USA.
Duke Clinical Research Institute, Duke University School of Medicine, Durham, North Carolina, USA.
Learn Health Syst. 2024 Oct 21;9(2):e10468. doi: 10.1002/lrh2.10468. eCollection 2025 Apr.
(1) Assess the harmonization of structured electronic health record data (laboratory results and medications) to reference terminologies and characterize the severity of issues. (2) Identify issues of data completeness by comparing complementary data domains, stratifying by time, care setting, and provenance.
Queries were distributed to 3 Data Partners (DPs). Using the harmonization queries, we examined the top 200 laboratory results and medications by volume, identifying outliers and computing summary statistics. The completeness queries examined 4 conditions of interest and related clinical concepts. Counts were generated for each condition, stratified by year, encounter type, and provenance. We analyzed trends over time within and across DPs.
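The harmonization check described above can be illustrated with a minimal sketch. It is not the study's actual query: the record layout, the placeholder names and codes, and the outlier threshold are all assumptions made for illustration. It counts the distinct reference codes attached to each source name, takes the median of those counts, flags names that exceed a chosen threshold, and tallies records with a "null" code.

```python
from collections import defaultdict
from statistics import median

# Hypothetical records: (source name, mapped reference code).
# Names and codes are made-up placeholders; None stands for a "null" code.
records = [
    ("Lab A", "CODE-001"),
    ("Lab A", "CODE-001"),
    ("Lab A", "CODE-002"),
    ("Lab B", "CODE-003"),
    ("Lab B", None),
    ("Lab C", "CODE-004"),
    ("Lab D", "CODE-005"),
    ("Lab D", "CODE-005"),
]


def codes_per_name(rows):
    """Map each source name to the set of distinct non-null codes seen with it."""
    mapping = defaultdict(set)
    for name, code in rows:
        mapping[name]  # register the name even if all of its codes are null
        if code is not None:
            mapping[name].add(code)
    return mapping


def null_code_count(rows):
    """Count records whose code is missing."""
    return sum(1 for _, code in rows if code is None)


def flag_outliers(mapping, threshold):
    """Return names mapped to more distinct codes than the (assumed) threshold."""
    return sorted(name for name, codes in mapping.items() if len(codes) > threshold)


mapping = codes_per_name(records)
counts = {name: len(codes) for name, codes in mapping.items()}
median_codes = median(counts.values())
outliers = flag_outliers(mapping, threshold=1)
nulls = null_code_count(records)
```

Swapping the two fields in each record gives the reverse check (names per code). A flagged name is only a candidate for review, since one name can legitimately map to several codes.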
We found that the median number of codes associated with a given laboratory/medication name (and vice versa) generally met expectations, though there were DP-specific issues that resulted in outliers. In addition, there were drastic differences in the percentage of patients with a given concept depending on provenance.
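To make the provenance comparison concrete, the following sketch computes the percentage of patients with a concept within each year, encounter type, and provenance stratum. The row layout, the provenance labels ("order", "billing"), and the data are hypothetical and do not come from the study.

```python
from collections import defaultdict

# Hypothetical rows: (patient_id, year, encounter_type, provenance, has_concept).
# The provenance labels are illustrative assumptions, not the study's categories.
rows = [
    ("p1", 2020, "inpatient", "order", True),
    ("p2", 2020, "inpatient", "order", False),
    ("p1", 2020, "inpatient", "billing", True),
    ("p2", 2020, "inpatient", "billing", True),
    ("p3", 2021, "ambulatory", "order", False),
    ("p3", 2021, "ambulatory", "billing", True),
]


def percent_with_concept(rows):
    """Percentage of distinct patients with the concept in each stratum."""
    total = defaultdict(set)  # all patients observed in the stratum
    hits = defaultdict(set)   # patients in the stratum who have the concept
    for patient_id, year, encounter_type, provenance, has_concept in rows:
        key = (year, encounter_type, provenance)
        total[key].add(patient_id)
        if has_concept:
            hits[key].add(patient_id)
    return {key: 100.0 * len(hits[key]) / len(total[key]) for key in total}


percentages = percent_with_concept(rows)
```

In this toy data the same patients look very different depending on provenance: 50% of the 2020 inpatient stratum has the concept by "order" records but 100% by "billing" records, which is the kind of gap that would prompt a completeness review.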
The harmonization queries surfaced several mapping errors, as well as issues with overly specific codes and records with "null" codes. The completeness queries demonstrated that having access to multiple types of data provenance provides more robust results than any single provenance type. Harmonization errors between source data and reference terminologies may not be widespread, but they do exist within common data models (CDMs), affecting tens of thousands or even millions of records. Provenance information can help identify potential completeness issues with electronic health record (EHR) data, but only if it is represented in the CDM and then populated by DPs.