Department of Biomedical Informatics, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
Department of Medicine, Vanderbilt University Medical Center, Nashville, Tennessee, USA.
J Am Med Inform Assoc. 2023 Jun 20;30(7):1257-1265. doi: 10.1093/jamia/ocad078.
Knowledgebases are needed to clarify correlations observed in real-world electronic health record (EHR) data. We posit design principles, present a unifying framework, and report a test of concept.
We structured a knowledge framework along 3 axes: condition of interest, knowledge source, and taxonomy. In our test of concept, we used hypertension as the condition of interest, the literature and the VanderbiltDDx knowledgebase as sources, and phecodes as the taxonomy. In a cohort of 832 566 deidentified EHRs, we modeled blood pressure and heart rate by sex and age, classified individuals by hypertensive status, and ran a phenome-wide association study (PheWAS) for hypertension. We then compared the correlations from the PheWAS to the associations in our knowledgebase.
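As an illustration only, the 3-axis framework can be sketched as a small data structure in which each record names a condition, a source, and a taxonomy code. This is a minimal sketch under stated assumptions, not the authors' implementation: the `Association` class, the `build_kb` helper, the `direction` field, and the codes `P001` and `P002` are hypothetical placeholders, not real phecodes or real knowledgebase entries.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Association:
    """One knowledgebase record along the 3 axes (hypothetical schema)."""
    condition: str  # condition of interest, e.g. "hypertension"
    source: str     # knowledge source, e.g. "literature" or "VanderbiltDDx"
    code: str       # taxonomy code (placeholder values, not real phecodes)
    direction: str  # "pos" or "neg" association with the condition


def build_kb(associations):
    """Group records by taxonomy code, collecting sources and directions."""
    kb = {}
    for a in associations:
        entry = kb.setdefault(a.code, {"sources": set(), "directions": set()})
        entry["sources"].add(a.source)
        entry["directions"].add(a.direction)
    return kb


# Hypothetical records for illustration only.
records = [
    Association("hypertension", "literature", "P001", "pos"),
    Association("hypertension", "VanderbiltDDx", "P001", "pos"),
    Association("hypertension", "literature", "P002", "pos"),
    Association("hypertension", "VanderbiltDDx", "P002", "neg"),
]
kb = build_kb(records)
```

Keying the knowledgebase on the taxonomy code is a design choice made here so that entries can be matched directly against PheWAS results that use the same codes; a code with both directions recorded corresponds to the "pos+neg" case.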
We produced PhecodeKbHtn: a knowledgebase comprising 167 hypertension-associated diseases, 15 of which were also negatively associated with blood pressure (pos+neg). Our hypertension PheWAS included 1914 phecodes, 129 of which were in PhecodeKbHtn. Among the PheWAS association results, phecodes in PhecodeKbHtn had larger effect sizes than phecodes not in the knowledgebase.
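The comparison of effect sizes between phecodes inside and outside the knowledgebase can be sketched as follows. The abstract does not state which summary statistic or test was used, so the median absolute effect size here is an assumption, and the function name `compare_effect_sizes`, the codes, and the effect values are hypothetical and do not come from the study.

```python
from statistics import median


def compare_effect_sizes(phewas_effects, kb_codes):
    """Return (median |effect| for codes in the KB, median |effect| for the rest).

    phewas_effects maps a taxonomy code to a signed effect size
    (for example a log odds ratio); kb_codes is the set of codes in the KB.
    Both groups must be non-empty, otherwise statistics.median raises an error.
    """
    in_kb = [abs(v) for code, v in phewas_effects.items() if code in kb_codes]
    not_in_kb = [abs(v) for code, v in phewas_effects.items() if code not in kb_codes]
    return median(in_kb), median(not_in_kb)


# Hypothetical PheWAS output for illustration only.
phewas_effects = {"P001": 1.2, "P002": -0.9, "P003": 0.1, "P004": 0.3}
kb_codes = {"P001", "P002"}

in_kb_median, not_in_kb_median = compare_effect_sizes(phewas_effects, kb_codes)
```

Absolute values are used so that positive and negative associations both count toward the magnitude of the effect; a formal analysis would normally add a statistical test on the two groups.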
Each source contributed unique and additive associations. Models of blood pressure and heart rate by age and sex were consistent with prior cohort studies. All but 4 PheWAS positive and negative correlations for phecodes in PhecodeKbHtn may be explained by knowledgebase associations, hypertensive cardiac complications, or causes of hypertension independently associated with hypotension.
It is feasible to assemble a knowledgebase that is compatible with EHR data to aid interpretation of clinical correlation research.