Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance.

Affiliations

University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States.

The University of Texas MD Anderson Cancer Center, Department of Biostatistics, Houston, TX, United States.

Publication information

J Biomed Inform. 2021 May;117:103719. doi: 10.1016/j.jbi.2021.103719. Epub 2021 Mar 11.

DOI:10.1016/j.jbi.2021.103719

PMID:33716168

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC8559730/

Abstract

INTRODUCTION

Drug safety research asks causal questions but relies on observational data. Confounding bias threatens the reliability of studies using such data. The successful control of confounding requires knowledge of variables called confounders affecting both the exposure and outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.

METHODS

We introduce two methods (semantic vector-based and string-based confounder search) that query literature-derived information for confounder candidates to control, using SemMedDB, a database of computable knowledge mined from the biomedical literature. These methods search SemMedDB for confounders by applying semantic constraint search for indications that are treated by the drug (exposure) and that are also known to cause the adverse event (outcome). We then include the literature-derived confounder candidates in statistical and causal models derived from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety containing labeled pairwise relationships between drugs and adverse events and attempt to rediscover these relationships from a corpus of 2.2 M NLP-processed free-text clinical notes. We employ standard adjustment and causal inference procedures to predict and estimate causal effects by informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables in the models with dichotomous EHR-derived data. Finally, we compare the results from applying these procedures with naive measures of association (χ² and reporting odds ratio) and with each other.
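The search constraint and the naive association measures described above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the triples, the names `drug_x`, `event_y`, and `condition_*`, the helper functions, and all counts are hypothetical toy values, and a Mantel-Haenszel stratified odds ratio stands in for the paper's adjustment procedures purely to show how a crude association can shrink once a confounder is controlled.

```python
from math import isclose

# Toy (subject, predicate, object) triples. Illustrative only: these are
# NOT real SemMedDB content, and the tuple layout is an assumption.
TRIPLES = [
    ("drug_x", "TREATS", "condition_a"),
    ("drug_x", "TREATS", "condition_b"),
    ("condition_a", "CAUSES", "event_y"),
    ("condition_c", "CAUSES", "event_y"),
]


def find_confounder_candidates(triples, drug, outcome):
    """Return concepts that the drug TREATS and that also CAUSE the outcome."""
    treated = {o for s, p, o in triples if s == drug and p == "TREATS"}
    causes = {s for s, p, o in triples if p == "CAUSES" and o == outcome}
    return sorted(treated & causes)


# 2x2 cell convention used below:
#   a = exposed with event,    b = exposed without event
#   c = unexposed with event,  d = unexposed without event
def reporting_odds_ratio(a, b, c, d):
    """Crude (unadjusted) odds ratio from a single 2x2 table."""
    return (a * d) / (b * c)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-squared statistic for a 2x2 table (no continuity correction)."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mantel_haenszel_or(strata):
    """Odds ratio pooled across strata of a dichotomous confounder."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in strata)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in strata)
    return num / den


# Hypothetical counts, split by presence/absence of the candidate confounder.
STRATA = [
    (40, 40, 10, 10),  # confounder present
    (2, 18, 8, 72),    # confounder absent
]
# Collapsing the strata gives the crude table.
CRUDE = tuple(sum(cells) for cells in zip(*STRATA))

if __name__ == "__main__":
    print("candidates:", find_confounder_candidates(TRIPLES, "drug_x", "event_y"))
    print("crude OR:", round(reporting_odds_ratio(*CRUDE), 2))
    print("chi-squared:", round(chi_square_2x2(*CRUDE), 2))
    print("adjusted OR:", round(mantel_haenszel_or(STRATA), 2))
```

In this toy data the odds ratio is 1 within each stratum, so the elevated crude odds ratio comes entirely from the confounder. A real analysis would draw the candidate list from SemMedDB and the counts from the EHR-derived dichotomous variables.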

RESULTS AND CONCLUSIONS

We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/a260/8559730/eb9b0a9ff0c3/nihms-1683147-f0001.jpg
