Using computable knowledge mined from the literature to elucidate confounders for EHR-based pharmacovigilance.

Affiliations

University of Pittsburgh School of Medicine, Department of Biomedical Informatics, Pittsburgh, PA, United States.

The University of Texas MD Anderson Cancer Center, Department of Biostatistics, Houston, TX, United States.

Publication information

J Biomed Inform. 2021 May;117:103719. doi: 10.1016/j.jbi.2021.103719. Epub 2021 Mar 11.

Abstract

INTRODUCTION

Drug safety research asks causal questions but relies on observational data, and confounding bias threatens the reliability of studies that use such data. Controlling confounding successfully requires knowledge of the variables, called confounders, that affect both the exposure and the outcome of interest. However, causal knowledge of dynamic biological systems is complex and challenging to obtain. Fortunately, computable knowledge mined from the literature may hold clues about confounders. In this paper, we tested the hypothesis that incorporating literature-derived confounders can improve causal inference from observational data.
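To make the confounding problem concrete, here is a minimal sketch with hypothetical counts (not data from the study). A binary confounder C, for example an indication, makes both the exposure and the outcome more likely. The naive odds ratio (the same formula as the reporting odds ratio) and the Pearson χ² statistic computed on the collapsed table suggest an association, while the Mantel-Haenszel odds ratio pools the within-stratum tables. Mantel-Haenszel stratification is chosen here only as one familiar adjustment; it is not necessarily the procedure used in the paper.

```python
# Toy illustration with hypothetical counts: a binary confounder C raises
# both the chance of exposure and the chance of the outcome.
# Each 2x2 table is (a, b, c, d):
#   a = exposed with outcome,   b = exposed without outcome
#   c = unexposed with outcome, d = unexposed without outcome
strata = {
    "C=1": (40, 60, 10, 15),
    "C=0": (5, 95, 20, 380),
}


def odds_ratio(a, b, c, d):
    """Odds ratio of a 2x2 table (same formula as the reporting odds ratio)."""
    return (a * d) / (b * c)


def chi_square(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 table, no continuity correction."""
    n = a + b + c + d
    return n * (a * d - b * c) ** 2 / ((a + b) * (c + d) * (a + c) * (b + d))


def mantel_haenszel_or(tables):
    """Odds ratio adjusted for C by pooling the stratum-specific tables."""
    num = sum(a * d / (a + b + c + d) for a, b, c, d in tables)
    den = sum(b * c / (a + b + c + d) for a, b, c, d in tables)
    return num / den


# Collapsing over C gives the naive (crude) table that ignores the confounder.
crude = tuple(sum(t[i] for t in strata.values()) for i in range(4))

crude_or = odds_ratio(*crude)
crude_chi2 = chi_square(*crude)
adjusted_or = mantel_haenszel_or(strata.values())

print(f"crude OR = {crude_or:.2f}, chi2 = {crude_chi2:.2f}")
print(f"stratum ORs = {[round(odds_ratio(*t), 2) for t in strata.values()]}")
print(f"Mantel-Haenszel OR = {adjusted_or:.2f}")
```

With these counts each stratum has an odds ratio of 1.0, so the crude odds ratio of about 3.82 (χ² ≈ 30.71) is produced entirely by C, and adjusting for C brings the estimate back to 1.00.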

METHODS

We introduce two methods (semantic vector-based and string-based confounder search) that query SemMedDB, a database of computable knowledge mined from the biomedical literature, for confounder candidates to control. Both methods apply semantic constraint search to find conditions that are indications treated by the drug (the exposure) and are also known to cause the adverse event (the outcome). We then include these literature-derived confounder candidates in statistical and causal models built from free-text clinical notes. For evaluation, we use a reference dataset widely used in drug safety research, which contains labeled pairwise relationships between drugs and adverse events, and attempt to rediscover these relationships from a corpus of 2.2 million NLP-processed free-text clinical notes. We apply standard adjustment and causal inference procedures to predict and estimate causal effects, informing the models with varying numbers of literature-derived confounders and instantiating the exposure, outcome, and confounder variables with dichotomous EHR-derived data. Finally, we compare the results of these procedures with each other and with naive measures of association (χ² and the reporting odds ratio).
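The string-based variant of the search can be pictured as a join over subject-predicate-object triples: keep every concept X such that the drug TREATS X and X CAUSES the adverse event, using the TREATS and CAUSES predicate names that SemRep assigns. The sketch below is hypothetical throughout. The rows, concept names, and the `confounder_candidates` helper are invented for illustration, SemMedDB's actual schema is not modeled, and ranking by support (the smaller of the two predication counts) is an assumed heuristic for choosing how many candidates to include. The semantic vector-based variant is not reproduced here.

```python
from collections import Counter

# Hypothetical SemMedDB-style rows: (subject, predicate, object).
# A repeated row stands for the same predication extracted from several sentences.
ROWS = [
    ("DrugA", "TREATS", "ConditionX"),
    ("DrugA", "TREATS", "ConditionX"),
    ("DrugA", "TREATS", "ConditionY"),
    ("DrugA", "TREATS", "ConditionY"),
    ("DrugA", "TREATS", "ConditionZ"),
    ("ConditionX", "CAUSES", "EventE"),
    ("ConditionY", "CAUSES", "EventE"),
    ("ConditionY", "CAUSES", "EventE"),
    ("ConditionY", "CAUSES", "EventE"),
    ("ConditionW", "CAUSES", "EventE"),
]


def confounder_candidates(drug, event, rows, top_k=None):
    """Rank concepts X with (drug TREATS X) and (X CAUSES event).

    Matching is exact string equality on concept names (string-based search).
    Support is the smaller of the two predication counts, an assumed heuristic;
    ties are broken alphabetically so the order is deterministic.
    """
    treats = Counter(o for s, p, o in rows if s == drug and p == "TREATS")
    causes = Counter(s for s, p, o in rows if p == "CAUSES" and o == event)
    # Candidates must satisfy both semantic constraints.
    support = {x: min(treats[x], causes[x]) for x in treats.keys() & causes.keys()}
    ranked = sorted(support, key=lambda x: (-support[x], x))
    return ranked if top_k is None else ranked[:top_k]


print(confounder_candidates("DrugA", "EventE", ROWS))           # ['ConditionY', 'ConditionX']
print(confounder_candidates("DrugA", "EventE", ROWS, top_k=1))  # ['ConditionY']
```

ConditionZ is excluded because nothing links it to EventE, and ConditionW because DrugA does not treat it. Varying `top_k` mimics informing the downstream models with more or fewer literature-derived confounders.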

RESULTS AND CONCLUSIONS

We found semantic vector-based search to be superior to string-based search at reducing confounding bias. However, the effect of including more rather than fewer literature-derived confounders was inconclusive. We recommend using targeted learning estimation methods that can address treatment-confounder feedback, where confounders also behave as intermediate variables, and engaging subject-matter experts to adjudicate the handling of problematic covariates.
