Causal feature selection using a knowledge graph combining structured knowledge from the biomedical literature and ontologies: A use case studying depression as a risk factor for Alzheimer's disease.

Affiliations

Department of Biomedical Informatics, School of Medicine, University of Pittsburgh, Pittsburgh, PA, USA.

Intelligent Systems Program, University of Pittsburgh, Pittsburgh, PA, USA.

Publication information

J Biomed Inform. 2023 Jun;142:104368. doi: 10.1016/j.jbi.2023.104368. Epub 2023 Apr 21.

DOI:10.1016/j.jbi.2023.104368

PMID:37086959

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10355339/

Abstract

BACKGROUND

Causal feature selection is essential for estimating effects from observational data, and identifying confounders is a crucial step in this process. Traditionally, researchers rely on subject-matter expertise and literature review to identify confounders. Uncontrolled confounding from unidentified confounders threatens validity, conditioning on intermediate variables (mediators) weakens estimates, and conditioning on common effects (colliders) induces bias. Additionally, without special treatment, erroneous conditioning on variables that play combined roles introduces bias. However, the biomedical literature is vast and growing exponentially, making it infeasible to assimilate this knowledge. To address these challenges, we introduce a novel knowledge graph (KG) application that enables causal feature selection by combining computable literature-derived knowledge with biomedical ontologies. We present a use case of our approach: specifying a causal model for estimating the total causal effect of depression on the risk of developing Alzheimer's disease (AD) from observational data.

METHODS

We extracted computable knowledge from a literature corpus using three machine reading systems and inferred missing knowledge using logical closure operations. Using a KG framework, we mapped the output to target terminologies and combined it with ontology-grounded resources. We translated epidemiological definitions of confounder, collider, and mediator into queries for searching the KG and summarized the roles played by the identified variables. We compared the results with output from a complementary method and with published observational studies, and examined a selection of confounding and combined-role variables in depth.
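The translation of role definitions into graph queries can be illustrated with a toy sketch. The Python below is not the authors' implementation. It assumes the KG is reduced to a set of directed (cause, effect) pairs, takes "logical closure" to mean transitive closure over that single relation, and uses placeholder node names (E for the exposure, O for the outcome, z_* for candidate variables). The three checks follow the standard epidemiological definitions: a confounder causes both exposure and outcome, a mediator is caused by the exposure and causes the outcome, and a collider is caused by both.

```python
from itertools import product

# Hypothetical toy edges (cause, effect); none of these come from the study.
TOY_EDGES = {
    ("z_conf", "E"), ("z_conf", "O"),   # common cause of exposure and outcome
    ("E", "z_med"), ("z_med", "O"),     # intermediate on the path E -> O
    ("E", "z_col"), ("O", "z_col"),     # common effect of exposure and outcome
    ("z_up", "z_conf"),                 # upstream cause, linked only indirectly
}


def transitive_closure(edges):
    """Add (a, d) whenever (a, b) and (b, d) are present, until nothing changes."""
    closure = set(edges)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(tuple(closure), repeat=2):
            if b == c and a != d and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure


def find_roles(edges, exposure, outcome):
    """Classify every other node by its causal relation to exposure and outcome."""
    nodes = {n for edge in edges for n in edge} - {exposure, outcome}
    roles = {"confounder": set(), "mediator": set(), "collider": set()}
    for z in nodes:
        if (z, exposure) in edges and (z, outcome) in edges:
            roles["confounder"].add(z)
        if (exposure, z) in edges and (z, outcome) in edges:
            roles["mediator"].add(z)
        if (exposure, z) in edges and (outcome, z) in edges:
            roles["collider"].add(z)
    return roles


if __name__ == "__main__":
    for label, edges in (("raw", TOY_EDGES), ("closed", transitive_closure(TOY_EDGES))):
        found = find_roles(edges, "E", "O")
        print(label, {role: sorted(names) for role, names in found.items()})
```

In this toy graph, z_up is reported as a confounder only after the closure step, because its links to E and O are implied by chaining through z_conf rather than stated directly. A real KG would carry typed predicates and ontology identifiers, so the equivalent queries would be considerably richer than these set lookups.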

RESULTS

Our search identified 128 confounders, including 58 phenotypes, 47 drugs, 35 genes, 23 collider, and 16 mediator phenotypes. However, only 31 of the 58 confounder phenotypes were found to behave exclusively as confounders, while the remaining 27 phenotypes played other roles. Obstructive sleep apnea emerged as a potential novel confounder for depression and AD. Anemia exemplified a variable playing combined roles.
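Separating variables that behave exclusively in one role from those playing combined roles amounts to counting, for each variable, how many role sets it appears in. The following is a minimal sketch under stated assumptions: the search output is assumed to be available as a mapping from role name to a set of variable names, and v1 through v4 are placeholders rather than variables from the study.

```python
def split_by_role_count(roles):
    """Partition variables into single-role and combined-role groups.

    roles: dict mapping a role name to the set of variables found in that role.
    Returns (exclusive, combined), where exclusive maps variable -> its one role
    and combined maps variable -> the set of roles it plays.
    """
    membership = {}
    for role, variables in roles.items():
        for variable in variables:
            membership.setdefault(variable, set()).add(role)
    exclusive = {v: next(iter(r)) for v, r in membership.items() if len(r) == 1}
    combined = {v: r for v, r in membership.items() if len(r) > 1}
    return exclusive, combined


# Hypothetical search output; placeholder names only.
TOY_ROLES = {
    "confounder": {"v1", "v2", "v3"},
    "mediator": {"v2"},
    "collider": {"v3", "v4"},
}

if __name__ == "__main__":
    exclusive, combined = split_by_role_count(TOY_ROLES)
    print("exclusive:", dict(sorted(exclusive.items())))
    print("combined:", {v: sorted(r) for v, r in sorted(combined.items())})
```

Here v1 is an exclusive confounder, while v2 and v3 are confounders that also appear as a mediator and a collider respectively, which is the kind of variable that needs special treatment before conditioning.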

CONCLUSION

Our findings suggest that combining machine reading with a KG could augment human expertise for causal feature selection. However, the complexity of causal feature selection for depression with AD highlights the need for standardized field-specific databases of causal variables. Further work is needed to optimize KG search and transform the output for human consumption.
