KGG: a fully automated workflow for creating disease-specific knowledge graphs.

Authors

Karki Reagon, Gadiya Yojana, Zaliani Andrea, Pokharel Bishab, Babaiha Negin Sadat, Ostaszewski Marek, Hofmann-Apitius Martin, Gribbon Philip

Affiliations

Fraunhofer Institute for Translational Medicine and Pharmacology (ITMP), Hamburg, 22525, Germany.

Fraunhofer Cluster of Excellence for Immune-Mediated Diseases (CIMD), Frankfurt, 60590, Germany.

Publication information

Bioinformatics. 2025 Jul 1;41(7). doi: 10.1093/bioinformatics/btaf383.

Abstract

MOTIVATION

Knowledge graphs (KGs) in the life sciences have become an important application of systems biology because they delineate complex biological and pathophysiological phenomena. They are composed of biological and chemical entities represented with standard ontologies to comply with the Findable, Accessible, Interoperable and Reusable (FAIR) principles. Besides serving as graph databases, KGs have the potential to address complex scientific queries and facilitate downstream analyses. However, constructing KGs is expensive and time-consuming because it relies primarily on manual curation from published literature and experimental data. Existing text-mining workflows are still in their infancy and fail to achieve the accuracy and reliability of manual curation.

RESULTS

Knowledge graph generator (KGG) is an automated workflow for representing the chemotype and phenotype of diseases and medical conditions. It embeds the underlying schemas of curated databases such as OpenTargets, UniProt, ChEMBL, Integrated Interactions Database and GWAS Central, linking them in a clockwork-like mechanism. The resultant KG is a comprehensive and rational assembly of disease-associated entities such as proteins, protein-related pathways, biological processes and functions, genetic variants, chemicals, mechanisms of action, assays and adverse effects. As use cases, we have used KGs to identify shared entities that may link comorbidities, and compared them with KGs from other sources. We have also demonstrated a use case of identifying putative new targets and drug repurposing candidates in Parkinson's Disease. Lastly, we have developed reusable workflows to explore the drug-likeness of chemicals and identify the structures of proteins.
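The shared-entity use case can be illustrated with a minimal Python sketch. It is not taken from the KGG codebase: the `Node` and `Edge` types, the `shared_entities` function, the relation names and every identifier (`DISEASE_X`, `PROTEIN_A`, and so on) are hypothetical placeholders chosen for illustration, and the toy graphs contain no real biological data. The sketch assumes each disease KG can be reduced to a list of typed edges and that a shared entity is simply a node present in both graphs.

```python
from collections import defaultdict
from typing import NamedTuple


class Node(NamedTuple):
    kind: str        # entity type, e.g. "Protein", "Chemical", "Pathway"
    identifier: str  # placeholder ID; a real KG would use ontology identifiers


class Edge(NamedTuple):
    source: Node
    relation: str
    target: Node


def nodes_of(edges):
    """Collect every node that appears as the source or target of an edge."""
    found = set()
    for edge in edges:
        found.add(edge.source)
        found.add(edge.target)
    return found


def shared_entities(kg_a, kg_b, exclude_kinds=("Disease",)):
    """Return the nodes present in both graphs, grouped by entity type."""
    common = nodes_of(kg_a) & nodes_of(kg_b)
    grouped = defaultdict(set)
    for node in common:
        if node.kind not in exclude_kinds:
            grouped[node.kind].add(node.identifier)
    return {kind: sorted(ids) for kind, ids in grouped.items()}


# Hypothetical toy graphs for two diseases (placeholders, not real findings).
disease_x = Node("Disease", "DISEASE_X")
disease_y = Node("Disease", "DISEASE_Y")
protein_a = Node("Protein", "PROTEIN_A")
protein_b = Node("Protein", "PROTEIN_B")
protein_c = Node("Protein", "PROTEIN_C")
chemical_1 = Node("Chemical", "CHEMICAL_1")
pathway_p = Node("Pathway", "PATHWAY_P")

kg_x = [
    Edge(disease_x, "associated_with", protein_a),
    Edge(disease_x, "associated_with", protein_b),
    Edge(chemical_1, "targets", protein_a),
    Edge(protein_b, "participates_in", pathway_p),
]
kg_y = [
    Edge(disease_y, "associated_with", protein_a),
    Edge(disease_y, "associated_with", protein_c),
    Edge(chemical_1, "targets", protein_a),
    Edge(protein_c, "participates_in", pathway_p),
]

overlap = shared_entities(kg_x, kg_y)
print(overlap)
```

Grouping the overlap by entity type keeps shared proteins, chemicals and pathways apart, so each group can be inspected separately as a candidate comorbidity link.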

AVAILABILITY AND IMPLEMENTATION

The resources and code for KGG are publicly available at https://github.com/Fraunhofer-ITMP/kgg.

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8b51/12270262/325af12f5c60/btaf383f1.jpg
