BioMedGraphica: An All-in-One Platform for Biomedical Prior Knowledge and Omic Signaling Graph Generation.

Authors

Zhang Heming, Liang Shunning, Xu Tim, Li Wenyu, Huang Di, Dong Yuhan, Li Guangfu, Miller J Philip, Goedegebuure S Peter, Sardiello Marco, Cooper Jonathan, Buchser William, Dickson Patricia, Fields Ryan C, Cruchaga Carlos, Chen Yixin, Province Michael, Payne Philip, Li Fuhai

Affiliations

Institute for Informatics, Data Science and Biostatistics (I2DB), Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA.

Department of Computer Science and Engineering, Washington University School of Medicine, Washington University in St. Louis, St. Louis, MO, USA.

Publication information

bioRxiv. 2024 Dec 9:2024.12.05.627020. doi: 10.1101/2024.12.05.627020.

Abstract

Artificial intelligence (AI) is revolutionizing scientific discovery because of its powerful capability, following the neural scaling laws, to integrate and analyze large-scale datasets to mine knowledge. Foundation models, such as large language models (LLMs) and large vision models (LVMs), are among the most important foundations paving the way for general AI by pre-training on massive domain-specific datasets. Unlike the well-annotated, formatted, and integrated large text and image datasets available for LLMs and LVMs, biomedical knowledge and datasets in the field of AI for Precision Health and Medicine (AI4PHM) are fragmented, with data scattered across publications and inconsistent databases that often use diverse nomenclature systems. These discrepancies, spanning different levels of biomedical organization from genes to clinical traits, present major challenges for data integration and alignment. To facilitate foundation AI model development and applications in AI4PHM, we developed BioMedGraphica, an all-in-one platform and unified text-attributed knowledge graph (TAKG) consisting of 3,131,788 entities and 56,817,063 relations. It spans 11 distinct entity types and harmonizes 29 relation/edge types using data from 43 biomedical databases. All entities and relations are labeled with a unique ID and associated with textual descriptions (textual features). Because BioMedGraphica covers most research entities in AI4PHM, it supports zero-shot or few-shot knowledge discovery via new relation prediction on the graph. Through a graphical user interface (GUI), researchers can access the knowledge graph, with prior knowledge of target functional annotations, drugs, phenotypes, and diseases (drug-protein-disease-phenotype), in a graph-AI-ready format. It also supports the generation of knowledge-multi-omic signaling graphs to facilitate the development and application of novel AI models, such as LLMs and graph AI, for AI4PHM scientific discovery, such as discovering novel disease pathogenesis, signaling pathways, therapeutic targets, drugs, and synergistic drug cocktails.
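
The abstract describes a text-attributed knowledge graph in which every entity and relation carries a unique ID and a textual description. A minimal sketch of that idea in Python might look like the following. The class names, field names, ID formats, and example records are all hypothetical and are not taken from BioMedGraphica itself; the `to_edge_index` helper only illustrates one common way to turn such a graph into integer index lists of the kind graph learning libraries typically expect.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entity:
    uid: str    # unique ID (hypothetical format, not the platform's real scheme)
    etype: str  # entity type, e.g. "drug", "protein", "disease"
    text: str   # textual description (the "text attribute")


@dataclass(frozen=True)
class Relation:
    uid: str    # unique ID of the relation (hypothetical format)
    rtype: str  # relation/edge type
    head: str   # uid of the source entity
    tail: str   # uid of the target entity
    text: str   # textual description of the relation


def to_edge_index(entities, relations):
    """Map entity IDs to integer indices and return
    (index_of, [sources, targets], edge_types)."""
    index_of = {e.uid: i for i, e in enumerate(entities)}
    sources = [index_of[r.head] for r in relations]
    targets = [index_of[r.tail] for r in relations]
    edge_types = [r.rtype for r in relations]
    return index_of, [sources, targets], edge_types


# Made-up records for illustration only.
entities = [
    Entity("E0001", "drug", "Hypothetical drug A, a kinase inhibitor."),
    Entity("E0002", "protein", "Hypothetical protein B, a receptor kinase."),
    Entity("E0003", "disease", "Hypothetical disease C."),
]
relations = [
    Relation("R0001", "drug-protein", "E0001", "E0002",
             "Drug A targets protein B."),
    Relation("R0002", "protein-disease", "E0002", "E0003",
             "Protein B is associated with disease C."),
]

index_of, edge_index, edge_types = to_edge_index(entities, relations)
```

Keeping the text fields alongside the integer indices means the same records could feed both a language model (through the descriptions) and a graph model (through the edge index), which is the pairing the abstract points to.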

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/8866/11661111/d57d0bdb1418/nihpp-2024.12.05.627020v1-f0001.jpg
