

A comprehensive large scale biomedical knowledge graph for AI powered data driven biomedical research.

Authors

Zhang Yuan, Sui Xin, Pan Feng, Yu Kaixian, Li Keqiao, Tian Shubo, Erdengasileng Arslan, Han Qing, Wang Wanjing, Wang Jianan, Wang Jian, Sun Donghu, Chung Henry, Zhou Jun, Zhou Eric, Lee Ben, Zhang Peili, Qiu Xing, Zhao Tingting, Zhang Jinfeng

Affiliations

Department of Statistics, Florida State University, Tallahassee, FL 32306.

Insilicom LLC, Tallahassee, FL 32303.

Publication information

bioRxiv. 2025 Mar 4:2023.10.13.562216. doi: 10.1101/2023.10.13.562216.

Abstract

To address the rapid growth of scientific publications and data in biomedical research, knowledge graphs (KGs) have become a critical tool for integrating large volumes of heterogeneous data to enable efficient information retrieval and automated knowledge discovery (AKD). However, transforming unstructured scientific literature into KGs remains a significant challenge, with previous methods unable to achieve human-level accuracy. In this study, we utilized an information extraction pipeline that won first place in the LitCoin NLP Challenge (2022) to construct a large-scale KG named iKraph using all PubMed abstracts. The extracted information matches human expert annotations and significantly exceeds the content of manually curated public databases. To enhance the KG's comprehensiveness, we integrated relation data from 40 public databases and relation information inferred from high-throughput genomics data. This KG facilitates rigorous performance evaluation of AKD, which was infeasible in previous studies. We designed an interpretable, probabilistic-based inference method to identify indirect causal relations and applied it to real-time COVID-19 drug repurposing from March 2020 to May 2023. Our method identified 600-1400 candidate drugs per month, with one-third of those discovered in the first two months later supported by clinical trials or PubMed publications. These outcomes are very challenging to attain through alternative approaches that lack a thorough understanding of the existing literature. A cloud-based platform (https://biokde.insilicom.com) was developed for academic users to access this rich structured data and associated tools.
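The abstract does not spell out how the probabilistic inference works, so the following is only an illustrative sketch of one common way to score indirect relations in a knowledge graph, not the authors' method. It assumes each extracted edge carries a probability of being correct, multiplies probabilities along two-hop paths, and combines parallel paths with a noisy-OR. The entity names, edge probabilities, and the function `indirect_scores` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical toy edges: (source, target, probability the relation is correct).
EDGES = [
    ("drug_A", "gene_X", 0.9),
    ("drug_A", "gene_Y", 0.5),
    ("gene_X", "disease_D", 0.8),
    ("gene_Y", "disease_D", 0.4),
]


def indirect_scores(edges, source):
    """Score two-hop targets of `source` with a noisy-OR over path probabilities.

    Returns {target: (score, supporting_paths)}, where supporting_paths is a
    list of (intermediate_node, path_probability) sorted strongest first, so
    each score can be traced back to the evidence behind it.
    """
    outgoing = defaultdict(list)
    for s, t, p in edges:
        outgoing[s].append((t, p))

    # Collect every two-hop path source -> mid -> target.
    paths = defaultdict(list)
    for mid, p1 in outgoing.get(source, []):
        for target, p2 in outgoing.get(mid, []):
            if target != source:
                # Assumes the two edges are independent.
                paths[target].append((mid, p1 * p2))

    scores = {}
    for target, supporting in paths.items():
        # Noisy-OR: the relation holds unless every supporting path fails.
        all_fail = 1.0
        for _, p in supporting:
            all_fail *= 1.0 - p
        ranked = sorted(supporting, key=lambda item: item[1], reverse=True)
        scores[target] = (1.0 - all_fail, ranked)
    return scores


if __name__ == "__main__":
    for target, (score, supporting) in indirect_scores(EDGES, "drug_A").items():
        print(target, round(score, 3), supporting)
```

Keeping the list of supporting paths next to each score is what makes this kind of scoring interpretable: a candidate can be inspected through the intermediate entities that connect it to the disease. The independence assumption between edges and between paths is a simplification made for this sketch.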


https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5984/11887766/c0c70565a055/nihpp-2023.10.13.562216v3-f0001.jpg
