Serial KinderMiner (SKiM) Discovers and Annotates Biomedical Knowledge Using Co-Occurrence and Transformer Models.

Author information

Millikin Robert J, Raja Kalpana, Steill John, Lock Cannon, Tu Xuancheng, Ross Ian, Tsoi Lam C, Kuusisto Finn, Ni Zijian, Livny Miron, Bockelman Brian, Thomson James, Stewart Ron

Publication information

bioRxiv. 2023 Jun 1:2023.05.30.542911. doi: 10.1101/2023.05.30.542911.

Abstract

BACKGROUND

The PubMed database contains more than 34 million articles; consequently, it is becoming increasingly difficult for a biomedical researcher to keep up to date with different knowledge domains. Computationally efficient and interpretable tools are needed to help researchers find and understand associations between biomedical concepts. The goal of literature-based discovery (LBD) is to connect concepts in isolated literature domains whose relationships would otherwise go undiscovered. This usually takes the form of an A-B-C relationship, where A and C terms are linked through a B term intermediate. Here we describe Serial KinderMiner (SKiM), an LBD algorithm for finding statistically significant links between an A term and one or more C terms through some B term intermediate(s). The development of SKiM is motivated by the observation that only a few LBD tools provide a functional web interface, and that the available tools are limited in one or more of the following ways: 1) they identify a relationship but not the type of relationship, 2) they do not allow the user to provide their own lists of B or C terms, hindering flexibility, 3) they do not allow for querying thousands of C terms (which is crucial if, for instance, the user wants to query connections between a disease and the thousands of available drugs), or 4) they are specific to a particular biomedical domain (such as cancer). We provide an open-source tool and web interface that addresses all of these limitations.
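The A-B-C search described above can be illustrated with a minimal sketch. The abstract does not say which statistical test SKiM uses, so the one-sided Fisher's exact test on document co-occurrence counts below is an assumption, as are the function names, the significance threshold `alpha`, and the toy corpus (modeled loosely on the classic Raynaud's disease, blood viscosity, fish oil example from the LBD literature). It is not the authors' implementation.

```python
from math import comb


def fisher_right_tail(both, x_only, y_only, neither):
    """One-sided Fisher's exact test p-value for a 2x2 co-occurrence table.

    Returns the probability of seeing at least `both` co-occurring documents
    if the two terms were distributed independently (hypergeometric tail).
    """
    n = both + x_only + y_only + neither
    x_docs = both + x_only          # documents mentioning term x
    y_docs = both + y_only          # documents mentioning term y
    upper = min(x_docs, y_docs)
    tail = sum(
        comb(x_docs, k) * comb(n - x_docs, y_docs - k)
        for k in range(both, upper + 1)
    )
    return tail / comb(n, y_docs)


def cooccurrence_p(docs, x, y):
    """Count co-occurrences of x and y over `docs` (sets of terms) and test them."""
    both = sum(1 for d in docs if x in d and y in d)
    x_only = sum(1 for d in docs if x in d and y not in d)
    y_only = sum(1 for d in docs if y in d and x not in d)
    neither = len(docs) - both - x_only - y_only
    return fisher_right_tail(both, x_only, y_only, neither)


def skim(docs, a_term, b_terms, c_terms, alpha=0.05):
    """Hypothetical two-stage search: keep B terms linked to A, then C terms linked to B."""
    links = []
    for b in b_terms:
        p_ab = cooccurrence_p(docs, a_term, b)
        if p_ab >= alpha:
            continue                # B is not significantly associated with A
        for c in c_terms:
            if c in (a_term, b):
                continue            # skip trivial self-links
            p_bc = cooccurrence_p(docs, b, c)
            if p_bc < alpha:
                links.append((a_term, b, c, p_ab, p_bc))
    return links


# Toy corpus (invented for illustration): each document is a set of terms.
# "raynaud" and "fish oil" never appear together; they connect only via B.
TOY_DOCS = (
    [{"raynaud", "blood viscosity"}] * 5
    + [{"blood viscosity", "fish oil"}] * 5
    + [{"aspirin"}] * 3
    + [{"unrelated"}] * 10
)

for a, b, c, p_ab, p_bc in skim(
    TOY_DOCS, "raynaud", ["blood viscosity", "aspirin"], ["fish oil", "aspirin"]
):
    print(f"{a} -> {b} -> {c}  (p_AB={p_ab:.4f}, p_BC={p_bc:.4f})")
```

Running the A-B step first and testing C terms only against the surviving B terms keeps the number of B-C tests manageable, which matters when the C list holds thousands of entries.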

RESULTS

We demonstrate SKiM's ability to discover useful A-B-C linkages in three control experiments: classic LBD discoveries, drug repurposing, and cancer-related associations. Furthermore, we supplement SKiM with a knowledge graph built with transformer machine-learning models to aid in interpreting the relationships between terms found by SKiM. Finally, we provide a simple and intuitive open-source web interface (https://skim.morgridge.org) with comprehensive lists of drugs, diseases, phenotypes, and symptoms so that anyone can easily perform SKiM searches.

CONCLUSIONS

SKiM is a simple algorithm that can perform LBD searches to discover relationships between arbitrary user-defined concepts. SKiM generalizes to any domain, can perform searches with many thousands of C term concepts, and goes beyond merely identifying that a relationship exists: many relationships are labeled with a relationship type drawn from our knowledge graph.
