Department of Computer Science, Sogang University, 35, Baekbeom-Ro, Mapo-Gu, Seoul, Korea.
VAIV Company Inc, 97, Dokseodang-Ro, Yongsan-Gu, Seoul, Korea.
BMC Bioinformatics. 2024 Aug 21;25(1):273. doi: 10.1186/s12859-024-05903-6.
AI technologies such as large language models (LLMs) and machine learning have advanced considerably in their support for biomedical knowledge discovery.
We propose a novel biomedical neural search service called 'VAIV Bio-Discovery', which supports enhanced knowledge discovery and document search on unstructured text such as PubMed. It mainly handles information related to chemical compounds/drugs, genes/proteins, diseases, and their interactions: chemical compound/drug-gene/protein (including drug-target), drug-drug, and drug-disease. To provide comprehensive knowledge, the system offers four search options: basic search, entity search, interaction search, and natural language search. For interaction extraction we employ T5slim_dec, which adapts the autoregressive generation task of T5 (text-to-text transfer transformer) to the interaction extraction task by removing the self-attention layer in the decoder block. The system also assists in interpreting research findings by summarizing the search results retrieved for a given natural language query with Retrieval Augmented Generation (RAG). The search engine is built with a hybrid method that combines neural search with the probabilistic ranking function BM25.
As a result, our system can better understand the context, semantics and relationships between terms within the document, enhancing search accuracy. This research contributes to the rapidly evolving biomedical field by introducing a new service to access and discover relevant knowledge.
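The hybrid method mentioned above, neural search combined with BM25, can be illustrated with a minimal sketch. The abstract does not say how the two scores are fused, so everything here is an assumption made for illustration: the whitespace tokenizer, the BM25 parameters `k1` and `b`, the min-max normalization, the weight `alpha`, and the function names are hypothetical, and the embedding vectors stand in for whatever neural encoder the service actually uses.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: naive lowercase whitespace tokenization, for illustration only.
    return text.lower().split()


def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter()
    for toks in tokenized:
        df.update(set(toks))
    scores = []
    for toks in tokenized:
        tf = Counter(toks)
        score = 0.0
        for term in tokenize(query):
            if term not in tf:
                continue
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            denom = tf[term] + k1 * (1 - b + b * len(toks) / avg_len)
            score += idf * tf[term] * (k1 + 1) / denom
        scores.append(score)
    return scores


def cosine(u, v):
    """Cosine similarity between two dense vectors."""
    dot = sum(a * c for a, c in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(c * c for c in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def min_max(values):
    """Rescale scores to [0, 1] so lexical and dense scores are comparable."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def hybrid_rank(query, query_vec, docs, doc_vecs, alpha=0.5):
    """Return document indices ranked by a weighted sum of both scores.

    alpha weights the dense (neural) score; 1 - alpha weights BM25.
    This fusion rule is an assumption, not the method used by the service.
    """
    lexical = min_max(bm25_scores(query, docs))
    dense = min_max([cosine(query_vec, v) for v in doc_vecs])
    fused = [alpha * d + (1 - alpha) * l for l, d in zip(lexical, dense)]
    return sorted(range(len(docs)), key=lambda i: fused[i], reverse=True)


if __name__ == "__main__":
    docs = [
        "aspirin inhibits cox1",
        "imatinib targets abl kinase",
        "weather report today",
    ]
    # Toy 2-d vectors standing in for real encoder embeddings.
    doc_vecs = [[0.9, 0.1], [0.2, 0.8], [0.0, 1.0]]
    print(hybrid_rank("aspirin cox1", [1.0, 0.0], docs, doc_vecs))
```

A weighted sum over normalized scores is only one possible fusion rule; rank-based schemes such as reciprocal rank fusion are a common alternative when the two score distributions differ widely.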