Searching COVID-19 Clinical Research Using Graph Queries: Algorithm Development and Validation.

Author information

Department of Electronics, Information, and Bioengineering, Politecnico di Milano, Milan, Italy.

Publication information

J Med Internet Res. 2024 May 30;26:e52655. doi: 10.2196/52655.

Abstract

BACKGROUND

Since the beginning of the COVID-19 pandemic, >1 million studies have been collected within the COVID-19 Open Research Dataset, a corpus of manuscripts created to accelerate research against the disease. Their related abstracts hold a wealth of information that remains largely unexplored and difficult to search due to its unstructured nature. Keyword-based search is the standard approach, which allows users to retrieve the documents of a corpus that contain (all or some of) the words in a target list. This type of search, however, does not provide visual support to the task and is not suited to expressing complex queries or compensating for missing specifications.

OBJECTIVE

This study aims to use small graphs of concepts to express graph searches over the existing COVID-19-related literature, leveraging the increasing use of graphs to represent and query scientific knowledge and providing a user-friendly search and exploration experience.

METHODS

We considered the COVID-19 Open Research Dataset corpus and summarized its content by annotating the publications' abstracts using terms selected from the Unified Medical Language System and the Ontology of Coronavirus Infectious Disease. Then, we built a co-occurrence network that includes all relevant concepts mentioned in the corpus, establishing connections when their mutual information is relevant. A sophisticated graph query engine was built to allow the identification of the best matches of graph queries on the network. It also supports partial matches and suggests potential query completions using shortest paths.
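The pipeline described above can be illustrated with a minimal sketch. It is not the authors' implementation: the toy `abstracts` data, the function names `build_network` and `shortest_path`, and the use of pointwise mutual information with a zero threshold as the test for a "relevant" connection are all assumptions made for illustration. Each abstract is reduced to the set of concepts annotated in it, two concepts are linked when they co-occur more often than chance would predict, and a breadth-first search returns a shortest path that could be offered as a completion between two query nodes.

```python
import math
from collections import Counter, deque
from itertools import combinations

# Hypothetical toy corpus: each abstract reduced to its annotated concepts.
abstracts = [
    {"COVID-19", "fever", "cough"},
    {"COVID-19", "fever"},
    {"COVID-19", "remdesivir"},
    {"remdesivir", "liver injury"},
    {"fever", "cough"},
    {"influenza", "oseltamivir"},
    {"influenza", "fever"},
]

def build_network(docs, threshold=0.0):
    """Link two concepts when their pointwise mutual information
    (an assumed stand-in for the paper's relevance test) exceeds threshold."""
    n = len(docs)
    single = Counter(c for d in docs for c in d)
    pair = Counter(frozenset(p) for d in docs for p in combinations(sorted(d), 2))
    graph = {c: set() for c in single}
    for p, n_ab in pair.items():
        a, b = tuple(p)
        # PMI = log2( P(a, b) / (P(a) * P(b)) ), estimated from document counts.
        pmi = math.log2(n_ab * n / (single[a] * single[b]))
        if pmi > threshold:
            graph[a].add(b)
            graph[b].add(a)
    return graph

def shortest_path(graph, source, target):
    """Breadth-first search; returns the list of concepts on a shortest
    path from source to target, or None if they are not connected."""
    if source not in graph or target not in graph:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in sorted(graph[node]):
            if nxt not in parent:
                parent[nxt] = node
                queue.append(nxt)
    return None

network = build_network(abstracts)
# Suggest intermediate concepts that connect two query nodes.
print(shortest_path(network, "fever", "liver injury"))
```

In this toy corpus, "influenza" and "fever" co-occur once but less often than chance would predict, so no edge is created between them. This is the kind of filtering a mutual-information criterion is meant to provide.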

RESULTS

We built a large co-occurrence network consisting of 128,249 entities and 47,198,965 relationships. The GRAPH-SEARCH interface allows users to explore the network by formulating or adapting graph queries. It produces a globally ranked bibliography of publications, and each publication is further associated with the specific parts of the query that it explains, allowing the user to understand each aspect of the matching.

CONCLUSIONS

Our approach supports query formulation and evidence search over a large text corpus; it can be reapplied to any scientific domain where document corpora and curated ontologies are available.

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/7f1f/11176882/a37c0e4d32bc/jmir_v26i1e52655_fig1.jpg
