State-of-the-Art Evidence Retriever for Precision Medicine: Algorithm Development and Validation

Author information

Jin Qiao, Tan Chuanqi, Chen Mosha, Yan Ming, Zhang Ningyu, Huang Songfang, Liu Xiaozhong

Affiliations

Alibaba Group, Hangzhou, China.

Zhejiang University, Zhejiang, China.

Publication information

JMIR Med Inform. 2022 Dec 15;10(12):e40743. doi: 10.2196/40743.

DOI: 10.2196/40743

PMID: 36409468

Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9801267/

Abstract

BACKGROUND

Under the paradigm of precision medicine (PM), patients with the same disease can receive different personalized therapies according to their clinical and genetic features. These therapies are determined by the totality of available clinical evidence, including results from case reports, clinical trials, and systematic reviews. However, it is increasingly difficult for physicians to find such evidence in the scientific literature, which is growing at an unprecedented pace.

OBJECTIVE

In this work, we propose the PM-Search system to facilitate the retrieval of clinical literature that contains critical evidence for or against giving specific therapies to certain cancer patients.

METHODS

The PM-Search system combines a baseline retriever, which selects candidate documents at scale, with an evidence reranker, which finely reorders the candidates by evidence quality. The baseline retriever uses query expansion and keyword matching with the ElasticSearch retrieval engine; the evidence reranker fits pretrained language models to expert annotations obtained through an active learning strategy.
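The two-stage design described in METHODS can be illustrated with a minimal sketch. It is not the authors' implementation: the synonym table, the in-memory keyword matcher (a stand-in for ElasticSearch), the toy documents, and `toy_evidence_score` (a stand-in for the fine-tuned language model) are all hypothetical and exist only to show how a broad retriever and a quality-focused reranker fit together.

```python
from typing import Callable, Dict, List

# Hypothetical synonym table standing in for the system's query expansion.
SYNONYMS: Dict[str, List[str]] = {
    "braf": ["b-raf"],
    "melanoma": ["malignant melanoma"],
}

# Toy corpus (invented titles, for illustration only).
DOCS: Dict[str, str] = {
    "d1": "BRAF and B-Raf signalling in malignant melanoma: a narrative overview",
    "d2": "Randomized trial of BRAF inhibition in melanoma",
    "d3": "Dietary habits and cardiovascular health",
    "d4": "Melanoma treated with a B-Raf inhibitor: a case report",
}


def expand_query(terms: List[str]) -> List[str]:
    """Lowercase the terms and append known synonyms, without duplicates."""
    expanded: List[str] = []
    for term in terms:
        key = term.lower()
        for candidate in [key, *SYNONYMS.get(key, [])]:
            if candidate not in expanded:
                expanded.append(candidate)
    return expanded


def keyword_score(terms: List[str], text: str) -> int:
    """Count query terms found in the text (stand-in for engine scoring)."""
    lowered = text.lower()
    return sum(1 for term in terms if term in lowered)


def retrieve(terms: List[str], docs: Dict[str, str], k: int) -> List[str]:
    """Stage 1: return up to k matching document ids, best keyword match first."""
    expanded = expand_query(terms)
    scored = [(keyword_score(expanded, text), doc_id) for doc_id, text in docs.items()]
    matched = [(score, doc_id) for score, doc_id in scored if score > 0]
    matched.sort(key=lambda pair: (-pair[0], pair[1]))
    return [doc_id for _, doc_id in matched[:k]]


def toy_evidence_score(text: str) -> float:
    """Hypothetical evidence-quality score; a trained model would go here."""
    lowered = text.lower()
    if "randomized" in lowered:
        return 2.0
    if "case report" in lowered:
        return 1.0
    return 0.0


def rerank(candidates: List[str], docs: Dict[str, str],
           evidence_score: Callable[[str], float]) -> List[str]:
    """Stage 2: reorder candidates by evidence quality (stable for ties)."""
    return sorted(candidates, key=lambda doc_id: -evidence_score(docs[doc_id]))


candidates = retrieve(["BRAF", "melanoma"], DOCS, k=3)
reranked = rerank(candidates, DOCS, toy_evidence_score)
print(candidates)
print(reranked)
```

In this toy run the narrative overview matches the most keywords and leads the first-stage list, but falls to last place after reranking because it carries no evidence cue. This is the distinction between general relevance and evidence quality that the abstract draws.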

RESULTS

The PM-Search system achieved the best performance in the retrieval of high-quality clinical evidence at the Text Retrieval Conference PM Track 2020, outperforming the second-ranking systems by large margins (0.4780 vs 0.4238 for standard normalized discounted cumulative gain at rank 30 and 0.4519 vs 0.4193 for exponential normalized discounted cumulative gain at rank 30).
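For readers unfamiliar with the two reported metrics, the sketch below shows one common way to compute normalized discounted cumulative gain (NDCG) at a rank cutoff with a standard (linear) gain and with an exponential gain. It assumes graded relevance labels and a log2 rank discount; the official track evaluation scripts may differ in detail, and the grade lists used here are invented.

```python
import math
from typing import Callable, List


def standard_gain(grade: int) -> float:
    """Linear gain: the relevance grade itself."""
    return float(grade)


def exponential_gain(grade: int) -> float:
    """Exponential gain: 2^grade - 1, which rewards top grades more heavily."""
    return 2.0 ** grade - 1.0


def dcg(grades: List[int], k: int, gain: Callable[[int], float]) -> float:
    """Discounted cumulative gain over the first k positions (ranks start at 1)."""
    return sum(
        gain(grade) / math.log2(rank + 1)
        for rank, grade in enumerate(grades[:k], start=1)
    )


def ndcg(ranked_grades: List[int], judged_grades: List[int], k: int,
         gain: Callable[[int], float]) -> float:
    """DCG of the system ranking divided by the DCG of the ideal ranking.

    ranked_grades: grades of the returned documents, in ranked order.
    judged_grades: grades of all judged documents for the topic (any order).
    """
    ideal = dcg(sorted(judged_grades, reverse=True), k, gain)
    if ideal == 0:
        return 0.0
    return dcg(ranked_grades, k, gain) / ideal


# Invented example: the best document (grade 2) is ranked second.
ranked = [1, 2, 0]
judged = [2, 1, 0]
print(ndcg(ranked, judged, 30, standard_gain))
print(ndcg(ranked, judged, 30, exponential_gain))
```

The exponential variant penalizes this ranking more than the standard one, because misplacing the highest-graded document costs more when its gain is 3 rather than 2.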

CONCLUSIONS

We present PM-Search, a state-of-the-art search engine to support the practice of evidence-based PM. PM-Search uses a novel active learning strategy, based on Bidirectional Encoder Representations from Transformers for Biomedical Text Mining (BioBERT), that models evidence quality and improves model performance. Our analyses show that evidence quality is distinct from general relevance, and that a PM search engine must model evidence quality specifically, beyond general relevance.
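The abstract does not say how the active learning strategy chooses documents for expert annotation. As a generic illustration only, the sketch below uses uncertainty sampling, a common active learning criterion that may differ from the authors' approach: documents whose predicted probability of being high-quality evidence is closest to 0.5 are sent to annotators first. The function name, document ids, and probabilities are hypothetical.

```python
from typing import Dict, List


def select_for_annotation(predicted: Dict[str, float], budget: int) -> List[str]:
    """Return the `budget` document ids the model is least certain about.

    predicted maps document id -> model probability of high evidence quality.
    Uncertainty is measured as distance from 0.5; ties are broken by id.
    """
    ranked = sorted(
        predicted,
        key=lambda doc_id: (abs(predicted[doc_id] - 0.5), doc_id),
    )
    return ranked[:budget]


# Invented model outputs for four unlabeled documents.
pool = {"a": 0.95, "b": 0.52, "c": 0.10, "d": 0.45}
to_label = select_for_annotation(pool, budget=2)
print(to_label)
```

In a full loop, the newly annotated documents would be added to the training set, the model retrained, and the selection repeated until the annotation budget is spent.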
