A large-scale dataset of patient summaries for retrieval-based clinical decision support systems.

Affiliations

Center for Statistical Science, Tsinghua University, Beijing, 100084, China.

School of Medicine, Tsinghua University, Beijing, 100084, China.

Publication information

Sci Data. 2023 Dec 18;10(1):909. doi: 10.1038/s41597-023-02814-8.

Abstract

Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflow by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely obstructed by the lack of diverse patient collections and of publicly available, large-scale, patient-level annotation datasets. In this paper, we present PMC-Patients, a novel dataset of patient summaries and relations, to benchmark two ReCDS tasks: Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR). Specifically, we extract patient summaries from PubMed Central articles using simple heuristics and use the PubMed citation graph to define patient-article relevance and patient-patient similarity. PMC-Patients contains 167k patient summaries with 3.1M patient-article relevance annotations and 293k patient-patient similarity annotations, making it the largest-scale resource for ReCDS and one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. We also implement and evaluate several ReCDS systems on the PMC-Patients benchmarks to show how challenging they are, and conduct several case studies to show the clinical utility of PMC-Patients.
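
The abstract states only that the citation graph is used to define relevance and similarity; it does not give the exact rules. The Python sketch below illustrates one plausible reading under stated assumptions: an article counts as relevant to a patient if it is the patient's source article or is linked to it by a citation in either direction, and two patients count as similar if they come from the same article or from citation-linked articles. The function name `build_annotations`, its arguments, and these rules are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


def build_annotations(patient_source, citations):
    """Derive relevance and similarity annotations from a citation graph.

    patient_source: dict mapping patient_id -> id of the article the
        summary was extracted from.
    citations: iterable of (citing_article, cited_article) pairs.

    Assumption (not from the source): citation links are treated as
    undirected, and one hop of linkage is enough for relevance/similarity.
    """
    # Undirected adjacency between articles.
    neighbours = defaultdict(set)
    for citing, cited in citations:
        neighbours[citing].add(cited)
        neighbours[cited].add(citing)

    # Group patients by the article they were extracted from.
    patients_by_article = defaultdict(set)
    for pid, article in patient_source.items():
        patients_by_article[article].add(pid)

    relevant_articles = {}
    similar_patients = {}
    for pid, article in patient_source.items():
        linked = neighbours.get(article, set())
        # Relevant articles: the source article plus its citation neighbours.
        relevant_articles[pid] = {article} | linked
        # Similar patients: same article, or any citation-linked article.
        similar = set(patients_by_article[article])
        for other in linked:
            similar |= patients_by_article.get(other, set())
        similar.discard(pid)  # a patient is not annotated as similar to itself
        similar_patients[pid] = similar
    return relevant_articles, similar_patients
```

For example, with patients `p1` and `p2` extracted from article `A`, patient `p3` from article `B`, and a single citation from `A` to `B`, this sketch marks both `A` and `B` as relevant to `p1` and marks `p2` and `p3` as similar to `p1`.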

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/a2c78f038109/41597_2023_2814_Fig1_HTML.jpg