A large-scale dataset of patient summaries for retrieval-based clinical decision support systems.

Affiliations

Center for Statistical Science, Tsinghua University, Beijing, 100084, China.

School of Medicine, Tsinghua University, Beijing, 100084, China.

Publication information

Sci Data. 2023 Dec 18;10(1):909. doi: 10.1038/s41597-023-02814-8.

DOI: 10.1038/s41597-023-02814-8
PMID: 38110415
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC10728216/
Abstract

Retrieval-based Clinical Decision Support (ReCDS) can aid clinical workflow by providing relevant literature and similar patients for a given patient. However, the development of ReCDS systems has been severely obstructed by the lack of diverse patient collections and publicly available large-scale patient-level annotation datasets. In this paper, we collect a novel dataset of patient summaries and relations called PMC-Patients to benchmark two ReCDS tasks: Patient-to-Article Retrieval (ReCDS-PAR) and Patient-to-Patient Retrieval (ReCDS-PPR). Specifically, we extract patient summaries from PubMed Central articles using simple heuristics and utilize the PubMed citation graph to define patient-article relevance and patient-patient similarity. PMC-Patients contains 167k patient summaries with 3.1 M patient-article relevance annotations and 293k patient-patient similarity annotations, which is the largest-scale resource for ReCDS and also one of the largest patient collections. Human evaluation and analysis show that PMC-Patients is a diverse dataset with high-quality annotations. We also implement and evaluate several ReCDS systems on the PMC-Patients benchmarks to show its challenges and conduct several case studies to show the clinical utility of PMC-Patients.
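
The abstract says that relevance and similarity labels are derived from the PubMed citation graph but does not state the exact rule. The following is a minimal sketch under stated assumptions, not the authors' implementation: an article is treated as relevant to a patient if it is the patient's source article or shares a direct citation link with it, and two patients are treated as similar if their source articles are the same or directly linked. The function name `build_annotations`, the input shapes, and the toy IDs are hypothetical.

```python
from collections import defaultdict


def build_annotations(patient_to_article, citations):
    """Derive ReCDS-style labels from a citation graph (illustrative only).

    patient_to_article: dict mapping patient ID -> source article ID.
    citations: iterable of (citing_article, cited_article) pairs.
    Returns (relevant_articles, similar_patients), both dicts of sets.
    """
    # Assumption: citation links are used in both directions.
    neighbors = defaultdict(set)
    for citing, cited in citations:
        neighbors[citing].add(cited)
        neighbors[cited].add(citing)

    # Group patients by the article they were extracted from.
    article_to_patients = defaultdict(set)
    for patient, article in patient_to_article.items():
        article_to_patients[article].add(patient)

    relevant_articles = {}
    similar_patients = {}
    for patient, article in patient_to_article.items():
        linked = neighbors.get(article, set())
        # Patient-to-article: the source article plus its citation neighbors.
        relevant_articles[patient] = {article} | linked
        # Patient-to-patient: patients from the same or a linked article.
        similar = set(article_to_patients[article])
        for other in linked:
            similar |= article_to_patients.get(other, set())
        similar.discard(patient)  # a patient is not its own match
        similar_patients[patient] = similar
    return relevant_articles, similar_patients


# Toy data: article A cites article B; article C is isolated.
patients = {"p1": "A", "p2": "A", "p3": "B", "p4": "C"}
relevant, similar = build_annotations(patients, [("A", "B")])
```

In this toy graph, `p1` gets articles A and B as relevant and patients `p2` and `p3` as similar, while `p4` has only its own source article and no similar patients. A graded scheme (for example, scoring same-article pairs higher than citation-linked pairs) would be a natural refinement, but whether the dataset uses one is not stated in the abstract.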

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/a2c78f038109/41597_2023_2814_Fig1_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/14c4fe27e69b/41597_2023_2814_Fig2_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/5fc462cd68a5/41597_2023_2814_Fig3_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/9d91c4e4eb6b/41597_2023_2814_Fig4_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/b5ed963661df/41597_2023_2814_Fig5_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/87ee16c6de26/41597_2023_2814_Fig6_HTML.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/9a67/10728216/712686db16c2/41597_2023_2814_Fig7_HTML.jpg
