Retrieval-Based Diagnostic Decision Support: Mixed Methods Study.

Authors

Abdullahi Tassallah, Mercurio Laura, Singh Ritambhara, Eickhoff Carsten

Affiliations

Department of Computer Science, Brown University, Providence, RI, United States.

Departments of Pediatrics & Emergency Medicine, Alpert Medical School, Brown University, Providence, RI, United States.

Publication information

JMIR Med Inform. 2024 Jun 19;12:e50209. doi: 10.2196/50209.

DOI:10.2196/50209
PMID:38896468
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11222760/

Abstract

BACKGROUND

Diagnostic errors pose significant health risks and contribute to patient mortality. With the growing accessibility of electronic health records, machine learning models offer a promising avenue for enhancing diagnosis quality. Current research has primarily focused on a limited set of diseases with ample training data, neglecting diagnostic scenarios with limited data availability.

OBJECTIVE

This study aims to develop an information retrieval (IR)-based framework that accommodates data sparsity to facilitate broader diagnostic decision support.

METHODS

We introduced an IR-based diagnostic decision support framework called CliniqIR. It uses clinical text records, the Unified Medical Language System Metathesaurus, and 33 million PubMed abstracts to classify a broad spectrum of diagnoses independent of training data availability. CliniqIR is designed to be compatible with any IR framework. Therefore, we implemented it using both dense and sparse retrieval approaches. We compared CliniqIR's performance to that of pretrained clinical transformer models such as Clinical Bidirectional Encoder Representations from Transformers (ClinicalBERT) in supervised and zero-shot settings. Subsequently, we combined the strength of supervised fine-tuned ClinicalBERT and CliniqIR to build an ensemble framework that delivers state-of-the-art diagnostic predictions.
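
To make the retrieval idea concrete, the following is a minimal sketch of sparse retrieval used for diagnosis ranking. It is not the CliniqIR implementation: the toy corpus, the TF-IDF weighting, the sum aggregation over documents, and every function name are assumptions chosen for illustration, since the abstract does not specify the scoring details.

```python
import math
from collections import Counter, defaultdict

# Hypothetical toy corpus of (abstract text, diagnosis concept) pairs.
# The study uses PubMed abstracts and UMLS concepts; these three entries
# are invented purely for illustration.
CORPUS = [
    ("fever cough and shortness of breath with lung infiltrate", "pneumonia"),
    ("chest pain radiating to the left arm with elevated troponin", "myocardial infarction"),
    ("productive cough fever and pleuritic chest pain", "pneumonia"),
]


def tokenize(text):
    """Lowercase whitespace tokenizer (a deliberate simplification)."""
    return text.lower().split()


def build_idf(corpus):
    """Compute a smoothed inverse document frequency for every corpus term."""
    n_docs = len(corpus)
    doc_freq = Counter()
    for text, _ in corpus:
        doc_freq.update(set(tokenize(text)))
    return {term: math.log(1 + n_docs / df) for term, df in doc_freq.items()}


def rank_diagnoses(note, corpus, idf):
    """Score each document against the note, then sum scores per concept."""
    query_terms = set(tokenize(note))
    scores = defaultdict(float)
    for text, concept in corpus:
        term_freq = Counter(tokenize(text))
        doc_score = sum(term_freq[t] * idf[t] for t in query_terms if t in term_freq)
        scores[concept] += doc_score
    # Highest aggregated score first.
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


IDF = build_idf(CORPUS)
RANKED = rank_diagnoses("patient with fever and cough", CORPUS, IDF)
for concept, score in RANKED:
    print(f"{concept}: {score:.3f}")
```

Because candidate diagnoses are scored from the literature rather than from labeled patient records, a ranking of this kind needs no training examples for a given disease, which is the property the framework relies on for rare diagnoses.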

RESULTS

On a complex diagnosis data set (DC3) without any training data, CliniqIR models returned the correct diagnosis within their top 3 predictions. On the Medical Information Mart for Intensive Care III data set, CliniqIR models surpassed ClinicalBERT in predicting diagnoses with <5 training samples by an average difference in mean reciprocal rank of 0.10. In a zero-shot setting where models received no disease-specific training, CliniqIR still outperformed the pretrained transformer models with a greater mean reciprocal rank of at least 0.10. Furthermore, in most conditions, our ensemble framework surpassed the performance of its individual components, demonstrating its enhanced ability to make precise diagnostic predictions.
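
The results above are reported as mean reciprocal rank (MRR) and top-3 hits. As a reference for how these metrics are usually computed, here is a short sketch; the function names and the example predictions are hypothetical and are not taken from the study.

```python
def reciprocal_rank(ranked, correct):
    """Return 1/position of the correct label, or 0.0 if it is absent."""
    for position, candidate in enumerate(ranked, start=1):
        if candidate == correct:
            return 1.0 / position
    return 0.0


def mean_reciprocal_rank(predictions, gold):
    """Average the reciprocal rank over all cases."""
    if not predictions:
        return 0.0
    total = sum(reciprocal_rank(r, g) for r, g in zip(predictions, gold))
    return total / len(predictions)


def top_k_accuracy(predictions, gold, k):
    """Fraction of cases whose correct label appears in the first k predictions."""
    if not predictions:
        return 0.0
    hits = sum(1 for r, g in zip(predictions, gold) if g in r[:k])
    return hits / len(predictions)


# Invented example: the correct diagnosis is ranked 1st, 2nd, and 3rd.
predictions = [
    ["sarcoidosis", "tuberculosis", "lymphoma"],
    ["tuberculosis", "sarcoidosis", "lymphoma"],
    ["lymphoma", "tuberculosis", "sarcoidosis"],
]
gold = ["sarcoidosis", "sarcoidosis", "sarcoidosis"]

mrr = mean_reciprocal_rank(predictions, gold)  # (1 + 1/2 + 1/3) / 3
top1 = top_k_accuracy(predictions, gold, k=1)
top3 = top_k_accuracy(predictions, gold, k=3)
print(mrr, top1, top3)
```

On this scale, a difference of 0.10 in MRR is substantial: moving the correct diagnosis from rank 2 to rank 1 in one case out of five produces exactly that change.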

CONCLUSIONS

Our experiments highlight the importance of IR in leveraging unstructured knowledge resources to identify infrequently encountered diagnoses. In addition, our ensemble framework benefits from combining the complementary strengths of the supervised and retrieval-based models to diagnose a broad spectrum of diseases.
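
The abstract does not say how the supervised and retrieval-based rankings are combined. One common way to merge two ranked lists is reciprocal rank fusion (RRF), sketched below as an assumption rather than as the authors' method; the function name, the constant `k=60`, and the example rankings are illustrative only.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists: each label earns 1 / (k + position) per list."""
    fused = defaultdict(float)
    for ranking in rankings:
        for position, label in enumerate(ranking, start=1):
            fused[label] += 1.0 / (k + position)
    # Sort by descending fused score; break ties alphabetically for stability.
    return sorted(fused, key=lambda label: (-fused[label], label))


# Hypothetical outputs of a supervised classifier and a retrieval model.
supervised_ranking = ["pneumonia", "asthma", "sarcoidosis"]
retrieval_ranking = ["sarcoidosis", "pneumonia", "asthma"]

ensemble = reciprocal_rank_fusion([supervised_ranking, retrieval_ranking])
print(ensemble)
```

A rank-based fusion of this kind needs no score calibration between the two models, so a label that only the retrieval model ranks highly, such as a rare disease, can still surface in the combined list.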

Figure 1: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/4dc1829fe666/medinform_v12i1e50209_fig1.jpg
Figure 2: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/cb12a4d611f1/medinform_v12i1e50209_fig2.jpg
Figure 3: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/eb2871a233e2/medinform_v12i1e50209_fig3.jpg
Figure 4: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/699ceb35a093/medinform_v12i1e50209_fig4.jpg
Figure 5: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/e29ef153cda2/medinform_v12i1e50209_fig5.jpg
Figure 6: https://cdn.ncbi.nlm.nih.gov/pmc/blobs/dc04/11222760/46ba8093270e/medinform_v12i1e50209_fig6.jpg

Similar articles

1
Retrieval-Based Diagnostic Decision Support: Mixed Methods Study.
JMIR Med Inform. 2024 Jun 19;12:e50209. doi: 10.2196/50209.
2
A large language model-based generative natural language processing framework fine-tuned on clinical notes accurately extracts headache frequency from electronic health records.
Headache. 2024 Apr;64(4):400-409. doi: 10.1111/head.14702. Epub 2024 Mar 25.
3
Identification of Semantically Similar Sentences in Clinical Notes: Iterative Intermediate Training Using Multi-Task Learning.
JMIR Med Inform. 2020 Nov 27;8(11):e22508. doi: 10.2196/22508.
4
Disease Concept-Embedding Based on the Self-Supervised Method for Medical Information Extraction from Electronic Health Records and Disease Retrieval: Algorithm Development and Validation Study.
J Med Internet Res. 2021 Jan 27;23(1):e25113. doi: 10.2196/25113.
5
A Large Language Model-Based Generative Natural Language Processing Framework Finetuned on Clinical Notes Accurately Extracts Headache Frequency from Electronic Health Records.
medRxiv. 2023 Oct 3:2023.10.02.23296403. doi: 10.1101/2023.10.02.23296403.
6
Evaluating the accuracy of a state-of-the-art large language model for prediction of admissions from the emergency room.
J Am Med Inform Assoc. 2024 Sep 1;31(9):1921-1928. doi: 10.1093/jamia/ocae103.
7
Few-Shot Learning for Clinical Natural Language Processing Using Siamese Neural Networks: Algorithm Development and Validation Study.
JMIR AI. 2023 May 4;2:e44293. doi: 10.2196/44293.
8
A Natural Language Processing Model for COVID-19 Detection Based on Dutch General Practice Electronic Health Records by Using Bidirectional Encoder Representations From Transformers: Development and Validation Study.
J Med Internet Res. 2023 Oct 4;25:e49944. doi: 10.2196/49944.
9
Enhancing Clinical Relevance of Pretrained Language Models Through Integration of External Knowledge: Case Study on Cardiovascular Diagnosis From Electronic Health Records.
JMIR AI. 2024 Aug 6;3:e56932. doi: 10.2196/56932.
10
Extraction of Substance Use Information From Clinical Notes: Generative Pretrained Transformer-Based Investigation.
JMIR Med Inform. 2024 Aug 19;12:e56243. doi: 10.2196/56243.

Cited by

1
Retrieval augmented generation for large language models in healthcare: A systematic review.
PLOS Digit Health. 2025 Jun 11;4(6):e0000877. doi: 10.1371/journal.pdig.0000877. eCollection 2025 Jun.
2
Integrating retrieval-augmented generation for enhanced personalized physician recommendations in web-based medical services: model development study.
Front Public Health. 2025 Jan 29;13:1501408. doi: 10.3389/fpubh.2025.1501408. eCollection 2025.
3
Learning to Make Rare and Complex Diagnoses With Generative AI Assistance: Qualitative Study of Popular Large Language Models.
