DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

Affiliations

ICU Data Science Lab, Department of Medicine, University of Wisconsin Madison, 1685 Highland Ave, Madison, 53792, WI, USA.

Department of Computer Science, Loyola University Chicago, 1032 W Sheridan Rd, Chicago, 60660, IL, USA.

Publication Information

J Biomed Inform. 2023 Feb;138:104286. doi: 10.1016/j.jbi.2023.104286. Epub 2023 Jan 25.

Abstract

The meaningful use of electronic health records (EHR) continues to progress in the digital era, with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error, caused by systematic or predictable errors in judgment that rely on heuristics. The potential for clinical natural language processing (cNLP) to model human diagnostic reasoning as forward reasoning from data to diagnosis, and thereby to reduce cognitive burden and medical error, has not been investigated. Existing tasks for advancing cNLP research have largely focused on information extraction and named entity recognition framed as classification. We introduce a novel suite of tasks, the Diagnostic Reasoning Benchmark (DR.BENCH), for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks drawn from ten publicly available datasets, covering clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework for evaluating pre-trained language models on diagnostic reasoning. The goal of DR.BENCH is to advance cNLP research in support of downstream applications in computerized diagnostic decision support, and to improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate state-of-the-art generative models on DR.BENCH. Experiments show that with domain-adaptive pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH with the cNLP community as a publicly available GitLab repository that provides a systematic approach to loading and evaluating models. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report its carbon footprint.
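The abstract encourages reporting carbon footprint but does not say how it should be estimated. One common back-of-the-envelope estimate multiplies the energy used (average accelerator power × number of accelerators × hours × data-center power usage effectiveness, PUE) by the carbon intensity of the local electricity grid. The following is a minimal sketch assuming that formula; it is not necessarily the method used in the paper, and the `TrainingRun` class, its fields, and every example number are hypothetical placeholders.

```python
from dataclasses import dataclass


@dataclass
class TrainingRun:
    """Hypothetical description of one fine-tuning or evaluation run."""
    gpu_power_watts: float  # assumed average draw per GPU, e.g. averaged from monitoring logs
    num_gpus: int
    hours: float
    pue: float = 1.0  # assumed data-center power usage effectiveness (1.0 = no overhead)


def energy_kwh(run: TrainingRun) -> float:
    """Energy in kWh: convert W to kW, then scale by GPU count, hours, and PUE."""
    return run.gpu_power_watts / 1000.0 * run.num_gpus * run.hours * run.pue


def carbon_kg_co2e(run: TrainingRun, grid_intensity_kg_per_kwh: float) -> float:
    """Estimated emissions: energy times the grid's carbon intensity (kg CO2e per kWh)."""
    return energy_kwh(run) * grid_intensity_kg_per_kwh


if __name__ == "__main__":
    # Illustrative inputs only; these are not values reported in the paper.
    run = TrainingRun(gpu_power_watts=300.0, num_gpus=4, hours=10.0, pue=1.5)
    print(f"{energy_kwh(run):.1f} kWh")  # 0.3 kW * 4 * 10 h * 1.5 = 18.0 kWh
    print(f"{carbon_kg_co2e(run, 0.4):.2f} kg CO2e")  # 18.0 kWh * 0.4 = 7.20 kg CO2e
```

Keeping energy and emissions as separate functions lets a report state the measured energy alongside whichever regional grid intensity applies, since the same run can have very different emissions in different locations.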
