DR.BENCH: Diagnostic Reasoning Benchmark for Clinical Natural Language Processing.

The meaningful use of electronic health records (EHR) continues to progress in the digital era with clinical decision support systems augmented by artificial intelligence. A priority in improving provider experience is to overcome information overload and reduce cognitive burden so that fewer medical errors and cognitive biases are introduced during patient care. One major type of medical error is diagnostic error caused by systematic or predictable errors in judgement that rely on heuristics. The potential for clinical natural language processing (cNLP) to model human diagnostic reasoning, with forward reasoning from data to diagnosis, and thereby reduce cognitive burden and medical error has not been investigated. Existing tasks to advance the science in cNLP have largely focused on information extraction and named entity recognition through classification tasks. We introduce a novel suite of tasks, Diagnostic Reasoning Benchmarks (DR.BENCH), as a new benchmark for developing and evaluating cNLP models with clinical diagnostic reasoning ability. The suite includes six tasks from ten publicly available datasets addressing clinical text understanding, medical knowledge reasoning, and diagnosis generation. DR.BENCH is the first clinical suite of tasks designed as a natural language generation framework to evaluate pre-trained language models for diagnostic reasoning. The goal of DR.BENCH is to advance the science in cNLP to support downstream applications in computerized diagnostic decision support and to improve the efficiency and accuracy of healthcare providers during patient care. We fine-tune and evaluate state-of-the-art generative models on DR.BENCH. Experiments show that, with domain adaptation pre-training on medical knowledge, the model demonstrated opportunities for improvement when evaluated on DR.BENCH. We share DR.BENCH as a publicly available GitLab repository with a systematic approach to load and evaluate models for the cNLP community. We also discuss the carbon footprint produced during the experiments and encourage future work on DR.BENCH to report its carbon footprint.

