LCD benchmark: long clinical document benchmark on mortality prediction for language models.

Authors

Yoon WonJin, Chen Shan, Gao Yanjun, Zhao Zhanzhan, Dligach Dmitriy, Bitterman Danielle S, Afshar Majid, Miller Timothy

Affiliations

Computational Health Informatics Program, Boston Children's Hospital, Boston, MA 02215, United States.

Department of Pediatrics, Harvard Medical School, Boston, MA 02115, United States.

Publication information

J Am Med Inform Assoc. 2025 Feb 1;32(2):285-295. doi: 10.1093/jamia/ocae287.

DOI:10.1093/jamia/ocae287
PMID:39602813
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC11756648/
Abstract

OBJECTIVES

Applying natural language processing (NLP) in the clinical domain is important because clinical documents contain rich unstructured information that is often inaccessible in structured data. When NLP methods are applied to a specific domain, benchmark datasets are crucial: they not only guide the selection of the best-performing models but also enable assessment of the reliability of the generated outputs. Although language models capable of handling longer context have recently become available, benchmark datasets targeting long clinical document classification tasks are lacking.

MATERIALS AND METHODS

To address this issue, we propose the Long Clinical Document (LCD) benchmark, a benchmark for the task of predicting 30-day out-of-hospital mortality using discharge notes from the Medical Information Mart for Intensive Care IV (MIMIC-IV) and statewide death data. We evaluated this benchmark dataset with baseline models ranging from bag-of-words and convolutional neural networks to instruction-tuned large language models. Additionally, we provide a comprehensive analysis of the model outputs, including manual review and visualization of model weights, to offer insights into their predictive capabilities and limitations.
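The task definition above (30-day out-of-hospital mortality derived from a discharge date and an external death record) can be illustrated with a small labeling function. This is only a sketch under stated assumptions, not the authors' pipeline: the function name `died_within_30_days`, the inclusive handling of day 0 and day 30, and the treatment of a missing death record as a negative label are all hypothetical choices made for illustration.

```python
from datetime import date
from typing import Optional


def died_within_30_days(
    discharge: date,
    death: Optional[date],
    window_days: int = 30,
) -> int:
    """Return 1 if a death is recorded within `window_days` after discharge.

    Assumptions (not taken from the paper):
      * a missing death record (None) is treated as "survived" -> 0
      * the window is inclusive on both ends (day 0 through day 30)
      * a death date before discharge is not an out-of-hospital death -> 0
    """
    if death is None:
        return 0
    days_after_discharge = (death - discharge).days
    return int(0 <= days_after_discharge <= window_days)


# Purely synthetic dates, for illustration only.
label_a = died_within_30_days(date(2020, 1, 1), date(2020, 1, 31))  # 30 days later
label_b = died_within_30_days(date(2020, 1, 1), date(2020, 2, 1))   # 31 days later
label_c = died_within_30_days(date(2020, 1, 1), None)               # no death record
```

How boundary cases are handled (same-day deaths, the exact cutoff) changes the positive rate slightly, so a real reimplementation should follow the dataset's published definition rather than the choices assumed here.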

RESULTS

In terms of F1 score, the best-performing supervised baseline reached 28.9% and GPT-4 reached 32.2%. Notes in our dataset have a median word count of 1687.
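For readers less familiar with the two quantities reported here, the following sketch shows how a binary F1 score and a median word count can be computed. It is an illustration under assumptions rather than the evaluation code used in the study: the helper names `f1_score` and `median_word_count`, whitespace tokenization, and the toy labels and notes are all hypothetical.

```python
from statistics import median
from typing import Sequence


def f1_score(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Binary F1 for the positive class (label 1).

    Uses F1 = 2*TP / (2*TP + FP + FN), which equals the harmonic mean
    of precision and recall. Returns 0.0 when there are no true positives.
    """
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    return 2 * tp / (2 * tp + fp + fn)


def median_word_count(notes: Sequence[str]) -> float:
    """Median number of whitespace-separated tokens per note (an assumed tokenization)."""
    return median(len(note.split()) for note in notes)


# Synthetic labels: 2 positives out of 10, mimicking a rare outcome.
y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 0, 0, 1, 0, 0, 0, 0]
score = f1_score(y_true, y_pred)  # TP=1, FP=2, FN=1

# Synthetic placeholder "notes", not real clinical text.
toy_notes = ["a b c", "a", "a b c d e"]
mid_len = median_word_count(toy_notes)
```

Because F1 ignores true negatives, it stays low when the positive class is rare and hard to detect, even if overall accuracy looks high; that is one reason it is a common choice for imbalanced outcomes such as mortality.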

DISCUSSION

Our analysis of the model outputs showed that our dataset is challenging for both models and human experts, but that the models can find meaningful signals in the text.

CONCLUSION

We expect our LCD benchmark to be a resource for the development of advanced supervised models, or prompting methods, tailored for clinical text.
