

Generic medical concept embedding and time decay for diverse patient outcome prediction tasks.

Author information

Li Yupeng, Dong Wei, Ru Boshu, Black Adam, Zhang Xinyuan, Guan Yuanfang

Affiliations

Merck & Co., Inc., Rahway, NJ, USA.

Ann Arbor Algorithms Inc., Ann Arbor, MI 48104, USA.

Publication information

iScience. 2022 Aug 4;25(9):104880. doi: 10.1016/j.isci.2022.104880. eCollection 2022 Sep 16.

DOI:10.1016/j.isci.2022.104880
PMID:36039302
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9418804/
Abstract

Many fields, including Natural Language Processing (NLP), have recently witnessed the benefit of pre-training with large generic datasets to improve the accuracy of prediction tasks. However, there exist key differences between longitudinal healthcare data (e.g., claims) and NLP tasks, which make the direct application of NLP pre-training methods to healthcare data inappropriate. In this article, we developed a pre-training scheme for longitudinal healthcare data that leverages the pairing of medical history and a future event. We then conducted systematic evaluations of various methods on ten patient-level prediction tasks encompassing adverse events, misdiagnosis, disease risks, and readmission. In addition to substantially reducing model size, our results show that a universal medical concept embedding pretrained with generic big data as well as carefully designed time decay modeling improves the accuracy of different downstream prediction tasks.
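The abstract does not specify how history/future pairs are formed or what shape the time decay takes, so the following is only a minimal sketch under stated assumptions, not the authors' implementation. It assumes a hypothetical event format of (day index, concept code), treats every distinct event day as a cutoff, turns each concept observed within a hypothetical `horizon` of days after the cutoff into one (history, future event) pre-training pair, and pools a patient's history into a single vector by weighting each concept's pretrained embedding with an exponential decay governed by a hypothetical `half_life`. The function names and the toy two-dimensional embeddings are illustrative.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical record format: (day index, medical concept code).
Event = Tuple[int, str]


def make_history_future_pairs(
    events: Sequence[Event], horizon: int
) -> List[Tuple[List[Event], str]]:
    """Build (medical history, future event) pairs for pre-training.

    Assumed rule (illustrative only): every distinct event day is a cutoff;
    the history is all events on or before the cutoff, and each concept seen
    within `horizon` days after the cutoff becomes one prediction target.
    """
    ordered = sorted(events)
    pairs: List[Tuple[List[Event], str]] = []
    for t_cut in sorted({t for t, _ in ordered}):
        history = [e for e in ordered if e[0] <= t_cut]
        for t, code in ordered:
            if t_cut < t <= t_cut + horizon:
                pairs.append((history, code))
    return pairs


def time_decayed_patient_vector(
    history: Sequence[Event],
    embeddings: Dict[str, List[float]],
    t_pred: int,
    half_life: float,
) -> List[float]:
    """Pool concept embeddings with exponential time decay (assumed form).

    Weight of an event on day t is exp(-ln(2) * (t_pred - t) / half_life),
    so an event `half_life` days old counts half as much as one on t_pred.
    Assumes a non-empty embedding table; unknown codes are skipped.
    """
    dim = len(next(iter(embeddings.values())))
    rate = math.log(2) / half_life
    pooled = [0.0] * dim
    total = 0.0
    for t, code in history:
        if code not in embeddings:
            continue
        w = math.exp(-rate * (t_pred - t))
        total += w
        for k, x in enumerate(embeddings[code]):
            pooled[k] += w * x
    # Normalize to a weighted average; return zeros if nothing was matched.
    return [v / total for v in pooled] if total > 0 else pooled


if __name__ == "__main__":
    # Toy ICD-10-style codes and 2-d embeddings, purely illustrative.
    events = [(0, "E11"), (10, "I10"), (40, "N18")]
    print(make_history_future_pairs(events, horizon=30))
    emb = {"E11": [1.0, 0.0], "I10": [0.0, 1.0], "N18": [0.5, 0.5]}
    print(time_decayed_patient_vector(events[:2], emb, t_pred=30, half_life=30))
```

Setting `half_life` very large drives every weight toward 1, so the pooling reduces to a plain average of the history's embeddings; that makes a convenient baseline for checking whether a time decay term helps a given downstream task.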

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5da7/9418804/bffcf75c7203/gr4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5da7/9418804/e5592f49b500/fx1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5da7/9418804/1b4046a773ad/gr1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5da7/9418804/9d8ee8b54d2a/gr2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/5da7/9418804/5c272f566b32/gr3.jpg

Similar articles

1
Generic medical concept embedding and time decay for diverse patient outcome prediction tasks.
iScience. 2022 Aug 4;25(9):104880. doi: 10.1016/j.isci.2022.104880. eCollection 2022 Sep 16.
2
Ensembles of natural language processing systems for portable phenotyping solutions.
J Biomed Inform. 2019 Dec;100:103318. doi: 10.1016/j.jbi.2019.103318. Epub 2019 Oct 23.
3
When BERT meets Bilbo: a learning curve analysis of pretrained language model on disease classification.
BMC Med Inform Decis Mak. 2022 Apr 5;21(Suppl 9):377. doi: 10.1186/s12911-022-01829-2.
4
A systematic review of natural language processing for classification tasks in the field of incident reporting and adverse event analysis.
Int J Med Inform. 2019 Dec;132:103971. doi: 10.1016/j.ijmedinf.2019.103971. Epub 2019 Oct 5.
5
The future of Cochrane Neonatal.
Early Hum Dev. 2020 Nov;150:105191. doi: 10.1016/j.earlhumdev.2020.105191. Epub 2020 Sep 12.
6
Natural Language Processing and Its Implications for the Future of Medication Safety: A Narrative Review of Recent Advances and Challenges.
Pharmacotherapy. 2018 Aug;38(8):822-841. doi: 10.1002/phar.2151. Epub 2018 Jul 22.
7
Leveraging graph-based hierarchical medical entity embedding for healthcare applications.
Sci Rep. 2021 Mar 12;11(1):5858. doi: 10.1038/s41598-021-85255-w.
8
A comparison of word embeddings for the biomedical natural language processing.
J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.
9
Clinical concept extraction using transformers.
J Am Med Inform Assoc. 2020 Dec 9;27(12):1935-1942. doi: 10.1093/jamia/ocaa189.
10
Prediction task guided representation learning of medical codes in EHR.
J Biomed Inform. 2018 Aug;84:1-10. doi: 10.1016/j.jbi.2018.06.013. Epub 2018 Jun 19.

Cited by

1
Natural Language Processing and Coding for Detecting Bleeding Events in Discharge Summaries: Comparative Cross-Sectional Study.
JMIR Med Inform. 2025 Aug 29;13:e67837. doi: 10.2196/67837.
2
A self-supervised framework for laboratory data imputation in electronic health records.
Commun Med (Lond). 2025 Jul 1;5(1):251. doi: 10.1038/s43856-025-00973-w.
3
EntroLLM: Leveraging Entropy and Large Language Model Embeddings for Enhanced Risk Prediction with Wearable Device Data.
AMIA Jt Summits Transl Sci Proc. 2025 Jun 10;2025:225-234. eCollection 2025.
