A survey of word embeddings for clinical text.

Author information

Khattak Faiza Khan, Jeblee Serena, Pou-Prom Chloé, Abdalla Mohamed, Meaney Christopher, Rudzicz Frank

Affiliations

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada.

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada.

Publication information

J Biomed Inform. 2019;100S:100057. doi: 10.1016/j.yjbinx.2019.100057. Epub 2019 Oct 28.

DOI: 10.1016/j.yjbinx.2019.100057
PMID: 34384583

Abstract

Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications.
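To make the abstract's first sentence concrete, here is a minimal sketch of representing words as vectors built from the contexts in which they appear. It is not a method from the survey itself: it uses plain co-occurrence counts rather than a trained embedding model, and the function names (`cooccurrence_vectors`, `cosine`), the window size, and the toy sentences are all assumptions made up for illustration.

```python
from collections import Counter, defaultdict
import math


def cooccurrence_vectors(sentences, window=2):
    """Map each word to a Counter of the words seen within `window` tokens of it."""
    vectors = defaultdict(Counter)
    for sentence in sentences:
        tokens = sentence.lower().split()
        for i, word in enumerate(tokens):
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    vectors[word][tokens[j]] += 1
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as Counters."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    norm = norm_u * norm_v
    return dot / norm if norm else 0.0


# Invented toy sentences (hypothetical, not drawn from any clinical corpus).
notes = [
    "patient denies chest pain",
    "patient denies chest discomfort",
    "patient reports mild fever",
]

vecs = cooccurrence_vectors(notes)
sim_close = cosine(vecs["pain"], vecs["discomfort"])  # identical contexts
sim_far = cosine(vecs["pain"], vecs["fever"])         # no shared context words
print(sim_close > sim_far)
```

In this toy data, "pain" and "discomfort" occur in the same contexts and so end up with more similar vectors than "pain" and "fever". Trained embeddings of the kind the survey covers rest on the same distributional idea, but learn dense, low-dimensional vectors from much larger corpora.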
