A survey of word embeddings for clinical text.

Author information

Khattak Faiza Khan, Jeblee Serena, Pou-Prom Chloé, Abdalla Mohamed, Meaney Christopher, Rudzicz Frank

Affiliations

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada; Li Ka Shing Knowledge Institute, St Michael's Hospital, Toronto, Ontario, Canada.

Department of Computer Science, University of Toronto, Toronto, Ontario, Canada; Vector Institute for Artificial Intelligence, Toronto, Ontario, Canada.

Publication information

J Biomed Inform. 2019;100S:100057. doi: 10.1016/j.yjbinx.2019.100057. Epub 2019 Oct 28.

Abstract

Representing words as numerical vectors based on the contexts in which they appear has become the de facto method of analyzing text with machine learning. In this paper, we provide a guide for training these representations on clinical text data, using a survey of relevant research. Specifically, we discuss different types of word representations, clinical text corpora, available pre-trained clinical word vector embeddings, intrinsic and extrinsic evaluation, applications, and limitations of these approaches. This work can be used as a blueprint for clinicians and healthcare workers who may want to incorporate clinical text features in their own models and applications.
