Expanding a radiology lexicon using contextual patterns in radiology reports.

Author information

Department of Genetics and Genomic Sciences, Icahn School of Medicine at Mount Sinai, New York, NY, USA.

Biomedical Informatics Training Program, Stanford University, Stanford, CA, USA.

Publication information

J Am Med Inform Assoc. 2018 Jun 1;25(6):679-685. doi: 10.1093/jamia/ocx152.

Abstract

OBJECTIVE

Distributional semantics algorithms, which learn vector space representations of words and phrases from large corpora, identify related terms based on contextual usage patterns. We hypothesize that distributional semantics can speed up lexicon expansion in a clinical domain, radiology, by unearthing synonyms from the corpus.

MATERIALS AND METHODS

We apply word2vec, a distributional semantics software package, to the text of radiology notes to identify synonyms for RadLex, a structured lexicon of radiology terms. We stratify performance by term category, term frequency, number of tokens in the term, vector magnitude, and the context window used in vector building.
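The ranking step described above can be illustrated with a minimal sketch. It assumes term vectors have already been learned (for example by word2vec) and ranks candidates by cosine similarity, a common choice that the abstract does not confirm. The function names and the three-dimensional toy vectors are hypothetical and serve only to show the mechanics, including the vector magnitude used as a stratification variable.

```python
import math

def magnitude(vec):
    """Euclidean (L2) norm of a vector."""
    return math.sqrt(sum(x * x for x in vec))

def cosine_similarity(a, b):
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    denom = magnitude(a) * magnitude(b)
    if denom == 0.0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / denom

def rank_candidates(target, vectors):
    """Return every other term, sorted by descending similarity to `target`."""
    target_vec = vectors[target]
    scored = [
        (term, cosine_similarity(target_vec, vec))
        for term, vec in vectors.items()
        if term != target
    ]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)

# Made-up toy vectors (assumption); real embeddings are learned from the corpus.
toy_vectors = {
    "pleural effusion": [3.0, 4.0, 0.0],
    "pleural fluid": [2.9, 4.1, 0.2],
    "fracture": [0.1, 0.3, 5.0],
    "normal": [1.0, -2.0, 0.5],
}

ranked = rank_candidates("pleural effusion", toy_vectors)
for term, score in ranked:
    print(f"{term}: {score:.3f}")
```

A curator would then review the ranked list from the top, which is why the position of true synonyms in that list determines how much effort is saved.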

RESULTS

Ranking candidates based on distributional similarity to a target term results in high curation efficiency: on a ranked list of 775 249 terms, more than 50% of synonyms occurred within the first 25 terms. Synonyms are easier to find if the target term is a phrase rather than a single word, if it occurs at least 100 times in the corpus, and if its vector magnitude is between 4 and 5. Synonyms are also easier to identify for some RadLex categories, such as anatomical substances, than for others.
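One way to read the "within the first 25 terms" figure is as the fraction of known synonyms that appear in the top k positions of each target's ranked list. The sketch below computes that fraction under this assumption; the abstract does not spell out the exact evaluation protocol, and the function name and toy data are hypothetical.

```python
def fraction_found_within(ranked_lists, known_synonyms, k):
    """Fraction of known synonyms that appear in the top-k ranked candidates.

    ranked_lists: maps each target term to its candidates, best first.
    known_synonyms: maps each target term to its curated synonyms.
    """
    found = 0
    total = 0
    for target, ranked_terms in ranked_lists.items():
        top_k = set(ranked_terms[:k])
        for synonym in known_synonyms.get(target, ()):
            total += 1
            if synonym in top_k:
                found += 1
    return found / total if total else 0.0

# Made-up toy data (assumption), not taken from the study.
toy_ranked = {
    "pleural effusion": ["pleural fluid", "effusion", "fracture"],
    "renal": ["kidney", "hepatic", "nephric"],
}
toy_synonyms = {
    "pleural effusion": ["pleural fluid"],
    "renal": ["kidney", "nephric"],
}

top2 = fraction_found_within(toy_ranked, toy_synonyms, k=2)
top3 = fraction_found_within(toy_ranked, toy_synonyms, k=3)
print(f"found within top 2: {top2:.2f}; within top 3: {top3:.2f}")
```

Stratifying this fraction by term frequency, token count, or vector magnitude would amount to calling the same function on subsets of the target terms.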

DISCUSSION

The unstructured text of clinical notes contains a wealth of information about human diseases and treatment patterns. However, searching and retrieving information from clinical notes often suffers because similar concepts are described in varying ways in the text. Biomedical lexicons address this challenge but are expensive to produce and maintain. Distributional semantics algorithms can assist lexicon curation, saving researchers time and money.
