Unsupervised Abbreviation Expansion in Clinical Narratives.

Authors

Oleynik Michel, Kreuzthaler Markus, Schulz Stefan

Affiliation

Institute for Medical Informatics, Statistics and Documentation, Medical University of Graz, Austria.

Publication

Stud Health Technol Inform. 2017;245:539-543.

Abstract

Clinical narratives are typically produced under time pressure, which encourages the use of abbreviations and acronyms. Expanding such short forms correctly eases text comprehension and further semantic processing. We propose a completely unsupervised, data-driven algorithm for resolving non-lexicalised and potentially ambiguous abbreviations. Based on the lookup of word bigrams and unigrams extracted from a corpus of 30,000 pseudonymised cardiology reports in German, our method achieved an F1 score of 0.91 on a test set of 200 text excerpts. The results are statistically significantly better (p < 0.001) than those of a baseline approach and show that a simple, domain-independent strategy may be enough to resolve abbreviations when a large corpus of similar texts is available. Further work is needed to combine this strategy with sentence and abbreviation detection modules, to adapt it to acronym resolution, and to evaluate it on different datasets.
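The abstract names the ingredients (bigram and unigram lookup over a corpus of similar texts) but does not spell out how candidates are generated or ranked. The following is a minimal Python sketch of one plausible reading, not the authors' implementation. It assumes that a candidate expansion is any corpus word beginning with the abbreviation's letters, that the bigram formed with the preceding word is consulted first, and that unigram frequency serves as the fallback. The function names `build_counts` and `expand` and the toy corpus are hypothetical.

```python
from collections import Counter


def build_counts(sentences):
    """Count unigrams and bigrams over lower-cased, whitespace-split sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        tokens = sentence.lower().split()
        unigrams.update(tokens)
        bigrams.update(zip(tokens, tokens[1:]))
    return unigrams, bigrams


def expand(abbrev, previous_word, unigrams, bigrams):
    """Return the best-ranked expansion of `abbrev`, or `abbrev` itself if none exists."""
    prefix = abbrev.lower().rstrip(".")
    # Assumption: candidates are longer corpus words that start with the
    # abbreviation's letters and are not themselves abbreviated forms.
    candidates = [
        word for word in unigrams
        if word.startswith(prefix) and len(word) > len(prefix)
        and not word.endswith(".")
    ]
    if not candidates:
        return abbrev
    prev = previous_word.lower()
    # Rank by bigram count with the preceding word; unigram count breaks ties
    # and acts as the fallback when no bigram has been observed.
    return max(candidates, key=lambda w: (bigrams[(prev, w)], unigrams[w]))


# Toy corpus standing in for the cardiology reports (illustrative only).
corpus = [
    "linker ventrikel normal",
    "linker ventrikel vergrößert",
    "linker vorhof normal",
    "rechter vorhof normal",
    "rechter vorhof vergrößert",
    "rechter vorhof dilatiert",
]
unigrams, bigrams = build_counts(corpus)

print(expand("v.", "linker", unigrams, bigrams))   # ventrikel
print(expand("v.", "rechter", unigrams, bigrams))  # vorhof
```

In this sketch the same short form "v." resolves differently depending on the preceding word, which is how bigram context can disambiguate; when the preceding word is unseen, the most frequent candidate wins.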
