Evaluating sentence representations for biomedical text: Methods and experimental results.

Affiliations

Computer Engineering Department, College of Engineering, Arab Academy for Science, Technology, and Maritime Transport (AAST), 1029 Alexandria, Egypt; Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands.

Department of Information and Computing Sciences, Utrecht University, 3584 CC Utrecht, the Netherlands.

Publication information

J Biomed Inform. 2020 Apr;104:103396. doi: 10.1016/j.jbi.2020.103396. Epub 2020 Mar 6.

Abstract

Text representations are one of the main inputs to various Natural Language Processing (NLP) methods. Given the fast pace at which new sentence embedding methods are developed, we argue that there is a need for a unified methodology to assess these techniques in the biomedical domain. This work introduces a comprehensive evaluation of novel methods across ten medical classification tasks. The tasks cover a variety of BioNLP problems, such as semantic similarity, question answering, and citation sentiment analysis, with both binary and multi-class datasets. Our goal is to assess the transferability of different sentence representation schemes to the medical and clinical domain. Our analysis shows that embeddings based on language models, which account for the context-dependent nature of words, usually outperform the other methods. Nonetheless, no single embedding model represents biomedical and clinical texts perfectly, with consistent performance across all tasks. This illustrates the need for a more suitable bio-encoder. Our MedSentEval source code, pre-trained embeddings, and examples have been made available on GitHub.
