Analysis of Language Embeddings for Classification of Unstructured Pathology Reports.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2378-2381. doi: 10.1109/EMBC46164.2021.9630347.

Abstract

A pathology report is one of the most significant medical documents providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of various embedding models like BioBERT, Clinical BioBERT, BioMed-RoBERTa and Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique, as well as the combination of embeddings from pre-trained models with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for pathology reports.
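The abstract does not say how the pre-trained embeddings and TF-IDF are combined, so the following is only a minimal sketch under stated assumptions: the two representations are L2-normalized and concatenated, tokenization is a plain whitespace split, and the dense vectors are hand-written placeholders standing in for what a model such as BioBERT would produce. The report snippets and the helper names (`tfidf_vectors`, `combine`) are hypothetical and are not taken from the paper.

```python
import math
from collections import Counter


def tfidf_vectors(docs):
    """Return (vocabulary, vectors) using tf * (log(N / df) + 1) weighting."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of reports that contain each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        total = len(toks) or 1  # avoid division by zero for empty reports
        vectors.append([
            (counts[term] / total) * (math.log(n_docs / df[term]) + 1.0)
            for term in vocab
        ])
    return vocab, vectors


def l2_normalize(vec):
    """Scale a vector to unit length; leave all-zero vectors unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def combine(dense_vec, sparse_vec):
    """Concatenate a dense embedding with a TF-IDF vector (assumed scheme)."""
    return l2_normalize(dense_vec) + l2_normalize(sparse_vec)


# Toy, made-up report snippets (not real data).
reports = [
    "invasive ductal carcinoma of the breast",
    "benign colonic mucosa with no dysplasia",
    "ductal carcinoma in situ of the breast",
]
vocab, sparse = tfidf_vectors(reports)

# Placeholder dense embeddings with hypothetical values; in practice these
# would come from a pre-trained language model.
dense = [
    [0.1, 0.3, -0.2, 0.5],
    [0.4, -0.1, 0.2, 0.0],
    [0.1, 0.2, -0.3, 0.4],
]

# One combined feature vector per report, ready for any standard classifier.
features = [combine(d, s) for d, s in zip(dense, sparse)]
```

Normalizing each part before concatenation keeps either representation from dominating purely because of its scale; other combination schemes, such as weighting or averaging, are equally plausible readings of the abstract.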
