Analysis of Language Embeddings for Classification of Unstructured Pathology Reports.

Publication information

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2378-2381. doi: 10.1109/EMBC46164.2021.9630347.

DOI:10.1109/EMBC46164.2021.9630347

Abstract

A pathology report is one of the most significant medical documents providing interpretive insights into the visual appearance of the patient's biopsy sample. In digital pathology, high-resolution images of tissue samples are stored along with pathology reports. Despite the valuable information that pathology reports hold, they are not used in any systematic manner to promote computational pathology. In this work, we focus on analyzing the reports, which are generally unstructured documents written in English with sophisticated and highly specialized medical terminology. We provide a comparative analysis of various embedding models like BioBERT, Clinical BioBERT, BioMed-RoBERTa and Term Frequency-Inverse Document Frequency (TF-IDF), a traditional NLP technique, as well as the combination of embeddings from pre-trained models with TF-IDF. Our results demonstrate the effectiveness of various word embedding techniques for pathology reports.
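The abstract does not say how the pre-trained embeddings were combined with TF-IDF. One common option is to concatenate the two feature vectors per report, and the sketch below illustrates that under loud assumptions: `placeholder_embedding` is a hypothetical, hash-based stand-in for a pooled BioBERT-style sentence vector (it carries no semantic meaning), the TF-IDF weighting uses a smoothed IDF with L2 normalisation, and the three example reports are invented for illustration.

```python
import hashlib
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer (a simplification for this sketch)."""
    return text.lower().split()


def tfidf_vectors(docs):
    """Return (vocab, vectors): L2-normalised TF-IDF vectors with smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(docs)
    # Document frequency: number of reports containing each term.
    df = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for toks in tokenized:
        tf = Counter(toks)
        vec = [tf[t] * idf[t] for t in vocab]
        norm = math.sqrt(sum(v * v for v in vec)) or 1.0
        vectors.append([v / norm for v in vec])
    return vocab, vectors


def placeholder_embedding(text, dim=8):
    """Hypothetical stand-in for a dense model embedding (NOT BioBERT).

    Sums a deterministic hash-derived vector per token, then L2-normalises.
    """
    vec = [0.0] * dim
    for tok in tokenize(text):
        digest = hashlib.sha256(tok.encode("utf-8")).digest()
        for i in range(dim):
            vec[i] += digest[i] / 255.0 - 0.5
    norm = math.sqrt(sum(v * v for v in vec)) or 1.0
    return [v / norm for v in vec]


def combined_features(docs, dim=8):
    """Concatenate each report's TF-IDF vector with its dense vector."""
    _, tfidf = tfidf_vectors(docs)
    return [t + placeholder_embedding(d, dim) for t, d in zip(tfidf, docs)]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


# Invented example reports, for illustration only.
reports = [
    "invasive ductal carcinoma of the breast",
    "adenocarcinoma of the lung with pleural invasion",
    "ductal carcinoma in situ of the breast",
]
vocab, tfidf = tfidf_vectors(reports)
features = combined_features(reports)
```

The resulting `features` rows could be passed to any standard classifier. In practice the placeholder would be replaced by vectors from an actual pre-trained model, and the relative scaling of the sparse and dense parts may need tuning.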

Similar articles

Analysis of Language Embeddings for Classification of Unstructured Pathology Reports.

Annu Int Conf IEEE Eng Med Biol Soc. 2021 Nov;2021:2378-2381. doi: 10.1109/EMBC46164.2021.9630347.

A Scalable Natural Language Processing for Inferring BT-RADS Categorization from Unstructured Brain Magnetic Resonance Reports.

J Digit Imaging. 2020 Dec;33(6):1393-1400. doi: 10.1007/s10278-020-00350-0.

A comparison of word embeddings for the biomedical natural language processing.

J Biomed Inform. 2018 Nov;87:12-20. doi: 10.1016/j.jbi.2018.09.008. Epub 2018 Sep 12.

Text Classification for Clinical Trial Operations: Evaluation and Comparison of Natural Language Processing Techniques.

Ther Innov Regul Sci. 2021 Mar;55(2):447-453. doi: 10.1007/s43441-020-00236-x. Epub 2020 Oct 30.

Automated classification of cancer morphology from Italian pathology reports using Natural Language Processing techniques: A rule-based approach.

J Biomed Inform. 2021 Apr;116:103712. doi: 10.1016/j.jbi.2021.103712. Epub 2021 Feb 18.

The Impact of Specialized Corpora for Word Embeddings in Natural Langage Understanding.

Stud Health Technol Inform. 2020 Jun 16;270:432-436. doi: 10.3233/SHTI200197.

Simplifying drug package leaflets written in Spanish by using word embedding.

J Biomed Semantics. 2017 Sep 29;8(1):45. doi: 10.1186/s13326-017-0156-7.

Deep learning approach to detection of colonoscopic information from unstructured reports.

BMC Med Inform Decis Mak. 2023 Feb 7;23(1):28. doi: 10.1186/s12911-023-02121-7.

BioBERT: a pre-trained biomedical language representation model for biomedical text mining.

Bioinformatics. 2020 Feb 15;36(4):1234-1240. doi: 10.1093/bioinformatics/btz682.

Natural language processing of head CT reports to identify intracranial mass effect: CTIME algorithm.

Am J Emerg Med. 2022 Jan;51:388-392. doi: 10.1016/j.ajem.2021.11.001. Epub 2021 Nov 9.

Cited by

Using artificial intelligence to develop a measure of orthopaedic treatment success from clinical notes.

Front Digit Health. 2025 Apr 24;7:1523953. doi: 10.3389/fdgth.2025.1523953. eCollection 2025.

Assessing Physician and Patient Agreement on Whether Patient Outcomes Captured in Clinical Progress Notes Reflect Treatment Success: Cross-Sectional Study.

J Particip Med. 2025 Jan 23;17:e60263. doi: 10.2196/60263.

TCGA-Reports: A machine-readable pathology report resource for benchmarking text-based AI models.

Patterns (N Y). 2024 Feb 21;5(3):100933. doi: 10.1016/j.patter.2024.100933. eCollection 2024 Mar 8.
