Natural Language Processing Versus Content-Based Image Analysis for Medical Document Retrieval.

Author information

Névéol Aurélie, Deserno Thomas M, Darmoni Stéfan J, Güld Mark Oliver, Aronson Alan R

Affiliation

U.S. National Library of Medicine, National Institutes of Health, 8600 Rockville Pike, Bethesda, MD 20894. E-mail:

Publication information

J Am Soc Inf Sci Technol. 2008 Sep 18;60(1):123-134. doi: 10.1002/asi.20955.

Abstract

One of the most significant recent advances in health information systems has been the shift from paper to electronic documents. While research on automatic text and image processing has taken separate paths, there is a growing need for joint efforts, particularly for electronic health records and biomedical literature databases. This work aims at comparing text-based versus image-based access to multimodal medical documents using state-of-the-art methods of processing text and image components. A collection of 180 medical documents containing an image accompanied by a short text describing it was divided into training and test sets. Content-based image analysis and natural language processing techniques are applied individually and combined for multimodal document analysis. The evaluation consists of an indexing task and a retrieval task based on the "gold standard" codes manually assigned to corpus documents. The performance of text-based and image-based access, as well as combined document features, is compared. Image analysis proves more adequate for both the indexing and retrieval of the images. In the indexing task, multimodal analysis outperforms both independent image and text analysis. This experiment shows that text describing images can be usefully analyzed in the framework of a hybrid text/image retrieval system.
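
The abstract does not state which measures were used to score the indexing task. As a minimal sketch only, assuming set-based precision, recall, and F-measure against the manually assigned "gold standard" codes, one document could be scored as follows. The function name `score_codes` and the code labels are hypothetical and are not taken from the paper.

```python
def score_codes(predicted, gold):
    """Return (precision, recall, f1) for one document's predicted codes.

    Assumption: codes are compared as plain sets; the paper's actual
    scoring scheme is not described in the abstract.
    """
    predicted, gold = set(predicted), set(gold)
    true_positives = len(predicted & gold)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical codes, invented purely for illustration.
gold = {"modality:x-ray", "region:chest"}
text_only = {"region:chest", "region:abdomen"}
combined = {"modality:x-ray", "region:chest", "region:abdomen"}

for name, predicted in [("text only", text_only), ("combined", combined)]:
    p, r, f = score_codes(predicted, gold)
    print(f"{name}: precision={p:.2f} recall={r:.2f} f1={f:.2f}")
```

Averaging such per-document scores over a test set would give one number per method (text, image, combined), which is the kind of comparison the abstract reports.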
