Kahn Charles E, Rubin Daniel L
Division of Informatics, Department of Radiology, Medical College of Wisconsin, 9200 W. Wisconsin Ave., Milwaukee, WI 53226, USA.
J Am Med Inform Assoc. 2009 May-Jun;16(3):380-6. doi: 10.1197/jamia.M2945. Epub 2009 Mar 4.
We explored automated concept-based indexing of unstructured figure captions to improve retrieval of images from radiology journals.
The MetaMap Transfer program (MMTx) was used to map the text of 84,846 figure captions from 9,004 peer-reviewed, English-language articles to concepts in three controlled vocabularies from the UMLS Metathesaurus, version 2006AA. Sampling procedures were used to estimate the standard information-retrieval metrics of precision and recall, and to evaluate the degree to which concept-based retrieval improved image retrieval.
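The difference between keyword matching and concept-based matching described above can be illustrated with a minimal sketch. It does not use MMTx or the UMLS Metathesaurus: the `TERM_TO_CONCEPT` table, the concept labels, and the sample captions are all hypothetical stand-ins for a real controlled vocabulary.

```python
import re

# Hypothetical term-to-concept table. The labels are made up for
# illustration and are NOT real UMLS concept identifiers.
TERM_TO_CONCEPT = {
    "kidney": "CONCEPT_KIDNEY",
    "renal": "CONCEPT_KIDNEY",
    "tumor": "CONCEPT_NEOPLASM",
    "neoplasm": "CONCEPT_NEOPLASM",
}

# Hypothetical figure captions.
CAPTIONS = [
    "CT scan shows a renal neoplasm.",
    "Kidney tumor on axial MR image.",
    "Chest radiograph with normal findings.",
]


def tokenize(text):
    """Lowercase the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def concepts_in(text):
    """Return the set of concept labels whose terms appear in the text."""
    return {TERM_TO_CONCEPT[t] for t in tokenize(text) if t in TERM_TO_CONCEPT}


def keyword_search(query, captions):
    """Indices of captions that contain every query word literally."""
    wanted = set(tokenize(query))
    return {i for i, c in enumerate(captions) if wanted <= set(tokenize(c))}


def concept_search(query, captions):
    """Indices of captions that contain every concept named in the query."""
    wanted = concepts_in(query)
    if not wanted:
        return set()
    return {i for i, c in enumerate(captions) if wanted <= concepts_in(c)}


keyword_hits = keyword_search("kidney tumor", CAPTIONS)   # {1}
concept_hits = concept_search("kidney tumor", CAPTIONS)   # {0, 1}
```

In this toy setting the query "kidney tumor" matches only the second caption by keyword, while the concept lookup also retrieves the caption that says "renal neoplasm".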
Precision was estimated from a sample of 250 concepts, and recall from a sample of 40 concepts. To measure how much concept-based retrieval improved on keyword-based retrieval, we analyzed a random sample of 10,000 search queries issued by users of a radiology image search engine.
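For reference, precision and recall follow the standard information-retrieval definitions, and a confidence interval for an estimated proportion can be sketched as below. The abstract does not state how its intervals were computed; the normal-approximation (Wald) interval, clipped to the range 0 to 1, is an assumption made here for illustration and is not claimed to reproduce the reported intervals. The function names are hypothetical.

```python
import math


def precision(true_pos, false_pos):
    """Fraction of retrieved items that are correct: TP / (TP + FP)."""
    return true_pos / (true_pos + false_pos)


def recall(true_pos, false_neg):
    """Fraction of correct items that were retrieved: TP / (TP + FN)."""
    return true_pos / (true_pos + false_neg)


def wald_interval(p, n, z=1.96):
    """Normal-approximation interval for a proportion p estimated from n
    samples, clipped to [0, 1]. Assumed method, not taken from the source."""
    half_width = z * math.sqrt(p * (1.0 - p) / n)
    return max(0.0, p - half_width), min(1.0, p + half_width)
```

With hypothetical counts, `precision(9, 1)` gives 0.9 and `recall(9, 3)` gives 0.75; clipping matters when the estimate is close to 1 and the sample is small.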
Estimated precision was 0.897 (95% confidence interval, 0.857-0.937). Estimated recall was 0.930 (95% confidence interval, 0.838-1.000). In 5,535 of 10,000 search queries (55%), concept-based retrieval found results not identified by simple keyword matching; in 2,086 searches (21%), more than 75% of the results were found by concept-based search alone.
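The per-query comparison behind these counts can be sketched as follows. This is one plausible reading, not the authors' code: it assumes that each query's results are the union of the keyword and concept result sets, and that a result counts as found by concept-based search alone when keyword matching did not return it. The function names and the sample data are hypothetical.

```python
def concept_only_share(keyword_hits, concept_hits):
    """Share of a query's combined results that only concept search found."""
    combined = keyword_hits | concept_hits
    if not combined:
        return 0.0
    return len(concept_hits - keyword_hits) / len(combined)


def summarize(queries, threshold=0.75):
    """Count queries where concept search added results, and queries where
    more than `threshold` of the results came from concept search alone."""
    shares = [concept_only_share(k, c) for k, c in queries]
    return {
        "any_new": sum(s > 0 for s in shares),
        "mostly_new": sum(s > threshold for s in shares),
        "total": len(shares),
    }


# Hypothetical (keyword_hits, concept_hits) pairs, one per query.
SAMPLE_QUERIES = [
    ({1}, {1}),              # nothing new from concept search
    ({1}, {1, 2}),           # half of the results are new
    (set(), {1, 2, 3}),      # all results are new
    ({1}, {1, 2, 3, 4, 5}),  # 4 of 5 results are new
]
summary = summarize(SAMPLE_QUERIES)
```

The reported percentages are the corresponding counts divided by the sample size: 5,535 / 10,000 = 55.35%, reported as 55%, and 2,086 / 10,000 = 20.86%, reported as 21%.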
Concept-based indexing of radiology journal figure captions achieved high estimated precision (0.897) and recall (0.930), and retrieved results that keyword matching missed in 55% of the sampled queries.