Learning semantic and visual similarity for endomicroscopy video retrieval.
Affiliation
Mauna Kea Technologies, 75010 Paris, France.
Publication information
IEEE Trans Med Imaging. 2012 Jun;31(6):1276-88. doi: 10.1109/TMI.2012.2188301. Epub 2012 Feb 16.
Content-based image retrieval (CBIR) is a valuable computer vision technique which is increasingly being applied in the medical community for diagnosis support. However, traditional CBIR systems only deliver visual outputs, i.e., images having a similar appearance to the query, which are not directly interpretable by physicians. Our objective is to provide a system for endomicroscopy video retrieval which delivers both visual and semantic outputs that are consistent with each other. In a previous study, we developed an adapted bag-of-visual-words method for endomicroscopy retrieval, called "Dense-Sift," that computes a visual signature for each video. In this paper, we present a novel approach to complement visual similarity learning with semantic knowledge extraction, in the field of in vivo endomicroscopy. We first leverage a semantic ground truth based on eight binary concepts in order to transform these visual signatures into semantic signatures that reflect how much the presence of each semantic concept is expressed by the visual words describing the videos. Using cross-validation, we demonstrate that, in terms of semantic detection, our intuitive Fisher-based method transforming visual-word histograms into semantic estimations outperforms support vector machine (SVM) methods with statistical significance. In a second step, we propose to improve retrieval relevance by learning an adjusted similarity distance from a perceived similarity ground truth. As a result, our distance learning method yields a statistically significant improvement in correlation with the perceived similarity. We also demonstrate that, in terms of perceived similarity, the recall performance of the semantic signatures is close to that of the visual signatures and significantly better than that of several state-of-the-art CBIR methods. The semantic signatures are thus able to communicate high-level medical knowledge while remaining consistent with the low-level visual signatures and being much more compact.
In our resulting retrieval system, we use visual signatures for perceived similarity learning and retrieval, and semantic signatures to output additional information, expressed in the endoscopist's own language, which provides a relevant semantic translation of the visual retrieval outputs.
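The pipeline described above — projecting a bag-of-visual-words histogram onto per-concept scores, then comparing signatures with a distance fitted to perceived similarity — can be sketched as follows. This is a minimal illustrative sketch, not the paper's exact method: the vocabulary size, the per-word concept weights (which the paper derives with a Fisher-based approach from the eight-concept ground truth), and the uniform distance weights are all hypothetical stand-ins.

```python
import numpy as np

N_WORDS = 12      # toy visual vocabulary size (hypothetical)
N_CONCEPTS = 8    # eight binary semantic concepts, as in the abstract

rng = np.random.default_rng(0)

# Hypothetical per-word concept weights; in the paper these would be learned
# from annotated training videos (here: random placeholders).
word_to_concept = rng.random((N_WORDS, N_CONCEPTS))

def semantic_signature(histogram):
    """Project an L1-normalized visual-word histogram onto concept scores."""
    h = histogram / histogram.sum()
    return h @ word_to_concept          # shape: (N_CONCEPTS,)

def weighted_distance(s1, s2, weights):
    """Weighted Euclidean distance; the weights would be fitted to a perceived
    similarity ground truth (here: uniform placeholders)."""
    return float(np.sqrt(np.sum(weights * (s1 - s2) ** 2)))

# Toy visual-word histograms for two videos.
hist_a = rng.integers(1, 20, N_WORDS).astype(float)
hist_b = rng.integers(1, 20, N_WORDS).astype(float)

sig_a = semantic_signature(hist_a)
sig_b = semantic_signature(hist_b)
w = np.ones(N_CONCEPTS)
print(weighted_distance(sig_a, sig_b, w))
```

Note that the semantic signature has only eight entries — one per concept — which is why it is far more compact than the visual-word histogram while still supporting retrieval.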