IEEE Trans Med Imaging. 2015 Feb;34(2):496-506. doi: 10.1109/TMI.2014.2361481. Epub 2014 Oct 9.
Automatic analysis of histopathological images is now widely performed with computational image-processing methods and modern machine learning techniques. Both computer-aided diagnosis (CAD) and content-based image-retrieval (CBIR) systems have been successfully developed for diagnosis, disease detection, and decision support in this area. Recently, with the ever-increasing amount of annotated medical data, large-scale, data-driven methods have emerged that promise to bridge the semantic gap between images and diagnostic information. In this paper, we focus on developing scalable image-retrieval techniques that can cope with massive collections of histopathological images. Specifically, we present a supervised kernel hashing technique that uses a small amount of supervised information during learning to compress a 10 000-dimensional image feature vector into only tens of binary bits while preserving its informative signature. These binary codes are then indexed into a hash table that enables real-time retrieval of images from a large database. Critically, the supervised information is what bridges the semantic gap between low-level image features and high-level diagnostic information. We build a scalable image-retrieval framework on this supervised hashing technique and validate its performance on several thousand histopathological images acquired from breast microscopic tissues. Extensive evaluations are carried out on image classification (i.e., benign versus actionable categorization) and on retrieval tests. Our framework achieves about 88.1% classification accuracy together with promising time efficiency: for example, it can execute around 800 queries in only 0.01 s, which compares favorably with other commonly used dimensionality reduction and feature selection methods.
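The encode-then-index pipeline described in the abstract (map a high-dimensional feature vector to a short binary code, then look the code up in a hash table) can be illustrated with a minimal sketch. This is not the authors' method: the projection weights here are random rather than learned from supervised labels, and the names `rbf_kernel`, `KernelHashIndex`, `anchors`, `gamma`, and `n_bits` are hypothetical choices made for illustration only. The demo also uses small synthetic vectors instead of 10 000-dimensional image features.

```python
import numpy as np
from collections import defaultdict


def rbf_kernel(X, anchors, gamma):
    """Gaussian (RBF) kernel between rows of X and a set of anchor points."""
    d2 = (
        (X ** 2).sum(axis=1)[:, None]
        - 2.0 * X @ anchors.T
        + (anchors ** 2).sum(axis=1)[None, :]
    )
    return np.exp(-gamma * np.maximum(d2, 0.0))


class KernelHashIndex:
    """Kernel-based binary hashing with a hash-table index (illustrative only)."""

    def __init__(self, anchors, n_bits=32, gamma=None, seed=0):
        rng = np.random.default_rng(seed)
        self.anchors = np.asarray(anchors, dtype=float)
        self.gamma = gamma if gamma is not None else 1.0 / self.anchors.shape[1]
        # ASSUMPTION: random weights stand in for the weights that a
        # supervised method would learn from labeled image pairs.
        self.W = rng.standard_normal((self.anchors.shape[0], n_bits))
        # Center the kernel features so the bits are roughly balanced.
        self.mean = rbf_kernel(self.anchors, self.anchors, self.gamma).mean(axis=0)
        self.table = defaultdict(list)

    def encode(self, X):
        """Return an (n, n_bits) array of 0/1 codes for the rows of X."""
        K = rbf_kernel(np.atleast_2d(X), self.anchors, self.gamma) - self.mean
        return (K @ self.W > 0).astype(np.uint8)

    def add(self, X, ids):
        """Store each id in the bucket keyed by its binary code."""
        for code, item_id in zip(self.encode(X), ids):
            self.table[code.tobytes()].append(item_id)

    def query(self, x):
        """Return ids whose code exactly matches the code of x."""
        return self.table.get(self.encode(x)[0].tobytes(), [])


if __name__ == "__main__":
    rng = np.random.default_rng(42)
    features = rng.standard_normal((500, 1000))  # synthetic stand-in features
    index = KernelHashIndex(anchors=features[:50], n_bits=16)
    index.add(features, range(len(features)))
    print(len(index.query(features[7])), "candidate(s) share the query's bucket")
```

Exact-match bucket lookup is the simplest possible retrieval rule; a practical system would more likely also probe buckets within a small Hamming distance of the query code and then rank the candidates.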