Ruiz Antonio, Ujaldon Manuel, Cooper Lee, Huang Kun
Computer Architecture Department, Campus Teatinos, University of Malaga, 29071 Malaga, Spain.
Biomedical Informatics Department, Ohio State University, 333 West 10th Avenue, Columbus, OH 43210, USA.
J Signal Process Syst. 2009 Apr 1;55(1-3):229-250. doi: 10.1007/s11265-008-0208-4.
Microscopic imaging is an important tool for characterizing tissue morphology and pathology. Reconstructing and visualizing the 3D structure of large tissue samples requires registering large sets of high-resolution images, and the scale of this problem is a challenge for automatic registration methods. In this paper we present a novel method for efficient automatic registration using graphics processing units (GPUs) and parallel programming. Compared with a C++ implementation on the CPU, our implementation using Compute Unified Device Architecture (CUDA) libraries and pthreads on the GPU achieves a speed-up of up to 4.11× with a single GPU and 6.68× with a GPU pair. We present execution times for a benchmark composed of two sets of large-scale images: mouse placenta (16K × 16K pixels) and breast cancer tumors (23K × 62K pixels). Registering a typical sample of 500 consecutive slides takes more than 12 hours for the genetic case in C++; two GPUs reduce this to less than 2 hours. The method also shows very promising scalability, so these gains should extend easily to a large number of GPUs in a distributed system.
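As a rough consistency check on the figures quoted in the abstract, the short sketch below relates the reported speed-up factors to the reported run times. It rests on assumptions the abstract does not state: the 12-hour baseline is taken as a round number (the abstract says "more than 12 hours"), the peak two-GPU speed-up is assumed to apply to that whole run, and the function names are illustrative only.

```python
# Speed-up factors reported in the abstract (GPU vs. C++ CPU implementation).
SPEEDUP_ONE_GPU = 4.11
SPEEDUP_TWO_GPUS = 6.68


def gpu_time_hours(cpu_hours: float, speedup: float) -> float:
    """Estimated GPU run time if the whole run is accelerated by `speedup`."""
    return cpu_hours / speedup


def scaling_efficiency(speedup_n: float, speedup_1: float, n_gpus: int) -> float:
    """Fraction of ideal linear scaling achieved when going from 1 to n GPUs."""
    return speedup_n / (n_gpus * speedup_1)


# Assumption: 12 h is a round baseline; the abstract only gives "more than 12 hours".
cpu_hours = 12.0
two_gpu_hours = gpu_time_hours(cpu_hours, SPEEDUP_TWO_GPUS)
efficiency = scaling_efficiency(SPEEDUP_TWO_GPUS, SPEEDUP_ONE_GPU, 2)

print(f"{two_gpu_hours:.2f} h on two GPUs, {efficiency:.0%} scaling efficiency")
# prints "1.80 h on two GPUs, 81% scaling efficiency"
```

Under these assumptions the estimate of about 1.8 hours agrees with the "less than 2 hours" reported for two GPUs. The second GPU delivers roughly 81% of ideal linear scaling relative to the single-GPU speed-up.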