
Compute-unified device architecture implementation of a block-matching algorithm for multiple graphical processing unit cards.

Authors

Massanes Francesc, Cadennes Marie, Brankov Jovan G

Affiliation

Illinois Institute of Technology, Medical Imaging Research Center, Chicago IL 60616, USA.

Publication information

J Electron Imaging. 2011 Jul;20(3). doi: 10.1117/1.3606588.

DOI: 10.1117/1.3606588
PMID: 22347787
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC3280822/

Abstract

In this paper we describe and evaluate a fast implementation of a classical block-matching motion estimation algorithm for multiple Graphical Processing Units (GPUs) using the Compute Unified Device Architecture (CUDA) computing engine. The implemented block-matching algorithm (BMA) uses the summed absolute difference (SAD) error criterion and a full grid search (FS) to find the optimal block displacement. In this evaluation we compared the execution time of GPU and CPU implementations for images of various sizes, using integer and non-integer search grids.

The results show that a GPU card can shorten computation time by a factor of 200 for an integer search grid and by a factor of 1000 for a non-integer search grid. The additional speedup for the non-integer search grid comes from the GPU's built-in hardware for image interpolation. Further, when multiple GPU cards are used, the evaluation shows that the method of splitting the data across the cards matters, and that an almost linear speedup with the number of cards is achievable.

In addition, we compared the execution time of the proposed FS GPU implementation with two existing, highly optimized, non-full-grid-search CPU-based motion estimation methods: the implementation of the Pyramidal Lucas Kanade Optical flow algorithm in OpenCV and the Simplified Unsymmetrical multi-Hexagon search in the H.264/AVC standard. In these comparisons, the FS GPU implementation still showed a modest improvement, even though its computational complexity is substantially higher than that of the non-FS CPU implementations.

We also demonstrated that for an image sequence of 720×480 pixels in resolution, commonly used in video surveillance, the proposed GPU implementation is sufficiently fast for real-time motion estimation at 30 frames per second using two NVIDIA C1060 Tesla GPU cards.
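
To make the SAD criterion and the full grid search concrete, the following is a minimal CPU-side sketch in plain Python. It is not the authors' CUDA code: the function names `sad` and `full_search`, the block size, the search radius, and the synthetic test frames are all assumptions chosen for illustration, and only the integer search grid is covered.

```python
def sad(ref, cur, bx, by, dx, dy, bsize):
    """Summed absolute difference between the block at (bx, by) in `cur`
    and the block displaced by (dx, dy) in `ref` (hypothetical helper)."""
    total = 0
    for y in range(bsize):
        for x in range(bsize):
            total += abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
    return total


def full_search(ref, cur, bx, by, bsize, radius):
    """Evaluate every integer displacement within +/- radius and return
    (best_dx, best_dy, best_sad). Candidates that leave the frame are skipped."""
    height, width = len(ref), len(ref[0])
    best = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            x0, y0 = bx + dx, by + dy
            if x0 < 0 or y0 < 0 or x0 + bsize > width or y0 + bsize > height:
                continue
            cost = sad(ref, cur, bx, by, dx, dy, bsize)
            if best is None or cost < best[2]:
                best = (dx, dy, cost)
    return best


# Synthetic 16x16 frames (an assumption for the demo): `ref` is a ramp with
# unique pixel values, and `cur` is `ref` shifted so that the true
# displacement of any interior block is (dx, dy) = (2, 1).
SIZE = 16
ref = [[x + SIZE * y for x in range(SIZE)] for y in range(SIZE)]
cur = [[ref[min(y + 1, SIZE - 1)][min(x + 2, SIZE - 1)] for x in range(SIZE)]
       for y in range(SIZE)]

print(full_search(ref, cur, bx=6, by=6, bsize=4, radius=3))  # prints (2, 1, 0)
```

Every candidate displacement is scored independently of the others, which is what makes the full search expensive on a CPU and well suited to parallel evaluation on a GPU.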


