Efficient Execution of Microscopy Image Analysis on CPU, GPU, and MIC Equipped Cluster Systems.

Authors

Andrade G, Ferreira R, Teodoro G, Rocha L, Saltz JH, Kurc T

Affiliations

Federal University of Minas Gerais.

University of Brasília.

Publication information

Proc Symp Comput Archit High Perform Comput. 2014 Oct;2014:89-96. doi: 10.1109/SBAC-PAD.2014.15.

DOI:10.1109/SBAC-PAD.2014.15
PMID:26640423
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC4670037/
Abstract

High performance computing is experiencing a major paradigm shift with the introduction of accelerators such as graphics processing units (GPUs) and Intel Xeon Phi (MIC). These processors have made tremendous computing power available at low cost and are transforming machines into hybrid systems equipped with CPUs and accelerators. Although these systems can deliver very high peak performance, making full use of their resources in real-world applications is a complex problem. Most current applications deployed to these machines still execute on a single processor, leaving other devices underutilized. In this paper we explore a scenario in which applications are composed of hierarchical data flow tasks: tasks are allocated to nodes of a distributed memory machine at coarse grain, but each may be composed of several finer-grain tasks that can be allocated to different devices within the node. We propose and implement novel performance-aware scheduling techniques for allocating tasks to devices. We evaluate our techniques using a pathology image analysis application used to investigate brain cancer morphology. Our experimental evaluation shows that the proposed scheduling strategies significantly outperform other efficient scheduling techniques, such as Heterogeneous Earliest Finish Time (HEFT), in cooperative executions using CPUs, GPUs, and MICs. We also show experimentally that our strategies are less sensitive to inaccuracy in the scheduling input data and that the performance gains are maintained as the application scales.
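
The abstract names HEFT as a baseline but does not spell out any scheduling algorithm. As a rough illustration of what allocating fine-grain tasks to devices within a node can look like, the sketch below applies a simplified earliest-finish-time rule in the spirit of HEFT to independent tasks. It is not the authors' proposed strategy. The task names, device names, and run-time estimates are all hypothetical and chosen only for the example.

```python
from typing import Dict, List, Tuple

# Hypothetical per-device run-time estimates (seconds) for four fine-grain
# tasks. These numbers are invented for illustration, not taken from the paper.
ESTIMATES: Dict[str, Dict[str, float]] = {
    "segment":   {"cpu": 8.0, "gpu": 2.0, "mic": 4.0},
    "features":  {"cpu": 6.0, "gpu": 1.5, "mic": 2.0},
    "threshold": {"cpu": 1.0, "gpu": 0.9, "mic": 1.2},
    "morph":     {"cpu": 4.0, "gpu": 3.0, "mic": 1.0},
}

Plan = List[Tuple[str, str, float, float]]  # (task, device, start, finish)


def eft_schedule(estimates: Dict[str, Dict[str, float]]) -> Tuple[Plan, float]:
    """Greedy earliest-finish-time assignment of independent tasks.

    Tasks are visited in decreasing order of mean estimated cost (for
    independent tasks, HEFT's upward rank reduces to this), and each task
    goes to the device on which it would finish earliest.
    """
    devices = sorted({d for costs in estimates.values() for d in costs})
    ready_at = {d: 0.0 for d in devices}  # time at which each device is free

    order = sorted(
        estimates,
        key=lambda t: sum(estimates[t].values()) / len(estimates[t]),
        reverse=True,
    )

    plan: Plan = []
    for task in order:
        costs = estimates[task]
        # Pick the device that minimizes this task's finish time.
        device = min(costs, key=lambda d: ready_at[d] + costs[d])
        start = ready_at[device]
        finish = start + costs[device]
        ready_at[device] = finish
        plan.append((task, device, start, finish))

    return plan, max(ready_at.values())


if __name__ == "__main__":
    plan, makespan = eft_schedule(ESTIMATES)
    for task, device, start, finish in plan:
        print(f"{task:10s} -> {device}  [{start:.1f}, {finish:.1f}]")
    print("makespan:", makespan)
```

With these made-up estimates the sketch places `segment` on the GPU, `features` and `morph` on the MIC, and `threshold` on the CPU, for a makespan of 3.0 seconds. Because every decision depends on the estimates, a rule of this kind is exposed to errors in them, which is the sensitivity the abstract says the proposed strategies reduce.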

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/e4f154fda157/nihms700836f1.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/2c954faccb4d/nihms700836f2.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/9c1d69ec431a/nihms700836f3.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/d7cae17c5bca/nihms700836f4.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/e56e90aba967/nihms700836f5.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/b6fee6b48480/nihms700836f6.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/dfe3b9c4ebc8/nihms700836f7.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/2b6ae7cb8185/nihms700836f8.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/2b27e72b8d79/nihms700836f9.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/bf17/4670037/fc0754e105ba/nihms700836f10.jpg