Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs

Authors

Pham Minh, Li Hao, Yuan Yongke, Mou Chengcheng, Ramachandran Kandethody, Xu Zichen, Tu Yicheng

Affiliations

University of South Florida, Tampa, FL, USA.

Beijing University of Technology, Beijing, China.

Publication

ICS. 2022 Jun;2022. doi: 10.1145/3524059.3532387. Epub 2022 Jun 28.

DOI: 10.1145/3524059.3532387
PMID: 35943281
Full text: https://pmc.ncbi.nlm.nih.gov/articles/PMC9357265/
Abstract

Due to the high level of parallelism, there are unique challenges in developing system software on massively parallel hardware such as GPUs. One such challenge is designing a dynamic memory allocator whose task is to allocate memory chunks to requesting threads at runtime. State-of-the-art GPU memory allocators maintain a global data structure holding metadata to facilitate allocation/deallocation. However, the centralized data structure can easily become a bottleneck in a massively parallel system. In this paper, we present a novel approach for designing dynamic memory allocation without a centralized data structure. The core idea is to let threads follow a random search procedure to locate free pages. Then we further extend to more advanced designs and algorithms that can achieve an order of magnitude improvement over the basic idea. We present mathematical proofs to demonstrate that (1) the basic random search design achieves asymptotically lower latency than the traditional queue-based design and (2) the advanced designs achieve significant improvement over the basic idea. Extensive experiments show consistency to our mathematical models and demonstrate that our solutions can achieve up to two orders of magnitude improvement in latency over the best-known existing solutions.


Figures

https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/977828a1c0e4/nihms-1823919-f0006.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/d63e13644220/nihms-1823919-f0007.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/0a75dbd1b906/nihms-1823919-f0008.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/8faf650f8dd1/nihms-1823919-f0009.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/0e1a3e37b73e/nihms-1823919-f0010.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/b96a0f11b52e/nihms-1823919-f0011.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/db16372b1217/nihms-1823919-f0012.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/272b7a646af7/nihms-1823919-f0013.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/24a4f4341ee5/nihms-1823919-f0014.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/3c9d6f69afc8/nihms-1823919-f0015.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/317821f693b1/nihms-1823919-f0016.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/7e6bbb6732f2/nihms-1823919-f0017.jpg
https://cdn.ncbi.nlm.nih.gov/pmc/blobs/c3ca/9357265/1e6d9f230b46/nihms-1823919-f0018.jpg

Similar Articles

1. Dynamic Memory Management in Massively Parallel Systems: A Case on GPUs.
ICS. 2022 Jun;2022. doi: 10.1145/3524059.3532387. Epub 2022 Jun 28.
2. Efficient Join Algorithms For Large Database Tables in a Multi-GPU Environment.
Proceedings VLDB Endowment. 2020 Dec;14(4):708-720. doi: 10.14778/3436905.3436927. Epub 2020 Dec 1.
3. CUDAMPF: a multi-tiered parallel framework for accelerating protein sequence search in HMMER on CUDA-enabled GPU.
BMC Bioinformatics. 2016 Feb 27;17:106. doi: 10.1186/s12859-016-0946-4.
4. cuRnet: an R package for graph traversing on GPU.
BMC Bioinformatics. 2018 Oct 15;19(Suppl 10):356. doi: 10.1186/s12859-018-2310-3.
5. WFA-GPU: gap-affine pairwise read-alignment using GPUs.
Bioinformatics. 2023 Dec 1;39(12). doi: 10.1093/bioinformatics/btad701.
6. Fast Equi-Join Algorithms on GPUs: Design and Implementation.
Sci Stat Database Manag. 2017 Jun;2017. doi: 10.1145/3085504.3085521. Epub 2017 Jun 27.
7. Constructing Neuronal Network Models in Massively Parallel Environments.
Front Neuroinform. 2017 May 16;11:30. doi: 10.3389/fninf.2017.00030. eCollection 2017.
8. Efficient parallel implementation of active appearance model fitting algorithm on GPU.
ScientificWorldJournal. 2014 Mar 2;2014:528080. doi: 10.1155/2014/528080. eCollection 2014.
9. Streaming parallel GPU acceleration of large-scale filter-based spiking neural networks.
Network. 2012;23(4):183-211. doi: 10.3109/0954898X.2012.733842. Epub 2012 Oct 25.
10. mpdcm: A toolbox for massively parallel dynamic causal modeling.
J Neurosci Methods. 2016 Jan 15;257:7-16. doi: 10.1016/j.jneumeth.2015.09.009. Epub 2015 Sep 16.

Cited By

1. Dynamic Buffer Management in Massively Parallel Systems: The Power of Randomness.
ACM Trans Parallel Comput. 2025 Mar;12(1). doi: 10.1145/3701623. Epub 2025 Feb 11.

References

1. Fast Equi-Join Algorithms on GPUs: Design and Implementation.
Sci Stat Database Manag. 2017 Jun;2017. doi: 10.1145/3085504.3085521. Epub 2017 Jun 27.
2. Performance Modeling in CUDA Streams - A Means for High-Throughput Data Processing.
Proc IEEE Int Conf Big Data. 2014 Oct;2014:301-310. doi: 10.1109/BigData.2014.7004245.
3. InChIKey collision resistance: an experimental testing.
J Cheminform. 2012 Dec 20;4(1):39. doi: 10.1186/1758-2946-4-39.
4. Toward using confidence intervals to compare correlations.
Psychol Methods. 2007 Dec;12(4):399-413. doi: 10.1037/1082-989X.12.4.399.