Li Hao, Yu Di, Kumar Anand, Tu Yi-Cheng
Proc IEEE Int Conf Big Data. 2014 Oct;2014:301-310. doi: 10.1109/BigData.2014.7004245.
A push-based database management system (DBMS) is a new type of data processing software that streams large volumes of data to concurrent query operators. The high data rates of such systems demand substantial computing power from the query engine. In our previous work, we built a push-based DBMS named G-SDMS to harness the unrivaled computational capabilities of modern GPUs. A major design goal of G-SDMS is to support concurrent processing of heterogeneous query processing operations and to enable resource allocation among such operations. Understanding how the performance of an operation depends on the resources it consumes is thus a prerequisite for the design of G-SDMS. With NVIDIA's CUDA framework as the system implementation platform, we present our recent work on performance modeling of CUDA kernels running concurrently under a runtime mechanism named CUDA stream. Specifically, we explore the connection between performance and resource occupancy of compute-bound kernels and develop a model that can predict the performance of such kernels. Furthermore, we provide an in-depth anatomy of the CUDA stream mechanism and summarize its main kernel scheduling disciplines. Our models and derived scheduling disciplines are verified by extensive experiments using synthetic and real-world CUDA kernels.