Dokulil Jiri, Benkner Siegfried
Faculty of Computer Science, University of Vienna, Vienna, Austria.
J Supercomput. 2022;78(10):12344-12379. doi: 10.1007/s11227-022-04355-0. Epub 2022 Mar 2.
Task-based runtime systems are an important branch of parallel programming research: tasks decouple computation from the compute units, giving the runtime system greater flexibility than a thread-based solution. This makes it easier to deal with the ever-increasing complexity of parallel architectures by providing a separation of concerns, in which the specification of parallelism is separated from the implementation of the parallel computations on a specific architecture. The Open Community Runtime (OCR) is one such system, aimed at large-scale parallel systems. Unlike the creators of many other task-based runtime systems, its creators provided not only an implementation but also a comprehensive specification document. This has allowed us to create an independent implementation, called OCR-Vx. In this article, we present our experience of developing the runtime system, put our work in the context of the specification and the other implementations, and describe key lessons that we have learned during our work. We discuss design and implementation issues of task-based runtime systems and applications, including task synchronization and scheduling, data management, memory consistency, the relation between shared-memory and distributed-memory runtime systems, NUMA architectures, and heterogeneous systems. The article is aimed at audiences not familiar with OCR, since we believe these lessons could be valuable for developers working on other task-based runtime systems or designing new ones.