Beyond Parallelism: Exploring multiple levels of concurrency on a modern GPU
- Yash Ukidave
- Feb 20, 2015
GPUs have gained tremendous popularity as accelerators for a broad range of applications across many computing domains, and many of these applications achieve large performance gains from the parallelism inherent in GPU architectures. As the impact of GPU computing grows, so does the need for efficient utilization of compute resources and increased application throughput.

Applications developed for modern GPUs include multiple compute kernels, each with distinct computational behavior and resource requirements. These applications place high resource demands on the hardware, commonly impose timing constraints, and require concurrent execution of multiple kernels on the device. The growing use of GPUs in cloud engines, data centers, and smart devices also calls for an effective GPU sharing technique across multiple application contexts. Supporting this requires an advanced mechanism for concurrent, flexible execution of multi-kernel applications, extended to schedule multiple application contexts on the same GPU.

In our work, we explore the multiple levels of concurrency on a GPU and suggest architectural and runtime enhancements to leverage that concurrency. We implement a dynamic, adaptive mechanism that manages multi-level concurrency to improve overall application throughput on the GPU.
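Kernel-level concurrency of this kind is exposed in today's GPU APIs through independent command queues. As a rough illustration only (not the mechanism from our work), here is a minimal CUDA sketch in which two independent kernels are launched on separate streams, so the hardware scheduler may execute them concurrently when resources allow; the kernel names and sizes are made up for the example:

```cuda
#include <cstdio>

// Two trivial kernels with different compute behavior, standing in for
// the distinct kernels of a multi-kernel application.
__global__ void scale(float *v, float a, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] *= a;
}

__global__ void offset(float *v, float b, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) v[i] += b;
}

int main() {
    const int n = 1 << 20;
    float *x, *y;
    cudaMalloc(&x, n * sizeof(float));
    cudaMalloc(&y, n * sizeof(float));

    // Independent streams express that the two kernels have no ordering
    // dependence; the device is free to overlap their execution.
    cudaStream_t s1, s2;
    cudaStreamCreate(&s1);
    cudaStreamCreate(&s2);

    scale<<<(n + 255) / 256, 256, 0, s1>>>(x, 2.0f, n);
    offset<<<(n + 255) / 256, 256, 0, s2>>>(y, 1.0f, n);

    cudaDeviceSynchronize();  // wait for both streams to drain

    cudaStreamDestroy(s1);
    cudaStreamDestroy(s2);
    cudaFree(x);
    cudaFree(y);
    return 0;
}
```

Note that streams only *permit* overlap; whether the kernels actually run concurrently depends on resource availability and the hardware scheduler, which is precisely the scheduling problem that motivates runtime support for multi-level concurrency.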