
Revisiting Accelerator-Based CMPs: Challenges and Solutions


Utilizing hardware accelerators (ACCs) is a promising way to improve the performance and power efficiency of Chip Multi-Processors (CMPs). However, new challenges, scalability among them, arise as designs shift from a few ACCs (sparse ACC coverage) to many ACCs (denser ACC coverage) on a chip. Shared resources, including memory, the communication fabric, and the host processor, become bottlenecks, leaving accelerators under-utilized and crippling performance. The source of this challenge is the lack of clear semantics for communicating with ACCs, together with a processor-centric view of orchestrating the entire system. To open a path toward efficient integration of many ACCs on a single chip, we first identify four major semantic aspects that arise when two ACCs need to communicate with each other: (1) data access model, (2) data granularity, (3) marshalling, and (4) synchronization. We then propose Transparent Self-Synchronizing (TSS) as an efficient architectural realization of those semantic aspects. In principle, TSS proposes a shift from the current processor-centric view to a more equal, peer view between ACCs and the host processors. TSS minimizes interaction with the host processor and reduces the volume of ACC-to-ACC communication traffic exposed to the system fabric. Our results on streaming applications with a varying number of ACC-to-ACC connections demonstrate significant benefits of TSS, including a 3x speedup over current ACC-based architectures.
