Abstract

Onyx is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for accelerating sparse and dense tensor algebra and dense image processing and machine learning (ML) applications. To support multiple inputs, multiple dimensions, and fusion in sparse applications, Onyx utilizes composable memory primitives that operate on compressed storage and streams and compute primitives that eliminate unnecessary calculations. Onyx also improves performance on dense applications with application-specialized processing elements (PEs), area-optimized memory tiles, and hybrid clock gating in the global buffer (GLB). Onyx achieves a peak energy efficiency of 756 INT16 GOPS/W, up to 565× better energy-delay product (EDP) for sparse kernels versus CPUs with sparse libraries, and up to 76% and 85% lower EDP for image processing and ML, respectively, versus the state-of-the-art CGRA.

Article

Article URL

Article

pdf

Article Note

The above PDF is the author-submitted accepted version of the article. The final published version can be found at the Article URL above.

BibTeX

@ARTICLE{11150697,
  author={Koul, Kalhan and Hsu, Olivia and Mei, Yuchen and Gautham Ravipati, Sai and Strange, Maxwell and Melchert, Jackson and Carsello, Alex and Kong, Taeyoung and Chen, Po-Han and Ke, Huifeng and Zhang, Keyi and Liu, Qiaoyi and Nyengele, Gedeon and Xie, Zhouhua and Balasingam, Akhilesh and Adivarahan, Jayashree and Sharma, Ritvik and Torng, Christopher and Emer, Joel S. and Kjolstad, Fredrik and Horowitz, Mark and Raina, Priyanka},
  journal={IEEE Journal of Solid-State Circuits},
  title={Onyx: A 12-nm Programmable Accelerator for Dense and Sparse Applications},
  year={2025},
  volume={},
  number={},
  pages={1-13},
  keywords={Tensors;Optical fiber networks;Algebra;Kernel;Micromechanical devices;Integrated circuit interconnections;Registers;System-on-chip;Repeaters;Machine learning;Coarse-grained reconfigurable array (CGRA);compilers;computer vision;image processing;machine learning (ML);reconfigurable accelerators;sparse matrices},
  doi={10.1109/JSSC.2025.3604724}
}