Olivia Hsu

Abstract

Onyx is a system-on-chip (SoC) with a coarse-grained reconfigurable array (CGRA) for accelerating sparse and dense tensor algebra and dense image processing and machine learning (ML) applications. To support multiple inputs, multiple dimensions, and fusion in sparse applications, Onyx utilizes composable memory primitives that operate on compressed storage and streams, and compute primitives that eliminate unnecessary calculations. Onyx also improves performance on dense applications with application-specialized processing elements (PEs), area-optimized memory tiles, and hybrid clock gating in the global buffer (GLB). Onyx achieves a peak energy efficiency of 756 INT16 GOPS/W, up to 565× better energy-delay product (EDP) for sparse kernels versus CPUs with sparse libraries, and up to 76% and 85% lower EDP for image processing and ML, respectively, versus the state-of-the-art CGRA.

Article

Article URL

Article

Article Note

The above PDF is the author-submitted accepted version of the article. The final published version can be found at the Article URL above.

BibTeX


@ARTICLE{11150697,
  author={Koul, Kalhan and Hsu, Olivia and Mei, Yuchen and Gautham Ravipati, Sai and Strange, Maxwell and Melchert, Jackson and Carsello, Alex and Kong, Taeyoung and Chen, Po-Han and Ke, Huifeng and Zhang, Keyi and Liu, Qiaoyi and Nyengele, Gedeon and Xie, Zhouhua and Balasingam, Akhilesh and Adivarahan, Jayashree and Sharma, Ritvik and Torng, Christopher and Emer, Joel S. and Kjolstad, Fredrik and Horowitz, Mark and Raina, Priyanka}, 
  journal={IEEE Journal of Solid-State Circuits}, 
  title={Onyx: A 12-nm Programmable Accelerator for Dense and Sparse Applications}, 
  year={2025}, 
  volume={}, 
  number={}, 
  pages={1-13}, 
  keywords={Tensors;Optical fiber networks;Algebra;Kernel;Micromechanical devices;Integrated circuit interconnections;Registers;System-on-chip;Repeaters;Machine learning;Coarse-grained reconfigurable array (CGRA);compilers;computer vision;image processing;machine learning (ML);reconfigurable accelerators;sparse matrices}, 
  doi={10.1109/JSSC.2025.3604724}}