Abstract

This paper details the problem landscape that arises when a general tensor algebra accelerator framework is used to compute real-world, end-to-end machine learning applications. We identify three key challenges for correctness and performance: support for tensor reshaping and nonlinear operations, dataflow optimization (kernel fusion and dataflow ordering), and leveraging sparsity structure. The paper motivates the need to address these problems in the domain-specific language, compiler framework, and architectural design for sparse machine learning. To surface these challenges, we extended a general tensor algebra compiler and architectural model, the Sparse Abstract Machine, to real-world sparse machine learning models.
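As an illustration of the first and third challenges (this sketch is ours, not from the paper, and all names, shapes, and the sparsity pattern are hypothetical), consider a sparse attention block: the matrix contractions are pure tensor algebra, but the sparsity mask and the softmax nonlinearity fall outside it, so a tensor-algebra-only framework cannot express or fuse the whole computation on its own. A minimal NumPy sketch:

# Hypothetical sketch (not from the paper): one sparse attention block mixes
# tensor-algebra contractions with masking and a softmax nonlinearity.
import numpy as np

def sparse_attention(q, k, v, mask):
    scores = (q @ k.T) / np.sqrt(q.shape[-1])       # contraction: pure tensor algebra
    scores = np.where(mask, scores, -np.inf)        # sparsity structure applied as a mask
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # softmax: nonlinear, outside tensor algebra
    return weights @ v                              # contraction: pure tensor algebra

rng = np.random.default_rng(0)
q, k, v = (rng.standard_normal((8, 4)) for _ in range(3))
mask = rng.random((8, 8)) < 0.3                     # random sparsity pattern
np.fill_diagonal(mask, True)                        # keep one valid key per query to avoid NaNs
print(sparse_attention(q, k, v, mask).shape)        # (8, 4)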

BibTeX

@article{lacouture2023,
  title={Challenges with Hardware-Software Co-design for Sparse Machine Learning on Streaming Dataflow},
  author={Rubens Lacouture and Olivia Hsu and Kunle Olukotun and Fredrik Kjolstad},
  journal={Workshop on Programming Languages and Architecture (PLARCH) co-located with FCRC/ISCA/PLDI 2023},
  year={2023},
  month={June}
}