Abstract

Decoupled Access-Execute (DAE) architectures separate memory accesses from computation, assigning each to its own specialized unit. This design is becoming increasingly popular among hyperscalers for accelerating irregular embedding lookups in recommendation models. In this paper, we first broaden the scope by demonstrating the benefits of DAE architectures across a wider range of irregular embedding operations in several machine learning models. We then propose the Ember compiler, which automatically compiles all of these embedding operations to DAE architectures. Unlike other DAE compilers, Ember features multiple intermediate representations, each designed for a different optimization level. In this way, Ember can implement all the optimizations needed to match the performance of hand-written code, unlocking the full potential of DAE architectures at scale.

Article

Article URL

BibTeX

@inproceedings{siracusa-marco2026,
  title={Ember: A Compiler for Embedding Operations on Decoupled Access-Execute Architectures},
  author={Marco Siracusa and Olivia Hsu and Victor Soria-Pardos and Joshua Randall and Arnaud Grasset and Eric Biscondi and Doug Joseph and Randy Allen and Fredrik Kjolstad and Miquel Moreto Planas and Adria Armejach},
  booktitle={International Symposium on Code Generation and Optimization (CGO)},
  year={2026},
  note={To appear}
}