Abstract

We introduce Mosaic, a sparse tensor algebra compiler that can bind tensor (sub-)expressions to external functions of other tensor algebra libraries or compilers. Users can extend Mosaic by adding new functions and can bind a sub-computation to a function using a scheduling API. Mosaic substitutes the bound(sub-)expressions with the specified function call and automatically fills in the rest of the unbound code using a default code generator. Mosaic also has a search system that can automatically map an expression to a set of registered external functions. Both the explicit binding and automatic search are verified by Mosaic. We demonstrate the benefits of our approach by showing that calling hand-written CPU and specialized hardware functions can provide speedup of up to 206x and 173x, respectively, over a homogeneous compiler. Mosaic’s external function interface is simple and general. Currently, 38 external functions have been added to Mosaic, with each addition averaging 20 lines of C++ code.

Article

pdf

Movie

BibTeX

 
@article{10.1145/3591236,
author = {Bansal, Manya and Hsu, Olivia and Olukotun, Kunle and Kjolstad, Fredrik},
title = {Mosaic: An Interoperable Compiler for Tensor Algebra},
year = {2023},
issue_date = {June 2023},
publisher = {Association for Computing Machinery},
address = {New York, NY, USA},
volume = {7},
number = {PLDI},
url = {https://doi.org/10.1145/3591236},
doi = {10.1145/3591236},
abstract = {We introduce Mosaic, a sparse tensor algebra compiler that can bind
tensor expressions to external functions of other tensor algebra libraries and
compilers. Users can extend Mosaic by adding new functions and bind a
sub-expression to a function using a scheduling API. Mosaic substitutes the
bound sub-expressions with calls to the external functions and automatically
generates the remaining code using a default code generator. As the generated
code is fused by default, users can productively leverage both fusion and calls
to specialized functions within the same compiler. We demonstrate the benefits
of our dual approach by showing that calling hand-written CPU and specialized
hardware functions can provide speedups of up to 206x against fused
code in some cases, while generating fused code can provide speedups of up to
3.57x against code that calls external functions in other cases.
Mosaic also offers a search system that can automatically map an expression to
a set of registered external functions. Both the explicit binding and automatic
search are verified by Mosaic. Additionally, the interface for adding new
external functions is simple and general. Currently, 38 external functions have
been added to Mosaic, with each addition averaging 20 lines of code.},
journal = {Proc. ACM Program. Lang.},
month = {jun},
articleno = {122},
numpages = {26},
keywords = {sparse tensor algebra, compilation, external functions, automated search} }