Abstract: Deployed Machine Learning (ML) applications rely extensively on Matrix Multiplication (MM) operations running on modern accelerators, such as Graphics Processing Units (GPUs), ...