MixQ

2024/04/10 MLSys

Overview of MixQ

Handling outliers is the main challenge when quantizing activation tensors for inference. Previous work has shown that outliers are concentrated in fixed channels. However, few studies have characterized how regularly outliers appear during token decoding. Existing open-source implementations also fall short of the ideal speedup over the FP16 baseline. In this project, we show that outliers exhibit locality during token decoding, and we design a mixed-precision kernel that achieves state-of-the-art performance.
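To make the idea of a mixed-precision computation concrete, the following is a minimal sketch, not the MixQ kernel itself. It assumes a common scheme in which activation channels whose magnitude exceeds a threshold are kept in floating point while the remaining channels are quantized to INT8. The function names, the threshold value of 6.0, and the per-tensor symmetric quantization are all illustrative assumptions.

```python
# Hypothetical sketch of a mixed-precision matrix multiply.
# All names and the threshold are assumptions, not MixQ's actual API.

def split_channels(x, threshold):
    """Split the columns of x into outlier and normal channel indices."""
    n_cols = len(x[0])
    outlier = [j for j in range(n_cols)
               if max(abs(row[j]) for row in x) > threshold]
    outlier_set = set(outlier)
    normal = [j for j in range(n_cols) if j not in outlier_set]
    return outlier, normal


def quantize_int8(values):
    """Symmetric per-tensor quantization to integers in [-127, 127]."""
    max_abs = max((abs(v) for v in values), default=0.0)
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    return [round(v / scale) for v in values], scale


def mixed_precision_matmul(x, w, threshold=6.0):
    """Compute x @ w with x of shape m x k and w of shape k x n.

    Normal channels use integer products with a single rescale;
    outlier channels are multiplied in floating point.
    """
    outlier, normal = split_channels(x, threshold)
    m, n, k_norm = len(x), len(w[0]), len(normal)
    # Flattened, quantized copies of the normal-channel activations and weights.
    xq, sx = quantize_int8([x[i][j] for i in range(m) for j in normal])
    wq, sw = quantize_int8([w[j][c] for j in normal for c in range(n)])
    out = [[0.0] * n for _ in range(m)]
    for i in range(m):
        for c in range(n):
            acc = 0  # integer accumulator for the low-precision part
            for t in range(k_norm):
                acc += xq[i * k_norm + t] * wq[t * n + c]
            high = sum(x[i][j] * w[j][c] for j in outlier)
            out[i][c] = acc * sx * sw + high
    return out


x = [[0.5, -1.0, 40.0], [1.5, 0.25, -35.0]]   # third channel holds outliers
w = [[0.1, 0.2], [0.3, -0.4], [0.5, 0.6]]
result = mixed_precision_matmul(x, w)
```

In this toy input only the third activation channel exceeds the threshold, so it bypasses quantization and the INT8 scale for the other channels is not stretched by the large values.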

AI Compiler Machine Learning Systems

Yidong Chen

Postdoc

Jidong Zhai

Professor (Tenured Professor, Doctoral Supervisor)
