MatFactory: A Framework for High-Performance Matrix Factorization on FPGAs

Abstract

Matrix factorization is a powerful and widely used tool in signal processing, machine learning, and high-performance computing. FPGAs are well suited to accelerating it, since they can build wide and deep pipelines with favorable power efficiency. Factorizing matrices on FPGAs is thus desirable; however, no infrastructure for matrix factorization on FPGAs exists so far, because it involves several challenges: applicability and scalability of the circuit, pipelining of irregular computing patterns, and effective data caching under limited memory bandwidth.

We propose MatFactory, a novel framework that enables fast development of high-performance matrix factorization algorithms on FPGAs. We extract common key operators from various factorization algorithms and provide a convenient streaming interface that explicitly moves and manages data through the memory hierarchy. With this interface, the operators can be reused as building blocks and composed into diverse non-blocked factorization algorithms that keep the matrix in BRAM, as well as blocked factorization algorithms that keep it in DRAM. We evaluate MatFactory with three typical algorithms (Cholesky, LU and QR) on an Intel A10 FPGA. Our non-blocked factorization achieves a 4.0–10.7× speedup over the Vitis Library on a Xilinx Alveo U280 FPGA, and the blocked implementation further achieves 1.65–1.88× the performance of the non-blocked version. To the best of our knowledge, this is the first framework that systematically designs and accommodates various matrix factorization algorithms for FPGAs, and it can be easily extended to support more LAPACK routines in general.
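To illustrate what it means for the same operators to serve both a non-blocked and a blocked algorithm, here is a minimal CPU-side sketch of right-looking blocked Cholesky in pure Python. The kernel names (`potrf_unblocked`, `trsm_panel`, `syrk_update`) borrow LAPACK/BLAS naming and are hypothetical stand-ins; they are not MatFactory's operators or interface, and the sketch ignores streaming, BRAM/DRAM placement and pipelining entirely.

```python
import math


def potrf_unblocked(A, lo, hi):
    """Non-blocked Cholesky of the diagonal block A[lo:hi][lo:hi], in place.

    Assumes the block has already received all trailing updates from
    earlier block columns, so only columns lo..j-1 contribute here.
    """
    for j in range(lo, hi):
        d = A[j][j] - sum(A[j][k] ** 2 for k in range(lo, j))
        if d <= 0.0:
            raise ValueError("matrix is not positive definite")
        A[j][j] = math.sqrt(d)
        for i in range(j + 1, hi):
            s = A[i][j] - sum(A[i][k] * A[j][k] for k in range(lo, j))
            A[i][j] = s / A[j][j]


def trsm_panel(A, lo, hi):
    """Solve L21 * L11^T = A21 for the panel below the diagonal block
    (forward substitution, one row at a time)."""
    for i in range(hi, len(A)):
        for j in range(lo, hi):
            s = A[i][j] - sum(A[i][k] * A[j][k] for k in range(lo, j))
            A[i][j] = s / A[j][j]


def syrk_update(A, lo, hi):
    """Trailing update A22 -= L21 * L21^T, lower triangle only."""
    n = len(A)
    for i in range(hi, n):
        for j in range(hi, i + 1):
            A[i][j] -= sum(A[i][k] * A[j][k] for k in range(lo, hi))


def _lower(A):
    """Return a copy with the strict upper triangle zeroed."""
    n = len(A)
    return [[A[i][j] if j <= i else 0.0 for j in range(n)] for i in range(n)]


def cholesky_nonblocked(A):
    """The whole matrix is factorized by a single call to the unblocked kernel."""
    W = [row[:] for row in A]
    potrf_unblocked(W, 0, len(W))
    return _lower(W)


def cholesky_blocked(A, b):
    """Right-looking blocked Cholesky composed from the same kernels."""
    n = len(A)
    W = [row[:] for row in A]
    for lo in range(0, n, b):
        hi = min(lo + b, n)
        potrf_unblocked(W, lo, hi)  # factor the b x b diagonal block
        trsm_panel(W, lo, hi)       # solve the panel below it
        syrk_update(W, lo, hi)      # update the trailing submatrix
    return _lower(W)
```

For example, `cholesky_blocked([[4.0, 12.0, -16.0], [12.0, 37.0, -43.0], [-16.0, -43.0, 98.0]], 2)` returns the lower factor `[[2.0, 0.0, 0.0], [6.0, 1.0, 0.0], [-8.0, 5.0, 3.0]]`, the same result as `cholesky_nonblocked`. With `b >= n` the blocked loop reduces to one call of the unblocked kernel; a smaller `b` keeps each diagonal block and panel small while most of the arithmetic moves into the trailing update.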

Publication
Proceedings of the 43rd IEEE/ACM International Conference on Computer-Aided Design
Wenguang Chen
Professor