Sparker: Efficient Reduction for More Scalable Machine Learning with Spark

Abstract

Machine learning applications on Spark suffer from poor scalability. In this paper, we reveal that the key reason is non-scalable reduction, which is restricted by Spark's non-splittable object programming interface. This insight leads us to propose Sparker, Spark with Efficient Reduction. By providing a split aggregation interface, Sparker performs split aggregation with scalable reduction while remaining backward compatible with existing applications. We implemented Sparker in 2,534 lines of code. Sparker improves aggregation performance by up to 6.47× and improves the end-to-end performance of MLlib model training by up to 3.69×, with a geometric mean of 1.81×.
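The contrast between reducing whole, non-splittable objects and reducing split aggregates can be illustrated with a small, Spark-free sketch. It assumes the aggregated value is a dense vector (for example, a gradient) and that partial results are simply summed. All function names here (`split`, `whole_object_reduce`, `split_reduce`) are hypothetical and do not represent Sparker's actual interface. With whole objects, one reducer must combine full-length vectors; once each partial is split into `k` segments, segment `j` from every partition can be reduced independently, so the reduction work can be spread across `k` reducers.

```python
from typing import List


def split(vec: List[float], k: int) -> List[List[float]]:
    """Split a dense vector into k contiguous segments of nearly equal size."""
    n = len(vec)
    bounds = [n * i // k for i in range(k + 1)]
    return [vec[bounds[i]:bounds[i + 1]] for i in range(k)]


def concat(segments: List[List[float]]) -> List[float]:
    """Reassemble reduced segments into one vector, preserving segment order."""
    out: List[float] = []
    for seg in segments:
        out.extend(seg)
    return out


def add(a: List[float], b: List[float]) -> List[float]:
    """Element-wise sum of two equal-length vectors."""
    return [x + y for x, y in zip(a, b)]


def whole_object_reduce(partials: List[List[float]]) -> List[float]:
    # Baseline: each partial result is treated as an opaque object,
    # so a single reducer sums entire vectors one after another.
    result = partials[0]
    for p in partials[1:]:
        result = add(result, p)
    return result


def split_reduce(partials: List[List[float]], k: int) -> List[float]:
    # Split every partial result into k segments.
    split_partials = [split(p, k) for p in partials]
    reduced_segments = []
    # Each segment index j is reduced independently of the others; in a
    # distributed setting each j could be handled by a different task.
    for j in range(k):
        seg = split_partials[0][j]
        for sp in split_partials[1:]:
            seg = add(seg, sp[j])
        reduced_segments.append(seg)
    return concat(reduced_segments)


if __name__ == "__main__":
    partials = [[1, 2, 3, 4, 5], [10, 20, 30, 40, 50], [100, 200, 300, 400, 500]]
    print(whole_object_reduce(partials))
    print(split_reduce(partials, 2))
```

Both functions return the same vector; the difference is that in `split_reduce` each reducer touches only about `1/k` of every partial result, which is the property that lets reduction scale with the number of reducers.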

Publication
50th International Conference on Parallel Processing
Wenguang Chen
Professor