Scaling Graph Traversal to 281 Trillion Edges with 40 Million Cores

Abstract

Graph processing, especially high-performance graph traversal, plays a more and more important role in data analytics. The successor of Sunway TaihuLight, New Sunway, is equipped with nearly 10 PB memory and over 40 million cores, which brings the opportunity to process hundreds of trillions of edges graphs. However, the graph with an unprecedented scale also brings severe performance challenges, including load imbalance, poor locality, and irregular access of graph traversal workload.To address the scalability problem, we propose a novel 3-level degree-aware 1.5D graph partitioning, which benefits from both delegated 1D and 2D partitioning. By delegating extremely heavy vertices globally and other heavy vertices on columns and rows in the processes mesh, we break the scalability wall of previous partitioning methods. Together with sub-iteration direction optimization, core group -aware core subgraph segmenting, and a new on-chip sorting mechanism using RMA, we achieve 180,792 GTEPS on a graph with 281 trillion edges, using 103,912 processors with over 40 million cores, achieving 1.75X performance and 8X capacity compared to the previous state of the art and conforming to the Graph 500 BFS benchmark[14].

Publication
Proceedings of the 27th ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming
Zixuan Ma
Zixuan Ma
Ph.D. Student

mzx

Wenguang Chen
Wenguang Chen
Professor
(教授)