Understanding Co-Running Behaviors on Integrated CPU/GPU Architectures

Abstract

Architecture designers tend to integrate both CPUs and GPUs on the same chip to deliver energy-efficient designs. Effectively leveraging the advantages of both CPUs and GPUs on integrated architectures, however, remains an open problem. In this work, we port 42 programs from the Rodinia, Parboil, and Polybench benchmark suites and analyze their co-running behaviors on both AMD and Intel integrated architectures. We find that co-running performance is not always better than running a program on CPUs or GPUs alone. Among these programs, only eight benefit from co-running, whereas 24 achieve their best performance using GPUs only and seven using CPUs only. The remaining three programs show little performance preference between devices. Through extensive workload characterization, we find that the architectural differences between CPUs and GPUs and the limited shared memory bandwidth are the two main factors limiting current co-running performance. Since not all programs benefit from integrated architectures, we build an automatic decision-tree-based model to help application developers predict the co-running performance of a given CPU-only or GPU-only program. Results show that our model correctly predicts 14 of the 15 evaluated programs. For co-run-friendly programs, we further propose a profiling-based method to predict the optimal workload partition ratio between CPUs and GPUs. Results show that the predicted partition achieves 87.7 percent of the performance of the best partition. The co-running programs obtained with our method outperform the original CPU-only and GPU-only programs by 34.5 and 20.9 percent, respectively.
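The profiling-based partition idea mentioned above can be illustrated with a minimal sketch. The snippet below is not the authors' implementation: it assumes hypothetical run_on_cpu and run_on_gpu device kernels and simply derives a throughput-proportional split from a short profiling run, which is one straightforward way a CPU/GPU partition ratio can be chosen.

import time

def measured_throughput(run_fn, items):
    # Time one run of run_fn over `items` and return work items processed per second.
    start = time.perf_counter()
    run_fn(items)
    elapsed = time.perf_counter() - start
    return len(items) / elapsed

def split_workload(items, run_on_cpu, run_on_gpu, sample_size=1024):
    # Profile a small sample on each device. run_on_cpu / run_on_gpu are
    # hypothetical stand-ins for the real device-specific kernels.
    sample = items[:sample_size]
    cpu_tput = measured_throughput(run_on_cpu, sample)
    gpu_tput = measured_throughput(run_on_gpu, sample)
    # Give each device a share proportional to its measured throughput,
    # so both devices are expected to finish at roughly the same time.
    gpu_share = gpu_tput / (cpu_tput + gpu_tput)
    cut = int(len(items) * gpu_share)
    return items[:cut], items[cut:]  # (GPU portion, CPU portion)

In practice the two portions would be dispatched concurrently to the GPU and the CPU cores; the sketch only shows how profiled throughputs translate into a partition ratio.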

Publication
IEEE Transactions on Parallel and Distributed Systems
Jidong Zhai
Associate Professor (Special Research Fellow, Doctoral Supervisor)
Wenguang Chen
Professor