Mixture-of-Experts (MoE) models have emerged as a transformative paradigm for scaling Large Language Models (LLMs), enabling unprecedented model capacity while maintaining computational efficiency through sparse activation. However, the unique architectural characteristics of MoE models introduce significant system-level challenges that differ fundamentally from those of traditional dense models and necessitate specialized system optimizations tailored to MoE's distinctive properties. This survey systematically analyzes acceleration techniques for MoE training systems, reviewing recent advances across four critical optimization dimensions: hybrid parallel computing, comprehensive memory management, fine-grained communication scheduling, and adaptive load balancing. Our analysis reveals a paradigm shift from computation-centric to workload-centric optimization strategies. Furthermore, we identify emerging research directions, including machine learning-guided load balancing, cross-layer optimization frameworks, and hardware-software co-design for MoE training workloads. This work aims to provide researchers and system engineers with a comprehensive technical reference to support the design of more efficient and scalable next-generation MoE training systems.