Mixture-of-Experts (MoE) models have emerged as a transformative paradigm for scaling Large Language Models (LLMs), enabling unprecedented model capacity while maintaining computational efficiency through sparse activation. However, the unique architectural characteristics of MoE models introduce significant system-level challenges that differ fundamentally from those of traditional dense models and necessitate specialized system optimizations tailored to MoE's distinctive properties. This survey systematically analyzes acceleration techniques for MoE training systems, reviewing recent advances across four critical optimization dimensions: hybrid parallel computing, comprehensive memory management, fine-grained communication scheduling, and adaptive load balancing. Our analysis reveals a paradigm shift from computation-centric to workload-centric optimization strategies. Furthermore, we identify emerging research directions, including machine learning-guided load balancing, cross-layer optimization frameworks, and hardware-software co-design for MoE training workloads. This work aims to provide researchers and system engineers with a comprehensive technical reference to support the design of more efficient and scalable next-generation MoE training systems.