While FastMoE enables distributed MoE model training with PyTorch, it suffers from inefficiency due to load imbalance and poor communication performance. Other state-of-the-art MoE systems, such as GShard from Google and BASE Layers from Facebook, share the same issues.