Synchronization is a crucial issue for multi-threaded programs. Mutex locks are widely used in legacy programs and remain popular for their intuitive semantics. The SW26010 architecture, deployed on the Sunway TaihuLight supercomputer, introduces a hardware-supported inter-core message passing mechanism and exposes explicit interfaces that let developers use its fast on-chip network. This emerging architectural feature brings both opportunities and challenges for mutex lock implementation. However, there is still no general lock mechanism specifically designed and optimized for architectures with this feature. In this article, we propose mLock, a fast lock designed and optimized for architectures that support Explicit inter-core Message Passing (EMP). mLock uses a subset of cores as lock servers and leverages the fast on-chip network to implement high-performance mutual exclusion locks. We also propose a series of novel techniques to improve the performance of EMP locks. First, we propose the concepts of chaining lock and hierarchical lock to reduce message count and mitigate network congestion. Second, we propose a fair lock approach to improve the fairness of EMP locks. Third, server reusing is introduced to reduce the number of lock servers. We implement and evaluate mLock on an SW26010 processor. Experimental results show that our proposed techniques can improve the performance of EMP locks by up to $16.2\times$ over a basic design.