The module network, an extension of the Bayesian network, is used in situations where there are many variables but only a small amount of data available. However, learning such a network is still time-consuming. This paper proposes a parallel implementation of the module network learning algorithm, based on the message-passing model, to reduce learning time. To solve the load-imbalance problem introduced by either result caching or intrinsic computation, a grouping strategy is proposed that groups computations by module and then distributes the groups cyclically. The algorithm was tested on eight 4-way Intel Xeon multiprocessors, and a speedup of 29.26 on 32 processors was observed. This result shows that the proposed algorithm is effective.
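
The grouping strategy described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract gives no code, so the task representation as `(module_id, task_id)` pairs and the function names `group_by_module`, `distribute_cyclically`, and `parallel_efficiency` are all assumptions made for illustration, and no actual message passing is performed. The sketch groups computations by module, assigns the groups to processes in round-robin order, and computes the parallel efficiency implied by the reported speedup.

```python
from collections import defaultdict


def group_by_module(tasks):
    """Group hypothetical (module_id, task_id) pairs by their module."""
    groups = defaultdict(list)
    for module_id, task_id in tasks:
        groups[module_id].append(task_id)
    return dict(groups)


def distribute_cyclically(groups, num_procs):
    """Assign module groups to process ranks in round-robin (cyclic) order."""
    assignment = {rank: [] for rank in range(num_procs)}
    # Sorting the module ids is an assumption; it only makes the order deterministic.
    for i, module_id in enumerate(sorted(groups)):
        assignment[i % num_procs].append((module_id, groups[module_id]))
    return assignment


def parallel_efficiency(speedup, num_procs):
    """Parallel efficiency is the speedup divided by the number of processors."""
    return speedup / num_procs


if __name__ == "__main__":
    # Made-up tasks for illustration only.
    tasks = [(0, "a"), (1, "b"), (0, "c"), (2, "d"), (3, "e")]
    groups = group_by_module(tasks)
    for rank, work in distribute_cyclically(groups, 2).items():
        print(rank, work)
    # Speedup and processor count as reported in the abstract.
    print(round(parallel_efficiency(29.26, 32), 3))
```

Keeping all computations of one module on the same process is one plausible reading of the grouping step, since it would let cached results for that module be reused locally. By the same arithmetic, the reported speedup of 29.26 on 32 processors corresponds to a parallel efficiency of roughly 91%.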