Nondeterminism in MapReduce Considered Harmful? An Empirical Study on Non-Commutative Aggregators in MapReduce Programs

Abstract

The simplicity of MapReduce introduces unique subtleties that cause hard-to-detect bugs. In particular, the order in which a reduce function receives its input is not fixed, and this nondeterminism is harmful when the reduce function is non-commutative, that is, sensitive to input order. Our extensive study of production MapReduce programs reveals interesting findings on commutativity, nondeterminism, and correctness. Although non-commutative reduce functions led to five bugs in our sample of well-tested production programs, we found, surprisingly, that many non-commutative reduce functions are mostly harmless because of, for example, implicit data properties. These findings are instrumental in advancing our understanding of MapReduce program correctness.
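To make the notion of order sensitivity concrete, here is a minimal Python sketch. It is an illustration only and is not taken from the study: the reducers `reduce_sum` and `reduce_first` and the helper `is_order_sensitive` are hypothetical names, and a real MapReduce job would receive its values from the framework's shuffle phase rather than from a list.

```python
from itertools import permutations


def reduce_sum(values):
    """Commutative reducer: the result does not depend on input order."""
    return sum(values)


def reduce_first(values):
    """Non-commutative reducer: returns whichever value arrives first."""
    return values[0]


def is_order_sensitive(reducer, values):
    """Return True if some reordering of `values` changes the reducer's output.

    Brute force over all permutations, so only suitable for tiny inputs.
    """
    results = {reducer(list(p)) for p in permutations(values)}
    return len(results) > 1


print(is_order_sensitive(reduce_sum, [3, 1, 2]))    # False
print(is_order_sensitive(reduce_first, [3, 1, 2]))  # True
# Assumed "implicit data property": every value for the key is identical,
# so the non-commutative reducer still yields a single deterministic result.
print(is_order_sensitive(reduce_first, [7, 7, 7]))  # False
```

The last call sketches one way a non-commutative reducer can be harmless in practice: if the data guarantees that all values for a key are equal, picking the first one gives the same answer under any arrival order.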

Publication
Companion Proceedings of the 36th International Conference on Software Engineering
Wenguang Chen
Professor