MongoDB – Part 7 – MapReduce

MapReduce is a multi-step programming paradigm, which has been around for 2 decades. The goal of MapReduce is to break down a collection of data, sometimes from different machines on a network. Executing MapReduce jobs in parallel, over multiple machines, will significantly improve processing times.

The concept of MapReduce is inspired by the map and reduce functions in functional programming. When MapReduce was defined, it was as a 3 step process (Map, Shuffle, Reduce), MongoDB have taken this one step further and added a finaliser step. So lets see what’s involved in each step.

Continue reading