Problem Statement: Find the maximum temperature for each city in the input data.
Step 1: Input Files
File 1:
New-york, 25
Seattle, 21
New-york, 28
Dallas, 35
File 2:
New-york, 20
Seattle, 21
Seattle, 22
Dallas, 23
File 3:
New-york, 31
Seattle, 33
Dallas, 30
Dallas, 19
Step 2: Map Function
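Let’s say Map1, Map2 & Map3 run on File 1, File 2 & File 3 in parallel. Each map reads its own file and emits one pair per city, holding the highest temperature it saw for that city in that file. Here is their output:
(Note how each map outputs “Key – Value” pairs. The key is used later by the reduce function to do a “group by”.)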
Map 1:
Seattle, 21
New-york, 28
Dallas, 35
Map 2:
New-york, 20
Seattle, 22
Dallas, 23
Map 3:
New-york, 31
Seattle, 33
Dallas, 30
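The map step above can be sketched in a few lines of Python. This is only a minimal illustration, not Hadoop code: the function name map_max_temperature and the in-memory list standing in for File 1 are assumptions made for the example. It also assumes, as the outputs above suggest, that each map keeps only the local maximum per city (in Hadoop, this kind of local aggregation is usually done by a combiner).

```python
def map_max_temperature(lines):
    """Hypothetical map step: parse "City, temp" lines and
    keep the highest temperature seen for each city in this file."""
    local_max = {}
    for line in lines:
        city, temp = line.split(",")
        city, temp = city.strip(), int(temp)
        # Keep the larger of the stored value and the new reading.
        if city not in local_max or temp > local_max[city]:
            local_max[city] = temp
    # Emit (key, value) pairs, one per city.
    return list(local_max.items())


# File 1 from Step 1, held in memory for the sake of the example.
file1 = ["New-york, 25", "Seattle, 21", "New-york, 28", "Dallas, 35"]
map1 = map_max_temperature(file1)
print(map1)
```

Running it on File 1 gives the same three pairs as Map 1 above, although the order of the pairs may differ.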
Step 3: Reduce Function
The Reduce function takes the output of Map1, Map2 & Map3, groups the pairs by key (the city), and outputs the maximum temperature for each city:
New-york, 31
Seattle, 33
Dallas, 35
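Here is a rough Python sketch of the reduce step, assuming the three map outputs are already available as lists of (city, temperature) pairs. The function name reduce_max_temperature is made up for this example, and a real framework would do the grouping (the shuffle) for you.

```python
from collections import defaultdict


def reduce_max_temperature(map_outputs):
    """Hypothetical reduce step: group pairs by city, then take the max."""
    grouped = defaultdict(list)
    for output in map_outputs:
        for city, temp in output:
            grouped[city].append(temp)  # the "group by" on the key
    return {city: max(temps) for city, temps in grouped.items()}


# Outputs of Map 1, Map 2 and Map 3 from Step 2.
map1 = [("Seattle", 21), ("New-york", 28), ("Dallas", 35)]
map2 = [("New-york", 20), ("Seattle", 22), ("Dallas", 23)]
map3 = [("New-york", 31), ("Seattle", 33), ("Dallas", 30)]

result = reduce_max_temperature([map1, map2, map3])
for city, temp in result.items():
    print(f"{city}, {temp}")
```

Because max is associative, taking the maximum of the per-file maxima gives the same answer as taking the maximum over all the raw readings, which is why the map step can safely discard the smaller values early.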
Conclusion:
In this post, we visualized the MapReduce programming model with an example: finding the maximum temperature for each city. As you can imagine, you can extend this example to visualize:
1) Finding the minimum temperature for each city.
2) Similar-looking problems with a different key. In this post the key was the city, but you could substitute any other relevant real-world entity.
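To make both extensions concrete, here is a small sketch that runs the whole pipeline in plain Python with the aggregation passed in as a parameter. The names map_pairs and map_reduce are hypothetical, and unlike the maps shown earlier, this map emits every pair and leaves all the aggregation to the reduce step. Swapping max for min gives the minimum temperature, and changing what map_pairs uses as the key would solve a different grouping problem.

```python
from collections import defaultdict


def map_pairs(lines):
    """Hypothetical map step: turn each "City, temp" line into a (key, value) pair."""
    pairs = []
    for line in lines:
        city, temp = line.split(",")
        pairs.append((city.strip(), int(temp)))
    return pairs


def map_reduce(files, aggregate):
    """Run map on every file, group by key, then apply `aggregate` per key."""
    grouped = defaultdict(list)
    for lines in files:
        for key, value in map_pairs(lines):
            grouped[key].append(value)
    return {key: aggregate(values) for key, values in grouped.items()}


# The three input files from Step 1.
files = [
    ["New-york, 25", "Seattle, 21", "New-york, 28", "Dallas, 35"],
    ["New-york, 20", "Seattle, 21", "Seattle, 22", "Dallas, 23"],
    ["New-york, 31", "Seattle, 33", "Dallas, 30", "Dallas, 19"],
]

min_temps = map_reduce(files, min)
max_temps = map_reduce(files, max)
print(min_temps)
print(max_temps)
```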
I hope this helps.