What is MapReduce

MapReduce is a programming model, and MapReduce programs can be written in many languages. It allows massive scalability across thousands of servers in a Hadoop cluster, and as Hadoop's processing component it sits at the center of Apache Hadoop. The term “MapReduce” refers to two separate and distinct tasks that Hadoop programs perform. The first is the map task, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key-value pairs).

The reduce task takes the map’s output as its input and combines those data tuples into a smaller set of tuples.
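
To make the two tasks concrete, here is a minimal single-process sketch in Python. It is not Hadoop's API: the names `run_map_reduce` and `parse_reading` and the temperature readings are hypothetical, chosen only to show a map step that turns records into key-value tuples and a reduce step that combines all values sharing a key into one result.

```python
from collections import defaultdict
from typing import Callable, Hashable, Iterable, Iterator, TypeVar

K = TypeVar("K", bound=Hashable)
V = TypeVar("V")
R = TypeVar("R")


def run_map_reduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterator[tuple[K, V]]],
    reducer: Callable[[K, list[V]], R],
) -> dict[K, R]:
    """Single-process sketch of the map -> group by key -> reduce dataflow."""
    grouped: dict[K, list[V]] = defaultdict(list)
    # Map: every input record becomes zero or more (key, value) tuples.
    for record in records:
        for key, value in mapper(record):
            grouped[key].append(value)
    # Reduce: all values that share a key are combined into one result.
    return {key: reducer(key, values) for key, values in grouped.items()}


def parse_reading(line: str) -> Iterator[tuple[str, int]]:
    """Hypothetical mapper: 'city,temperature' -> (city, temperature)."""
    city, temp = line.split(",")
    yield city, int(temp)


readings = ["Oslo,3", "Rome,18", "Oslo,7", "Rome,15"]
print(run_map_reduce(readings, parse_reading, lambda city, temps: max(temps)))
# {'Oslo': 7, 'Rome': 18}
```

In a real cluster the map calls run in parallel on many machines and the grouping happens over the network; this sketch keeps everything in one dictionary.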

Benefits of MapReduce

  • Scalability. Enterprises can store and analyze petabytes of data in the Hadoop Distributed File System (HDFS).
  • Flexibility. Hadoop makes it easier to access many data sources and to work with different types of data.
  • Speed. Hadoop processes data faster by working on it in parallel and moving as little data as possible across the network.
  • Simplicity. MapReduce programs can be written in several languages, such as Java, C++, and Python.
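
For the Python case, one common route is Hadoop Streaming, which lets any executable that reads lines from standard input and writes tab-separated key/value lines to standard output act as a mapper or reducer. The sketch below assumes a hypothetical script named `wc.py` that runs as either step depending on its argument (`map` or `reduce`); the reducer relies on Hadoop sorting the mapper output by key before it arrives.

```python
import sys
from itertools import groupby
from typing import Iterable, Iterator


def streaming_mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' line for every word read from the input."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def streaming_reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts per word; assumes the input lines are sorted by word."""
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    # groupby only merges adjacent lines, which is why sorted input matters.
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1] in ("map", "reduce"):
    step = streaming_mapper if sys.argv[1] == "map" else streaming_reducer
    for out_line in step(sys.stdin):
        print(out_line)
```

You can try it locally without a cluster by imitating Hadoop's sort with the shell, for example `cat input.txt | python3 wc.py map | sort | python3 wc.py reduce`, where `input.txt` is any text file.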

How Does MapReduce Work?

To understand how MapReduce works, let’s walk through a simple example: counting words.

Suppose the input is a short text in which “Hello” appears three times, “world” twice, and “to” and “Hadoop” once each.

Input Splits:

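Input splitting divides the input data into fixed-size pieces called input splits. The split size is configurable (say 16 KB, or any size set by the administrator); in Hadoop it defaults to the HDFS block size, which is 128 MB in Hadoop 2 and later. Each split is then passed to a map task. In our example, each split contains just two words.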
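
A minimal sketch of splitting, assuming the input is already a list of words and the split size is counted in words. Both are simplifications: Hadoop computes splits as byte ranges of a file. The function name `make_splits` is hypothetical, and the sample words are invented only so that they match the counts in the final output.

```python
def make_splits(words: list[str], split_size: int) -> list[list[str]]:
    """Divide the input into consecutive fixed-size splits; the last may be shorter."""
    if split_size <= 0:
        raise ValueError("split_size must be positive")
    return [words[i : i + split_size] for i in range(0, len(words), split_size)]


sample = ["Hello", "to", "Hello", "world", "Hello", "world", "Hadoop"]
print(make_splits(sample, 2))
# [['Hello', 'to'], ['Hello', 'world'], ['Hello', 'world'], ['Hadoop']]
```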

Mapping

Mapping is the first stage of data processing in a MapReduce program. The mapping function takes each split and produces intermediate output from it. In our example, we are counting the occurrences of each word, so the mapper produces a list of (word, freq) pairs, emitting a pair with a frequency of 1, such as (Hello, 1), every time it reads a word.
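
The mapping step for word counting can be sketched as a Python generator. The name `map_words` is hypothetical, and each split is assumed to be a string of whitespace-separated words.

```python
from typing import Iterator


def map_words(split: str) -> Iterator[tuple[str, int]]:
    """Mapper: emit (word, 1) for every word in one input split."""
    for word in split.split():
        yield (word, 1)


print(list(map_words("Hello to")))
# [('Hello', 1), ('to', 1)]
```

Emitting a 1 per occurrence keeps the mapper stateless: it never needs to know what other splits contain, which is what lets many mappers run in parallel.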

Shuffling

Shuffling takes the output of the mapping phase and groups identical words together, so that all pairs for the same word are sent to the same reducer. In Hadoop, the intermediate keys are also sorted during this phase.
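
Shuffling can be sketched as grouping the mapper output by key, plus a partition function that decides which reducer receives each word. Both function names are hypothetical; `partition_for` is only similar in spirit to Hadoop's default hash partitioner, and uses `zlib.crc32` because Python's built-in `hash()` of a string changes between runs.

```python
import zlib
from collections import defaultdict


def shuffle(pairs: list[tuple[str, int]]) -> dict[str, list[int]]:
    """Group mapper output by word, with the words in sorted order."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for word, count in pairs:
        grouped[word].append(count)
    # Sorting the keys mirrors the sort Hadoop performs during the shuffle.
    return dict(sorted(grouped.items()))


def partition_for(word: str, num_reducers: int) -> int:
    """Pick a reducer index; the same word always lands on the same reducer."""
    return zlib.crc32(word.encode("utf-8")) % num_reducers


mapped = [("Hello", 1), ("to", 1), ("Hello", 1), ("world", 1),
          ("Hello", 1), ("world", 1), ("Hadoop", 1)]
print(shuffle(mapped))
# {'Hadoop': [1], 'Hello': [1, 1, 1], 'to': [1], 'world': [1, 1]}
```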

Reducing

In the reduce phase, the shuffled output is aggregated and a single total frequency is returned for each word. In effect, this step summarizes the complete dataset into a much smaller result.
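
The reducing step is the simplest of the four: for each word it sums the list of 1s that the shuffle collected. The name `reduce_counts` and the `shuffled` dictionary below are hypothetical, with the dictionary standing in for the grouped output of the shuffle.

```python
def reduce_counts(word: str, counts: list[int]) -> tuple[str, int]:
    """Reducer: collapse all the 1s for one word into its total frequency."""
    return word, sum(counts)


shuffled = {"Hadoop": [1], "Hello": [1, 1, 1], "to": [1], "world": [1, 1]}
for word, counts in shuffled.items():
    print(*reduce_counts(word, counts))
# Hadoop 1
# Hello 3
# to 1
# world 2
```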

The final output of the program is:

Hello 3
to 1
world 2
Hadoop 1

In short, the map task performs splitting and mapping, and the reduce task performs shuffling and reducing.
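
Putting the four steps together, the sketch below groups them the same way: `map_task` covers mapping each split, and `reduce_task` covers shuffling and reducing. It runs in a single process, the function names are hypothetical, and the sample input is invented so that the printed result matches the final output listed above.

```python
from collections import defaultdict
from typing import Iterator


def map_task(split: list[str]) -> Iterator[tuple[str, int]]:
    """Map side: turn one split's words into (word, 1) pairs."""
    for word in split:
        yield word, 1


def reduce_task(all_pairs: list[tuple[str, int]]) -> dict[str, int]:
    """Reduce side: shuffle (group by word), then sum each group."""
    grouped: dict[str, list[int]] = defaultdict(list)
    for word, count in all_pairs:
        grouped[word].append(count)
    return {word: sum(counts) for word, counts in grouped.items()}


def word_count(words: list[str], split_size: int = 2) -> dict[str, int]:
    # Splitting: consecutive chunks of split_size words.
    splits = [words[i : i + split_size] for i in range(0, len(words), split_size)]
    pairs = [pair for split in splits for pair in map_task(split)]
    return reduce_task(pairs)


sample = ["Hello", "to", "Hello", "world", "Hello", "world", "Hadoop"]
for word, total in word_count(sample).items():
    print(word, total)
# Hello 3
# to 1
# world 2
# Hadoop 1
```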
