What is MapReduce? How it Works

MapReduce is a programming paradigm that lets a job scale across thousands of servers in a Hadoop cluster. It is the processing component at the heart of Apache Hadoop, and MapReduce programs can be written in several languages. The term “MapReduce” refers to two separate and distinct tasks that Hadoop programs perform. The first is the map task, which takes a set of data and converts it into another set of data in which individual elements are broken down into tuples (key-value pairs).

The second is the reduce task, which takes the map’s output as its input and combines those tuples into a smaller set of tuples.
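The two tasks can be sketched in plain Python, with no Hadoop involved. The names `run_map_reduce`, `sale_to_pair`, and `total`, as well as the sample sales records, are hypothetical and chosen only to show how map emits key-value tuples and reduce merges them into a smaller set.

```python
from collections import defaultdict

def run_map_reduce(records, map_fn, reduce_fn):
    """Apply map_fn to each record, group emitted pairs by key, reduce each group."""
    grouped = defaultdict(list)
    for record in records:
        for key, value in map_fn(record):
            grouped[key].append(value)
    return {key: reduce_fn(key, values) for key, values in grouped.items()}

def sale_to_pair(record):
    # Map: turn one "region,amount" record into a (region, amount) tuple.
    region, amount = record.split(",")
    yield region, float(amount)

def total(key, values):
    # Reduce: collapse all amounts for one region into a single number.
    return sum(values)

sales = ["east,10.0", "west,4.5", "east,2.5"]  # hypothetical sample data
totals = run_map_reduce(sales, sale_to_pair, total)
print(totals)  # {'east': 12.5, 'west': 4.5}
```

Three input records become two output tuples, which is the "smaller set" the reduce task is meant to produce.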

Benefits of MapReduce

Scalability. Enterprises can process and analyze petabytes of data stored in the Hadoop Distributed File System (HDFS).
Flexibility. Hadoop makes it easier to work with more data sources and different types of data.
Speed. Hadoop processes data faster by using parallel processing and minimal data movement.
Simplicity. MapReduce programs can be written in several languages, such as Java, C++, and Python.
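As one illustration of the language point, Hadoop Streaming lets a job use any program that reads lines on standard input and writes tab-separated key-value lines on standard output. The sketch below follows that line format in Python. The function names `mapper` and `reducer` are assumptions made for illustration, and the call to `sorted()` only stands in for the sort that happens between the two steps.

```python
from itertools import groupby

def mapper(lines):
    """Emit one 'word<TAB>1' line per word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"

def reducer(sorted_lines):
    """Sum the counts of consecutive lines that share a key.

    Assumes the input is already sorted by key.
    """
    pairs = (line.rstrip("\n").split("\t") for line in sorted_lines)
    for word, group in groupby(pairs, key=lambda pair: pair[0]):
        yield f"{word}\t{sum(int(count) for _, count in group)}"

# Local simulation with assumed sample lines; no cluster is involved here.
mapped = sorted(mapper(["Hello world", "Hello Hadoop"]))
result = list(reducer(mapped))
print(result)  # ['Hadoop\t1', 'Hello\t2', 'world\t1']
```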

How Does MapReduce Work?

To understand how MapReduce works, let’s take a simple example: a word counter.

Suppose we have the following words as input.

[Figure: input text for the word count example]

Input Splits:

An input split divides the input data into fixed-size pieces, say 16 KB or any other size set by the administrator. Each piece is then passed to a map task. In our example, we divided the data into pieces of two words.

[Figure: input splits]
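A minimal sketch of the splitting step follows. The input text is an assumption, since the example input is not reproduced in the text of this article; it was chosen to be consistent with the final counts reported at the end. The helper `split_input` is hypothetical, and it splits by word count for readability, whereas a real job splits by size in bytes.

```python
def split_input(text, words_per_split=2):
    """Cut the text into consecutive pieces of at most words_per_split words."""
    words = text.split()
    return [
        words[i:i + words_per_split]
        for i in range(0, len(words), words_per_split)
    ]

text = "Hello world Hello Hadoop Hello to world"  # assumed input
splits = split_input(text)
print(splits)
# [['Hello', 'world'], ['Hello', 'Hadoop'], ['Hello', 'to'], ['world']]
```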

Mapping

Mapping is the first processing phase of a MapReduce program. The mapping function takes each split and produces an output from it. In our example, we are counting the occurrences of each word, so the mapping produces a list of (word, frequency) pairs, as shown in the diagram below.

[Figure: mapping output]
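The mapping phase could look roughly like this in Python. The function name `map_words` and the sample splits are assumptions for illustration; the splits are picked to agree with the final counts reported at the end of the article.

```python
from collections import Counter

def map_words(split):
    """Return (word, frequency) pairs for the words of a single split."""
    return list(Counter(split).items())

# Assumed two-word splits.
splits = [["Hello", "world"], ["Hello", "Hadoop"], ["Hello", "to"], ["world"]]
mapped = [map_words(split) for split in splits]
print(mapped[0])  # [('Hello', 1), ('world', 1)]
```

Each split is mapped on its own, without looking at any other split, which is what allows the map tasks to run in parallel.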

Shuffling

Shuffling takes the output of the mapping phase and groups identical words together. Take a look at the example below.

[Figure: shuffling output]
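One way to picture shuffling is as a group-by-key over all mapped pairs. In this sketch the function name `shuffle` and the sample pairs are assumptions, with the pairs chosen to match the final counts reported at the end of the article.

```python
from collections import defaultdict

def shuffle(mapped_pairs):
    """Collect the frequencies of identical words under one key."""
    grouped = defaultdict(list)
    for word, freq in mapped_pairs:
        grouped[word].append(freq)
    return dict(grouped)

# Assumed output of the mapping phase, flattened across all splits.
mapped_pairs = [
    ("Hello", 1), ("world", 1), ("Hello", 1), ("Hadoop", 1),
    ("Hello", 1), ("to", 1), ("world", 1),
]
grouped = shuffle(mapped_pairs)
print(grouped)
# {'Hello': [1, 1, 1], 'world': [1, 1], 'Hadoop': [1], 'to': [1]}
```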

Reducing

In the reducing phase, the shuffled output is aggregated and a single frequency is returned for every word. In effect, this phase summarizes the complete dataset.

[Figure: reducing output]
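The reducing phase then comes down to one sum per word. The function name `reduce_counts` and the grouped input are assumptions for illustration; the grouped values are chosen to agree with the counts this article reports.

```python
def reduce_counts(grouped):
    """Collapse each word's list of frequencies into a single total."""
    return {word: sum(freqs) for word, freqs in grouped.items()}

# Assumed output of the shuffling phase.
grouped = {"Hello": [1, 1, 1], "to": [1], "world": [1, 1], "Hadoop": [1]}
totals = reduce_counts(grouped)
for word, count in totals.items():
    print(f"{word}\t{count}")
```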

The final output of the program is:

Hello	3
to	1
world	2
Hadoop	1

In summary, the map task covers splitting and mapping, and the reduce task covers shuffling and reducing.
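To show why this division of work matters, here is a rough sketch in which the map tasks run concurrently and a single reduce task merges their results. Threads stand in for cluster nodes, the splits are prepared beforehand for brevity, and the names `map_task` and `reduce_task` and the sample splits are all assumptions for illustration.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor

def map_task(split):
    """Map task: count the words of one split, independently of the others."""
    return Counter(split.split())

def reduce_task(partial_counts):
    """Reduce task: merge the per-split counts into one total per word."""
    merged = Counter()
    for counts in partial_counts:
        merged.update(counts)
    return dict(merged)

splits = ["Hello world", "Hello Hadoop", "Hello to", "world"]  # assumed splits

# Each split is handled by its own map task; two workers run at a time.
with ThreadPoolExecutor(max_workers=2) as pool:
    partials = list(pool.map(map_task, splits))

result = reduce_task(partials)
print(result)
```

Because no map task depends on another, adding workers shortens the map phase without changing the result.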

Author: Steven Roger