Hadoop MapReduce Join & Counter with Example

Sometimes we need to combine two large datasets. For this purpose, MapReduce provides the join operation. Writing a join by hand requires a lot of code, but MapReduce makes the task comparatively easy. In a typical MapReduce join, the two datasets are compared by size, and the smaller dataset is distributed to every DataNode. The Mapper or Reducer then loads the smaller dataset and uses it to look up matching records. Finally, the matching records from the smaller and larger datasets are merged to produce the joined output records.
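To make the lookup-and-merge idea concrete, here is a minimal plain-Python sketch of it (this pattern is often called a broadcast or replicated join). It simulates the logic only and does not use the Hadoop API; the function name broadcast_join and the (department ID, value) record layout are hypothetical, since the real formats of the input files are not shown here.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def broadcast_join(
    small: Iterable[Tuple[str, str]],
    large: Iterable[Tuple[str, str]],
) -> Iterator[Tuple[str, str, str]]:
    """Inner-join two (key, value) datasets by holding the smaller one in memory."""
    # Build a lookup table from the smaller dataset, as each task would
    # after that dataset has been shipped to its node.
    lookup: Dict[str, List[str]] = defaultdict(list)
    for key, value in small:
        lookup[key].append(value)

    # Stream the larger dataset and probe the table for every record.
    for key, value in large:
        for match in lookup.get(key, []):
            # Merge the matching records into one joined output record.
            yield key, value, match
```

With hypothetical records, broadcast_join([("D1", "Finance"), ("D2", "HR")], [("D1", "50"), ("D2", "30"), ("D3", "12")]) yields ("D1", "50", "Finance") and ("D2", "30", "HR"). D3 has no match in the smaller dataset, so it is dropped, as in an inner join.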

There are two types of joins.

Map-side Join
Reduce-side Join

Map-side Join

In a Map-side Join, the join is performed by the mapper, before the data reaches the actual map function. This type of join has a strict prerequisite: the input to each map must already be divided into partitions, and every input must be sorted by the join key. The inputs must also be partitioned the same way on the join key, so that they have the same number of partitions and matching keys fall into corresponding partitions.
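The sorted, partitioned input is what lets the mapper join without a shuffle: partition n of one input can be merged with partition n of the other in a single pass. The sketch below is plain Python rather than the Hadoop API, and it assumes each partition is a list of (key, value) tuples already sorted by key; merge_join and map_side_join are hypothetical names.

```python
from typing import List, Tuple

Record = Tuple[str, str]        # (join key, value)
Joined = Tuple[str, str, str]   # (join key, left value, right value)


def merge_join(left: List[Record], right: List[Record]) -> List[Joined]:
    """Sort-merge join of two partitions that are both sorted by key."""
    out: List[Joined] = []
    i = j = 0
    while i < len(left) and j < len(right):
        lkey, rkey = left[i][0], right[j][0]
        if lkey < rkey:
            i += 1          # no partner on the right; skip this left record
        elif lkey > rkey:
            j += 1          # no partner on the left; skip this right record
        else:
            # Find the run of right records sharing this key.
            j_end = j
            while j_end < len(right) and right[j_end][0] == lkey:
                j_end += 1
            # Pair every left record with this key against that run.
            while i < len(left) and left[i][0] == lkey:
                for k in range(j, j_end):
                    out.append((lkey, left[i][1], right[k][1]))
                i += 1
            j = j_end
    return out


def map_side_join(
    left_parts: List[List[Record]], right_parts: List[List[Record]]
) -> List[Joined]:
    """Join partition n of one input with partition n of the other."""
    # Pairing partitions by position only works if both inputs were split the
    # same way into the same number of partitions.
    if len(left_parts) != len(right_parts):
        raise ValueError("inputs must have the same number of partitions")
    result: List[Joined] = []
    for left, right in zip(left_parts, right_parts):
        # Roughly, each such pair corresponds to the work of one map task.
        result.extend(merge_join(left, right))
    return result
```

The mismatch check reflects the prerequisite above: if the inputs were split differently, records with the same key could land in partitions that are never paired with each other.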

Reduce-side Join

In a Reduce-side Join, the join is performed by the reducer. Unlike the map-side join, it does not require the datasets to be in any particular structure (partitioned or sorted). The map phase emits the join key together with the corresponding tuples from both datasets. Because all tuples that share the same key are grouped at the same reducer, the reducer can join them to form the output records.
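The following plain-Python sketch simulates the three stages of a reduce-side join: the map phase tags each record with its source, a stand-in for Hadoop's shuffle groups the tagged records by key, and the reducer pairs up the records from the two sources. It is not the Hadoop API; the tags "L" and "R" and all function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple

Record = Tuple[str, str]        # (join key, value)
Tagged = Tuple[str, str]        # (source tag, value)
Joined = Tuple[str, str, str]   # (join key, left value, right value)


def map_phase(records: Iterable[Record], tag: str) -> Iterator[Tuple[str, Tagged]]:
    # Emit the join key with a tagged value so the reducer knows its source.
    for key, value in records:
        yield key, (tag, value)


def shuffle(pairs: Iterable[Tuple[str, Tagged]]) -> Dict[str, List[Tagged]]:
    # Stand-in for Hadoop's shuffle and sort: group every tagged value by key,
    # so all tuples with the same key reach the same reduce call.
    groups: Dict[str, List[Tagged]] = defaultdict(list)
    for key, tagged in pairs:
        groups[key].append(tagged)
    return dict(sorted(groups.items()))


def reduce_phase(key: str, values: List[Tagged]) -> Iterator[Joined]:
    # Split the group by source, then pair each left value with each right value.
    left = [v for tag, v in values if tag == "L"]
    right = [v for tag, v in values if tag == "R"]
    for lv in left:
        for rv in right:
            yield key, lv, rv


def reduce_side_join(left: Iterable[Record], right: Iterable[Record]) -> List[Joined]:
    pairs = list(map_phase(left, "L")) + list(map_phase(right, "R"))
    out: List[Joined] = []
    for key, values in shuffle(pairs).items():
        out.extend(reduce_phase(key, values))
    return out
```

For example, reduce_side_join([("D1", "50"), ("D2", "30"), ("D3", "12")], [("D2", "HR"), ("D1", "Finance")]) returns [("D1", "50", "Finance"), ("D2", "30", "HR")], even though the second input is not sorted.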

Let’s start with Hadoop first.

First, start the Hadoop cluster using the commands below.

$HADOOP_HOME/sbin/start-dfs.sh


$HADOOP_HOME/sbin/start-yarn.sh


Type jps in the terminal to check that all the nodes are running.


Our input data consists of two files, DeptStrength.txt and DeptName.txt, which are included in the files from the GitHub repository linked below.

Download the GitHub repository from the link below. We will be using its files.

https://github.com/mrcreamio/Hadoop-tutorials

Copy the downloaded MapReduceJoin folder to the target directory using the command below.

sudo cp -r /home/ahmed/Desktop/MapReduceJoin /home/supper_user/

Then change into that directory.

cd MapReduceJoin/


Now let’s copy our input files to HDFS.

hdfs dfs -copyFromLocal DeptStrength.txt DeptName.txt /


Let’s check that the files were copied.

hdfs dfs -ls /


Run the program using the command given below.

$HADOOP_HOME/bin/hadoop jar MapReduceJoin.jar /DeptStrength.txt /DeptName.txt /output_mapreducejoin


Let’s view the output using the command below.

hdfs dfs -cat /output_mapreducejoin/part-00000



Steven Roger

Steven Roger is a technology blogger, where he brings complex tech concepts to life with clear, engaging insights. With a passion for IT education and over a decade of industry experience, Steven specializes in demystifying the latest in software development, business analysis, and quality assurance training. His articles provide readers with practical knowledge and tips on upskilling for successful careers in tech.
