Hadoop & Mapreduce Examples

Using Hadoop Mapreduce, First of all, start the Hadoop Cluster using the commands given below.

$HADOOP_HOME/sbin/start-dfs.sh

$HADOOP_HOME/sbin/start-yarn.sh

Check by typing jps in the terminal if all the Nodes are running.

Do you remember in the last article we looked at how a word counter works?

Using Hadoop Mapreduce Let’s implement the above.

You need to create three files.

Reduce.java
Map.java
WordCount.java

Reduce.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reducer;
import org.apache.hadoop.mapred.Reporter;

public class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable>
{
public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException
{
int sum = 0;
while (values.hasNext())
{
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}

Map.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.MapReduceBase;
import org.apache.hadoop.mapred.Mapper;
import org.apache.hadoop.mapred.OutputCollector;
import org.apache.hadoop.mapred.Reporter;
public class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable>
{
private final static IntWritable one = new IntWritable(1);

private Text word = new Text();

public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter)
throws IOException
{
String line = value.toString();
StringTokenizer tokenizer = new StringTokenizer(line);
while (tokenizer.hasMoreTokens())
{
word.set(tokenizer.nextToken());
output.collect(word, one);
}
}
}

WordCount.java

package com.impetus.code.examples.hadoop.mapred.wordcount;

import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.mapred.TextInputFormat;
import org.apache.hadoop.mapred.TextOutputFormat;

public class WordCount
{
public static void main(String[] args) throws Exception
{
JobConf conf = new JobConf(WordCount.class);
conf.setJobName("wordcount");

conf.setOutputKeyClass(Text.class);
conf.setOutputValueClass(IntWritable.class);

conf.setMapperClass(Map.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);

conf.setInputFormat(TextInputFormat.class);
conf.setOutputFormat(TextOutputFormat.class);

FileInputFormat.setInputPaths(conf, new Path(args[0]));
FileOutputFormat.setOutputPath(conf, new Path(args[1]));

JobClient.runJob(conf);

}
}

Now you need to compile java files.

There are two ways to compile java files.

mvn clean install

Or run the following command.

javac -d . Map.java Reduce.java WordCount.java

If you used javac -d command then run the following command too.

jar cfm wordcounter.jar Manifest.txt com/impetus/code/examples/hadoop/mapred/wordcount/*.class

Now let’s create an input folder in HDFS.

Hdfs dfs -mkdir ~/wordcount/input

Now we are going to create two input files.

sudo vi input_one

And put the following content inside it.

And another file.

sudo vi input_two

Using the command below move the file to HDFS file system

hdfs dfs -copyFromLocal input_one ~/wordcount/input/

Do the above for both input files.

Now check if both files have been moved.

hdfs dfs -ls ~/wordcount/input/

Using Hadoop Mapreduce Now run the map-reduce using the command given below.

$HADOOP_HOME/bin/hadoop jar wordcounter.jar /input /output

By running the below-given command you will be able to see the output.

bin/hadoop dfs -cat ~/wordcount/output/part-00000

Hadoop MapReduce, Mapreduce examples

Share this article

Steven Roger

Steven Roger is a technology blogger, where he brings complex tech concepts to life with clear, engaging insights. With a passion for IT education and over a decade of industry experience, Steven specializes in demystifying the latest in software development, business analysis, and quality assurance training. His articles provide readers with practical knowledge and tips on upskilling for successful careers in tech.

All Posts