{"id":4738,"date":"2020-09-04T14:52:46","date_gmt":"2020-09-04T09:22:46","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=4738"},"modified":"2023-05-17T13:40:09","modified_gmt":"2023-05-17T08:10:09","slug":"hadoop-mapreduce-examples","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/hadoop-mapreduce-examples\/","title":{"rendered":"Hadoop &#038; MapReduce Examples"},"content":{"rendered":"\n<p>First of all, start the <a href=\"https:\/\/www.h2kinfosys.com\/courses\/hadoop-bigdata-online-training-course-details\">Hadoop<\/a> cluster using the commands given below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>$HADOOP_HOME\/sbin\/start-dfs.sh<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/pHFiVj-aAlBa4IxfWbiXwTEsywOJ27vOqxulSN4NR6t7Mfa1JCYBzMF5twCZGsXaRBM6-BVZUKyuOY7CE3ha76CmKifD1LwId5eoTR42EF2Z7WcZe0ox-XzCjZ74NTPeuDXja1Ia-yI7VbeG0g\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>$HADOOP_HOME\/sbin\/start-yarn.sh<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/Huu-Bok8XD3x1hf4WA5-_tEzJqO_ufEDFdyGKlBs9xzRDfPr4Ae5VvpF0Q_L5dJePYYeUBMfzkloSZV39Ly1ZfWF5Wax1hTT7m6L2_6kkIB4WdJlCFa99jMOmEKrz0t_smDPGF-oDQ7oub34Hg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Type jps in the terminal to check that all the nodes are running.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/5-h6Efc7-yYEs1g1fPRi-ZPW2LmuuV2rDCbH_oO1DnETigy-fINQXFnaBLIwwga_j6RNy7OCjTu4L5z9HxtDedFb3RauJMs_quSDZieabwes9-5nlYdJJY7uFXNV6lYrfOVmEVdxM0nKWfhBKQ\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Do you remember in 
the last article we looked at how a word counter works?<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/A5fftnQTSyp2oGsNRkIxhvZHWTM1zLNO3ILN57dsnyUjpFl-k_-TvnuQV5wbwLJjQGs8TD22tlnHt_TKzt-OTAMuxwv7XSYs_DgCitq3W8g17VU3QQXA1FyY2ygT2R6JkqJaKk8JpIGEX0Rkhw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Let\u2019s implement the above using Hadoop MapReduce.<\/p>\n\n\n\n<p>You need to create three files:<\/p>\n\n\n\n<ul class=\"wp-block-list\">\n<li>Reduce.java<\/li>\n\n\n\n<li>Map.java<\/li>\n\n\n\n<li>WordCount.java<\/li>\n<\/ul>\n\n\n\n<p><strong><em>Reduce.java<\/em><\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">package com.impetus.code.examples.hadoop.mapred.wordcount;\n\nimport java.io.IOException;\nimport java.util.Iterator;\n\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.Text;\nimport org.apache.hadoop.mapred.MapReduceBase;\nimport org.apache.hadoop.mapred.OutputCollector;\nimport org.apache.hadoop.mapred.Reducer;\nimport org.apache.hadoop.mapred.Reporter;\n\npublic class Reduce extends MapReduceBase implements Reducer&lt;Text, IntWritable, Text, IntWritable&gt;\n{\n    public void reduce(Text key, Iterator&lt;IntWritable&gt; values, OutputCollector&lt;Text, IntWritable&gt; output,\n            Reporter reporter) throws IOException\n    {\n        int sum = 0;\n        while (values.hasNext())\n        {\n            sum += values.next().get();\n        }\n        output.collect(key, new IntWritable(sum));\n    }\n}<\/pre>\n\n\n\n<p><strong><em>Map.java<\/em><\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">package com.impetus.code.examples.hadoop.mapred.wordcount;\n\nimport java.io.IOException;\nimport java.util.StringTokenizer;\n\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.LongWritable;\nimport org.apache.hadoop.io.Text;\nimport org.apache.hadoop.mapred.MapReduceBase;\nimport org.apache.hadoop.mapred.Mapper;\nimport org.apache.hadoop.mapred.OutputCollector;\nimport org.apache.hadoop.mapred.Reporter;\n\npublic class Map extends MapReduceBase implements Mapper&lt;LongWritable, Text, Text, IntWritable&gt;\n{\n    private final static IntWritable one = new IntWritable(1);\n\n    private Text word = new Text();\n\n    public void map(LongWritable key, Text value, OutputCollector&lt;Text, IntWritable&gt; output, Reporter reporter)\n            throws IOException\n    {\n        String line = value.toString();\n        StringTokenizer tokenizer = new StringTokenizer(line);\n        while (tokenizer.hasMoreTokens())\n        {\n            word.set(tokenizer.nextToken());\n            output.collect(word, one);\n        }\n    }\n}<\/pre>\n\n\n\n<p><strong><em>WordCount.java<\/em><\/strong><\/p>\n\n\n\n<pre class=\"wp-block-preformatted\">package com.impetus.code.examples.hadoop.mapred.wordcount;\n\nimport org.apache.hadoop.fs.Path;\nimport org.apache.hadoop.io.IntWritable;\nimport org.apache.hadoop.io.Text;\nimport org.apache.hadoop.mapred.FileInputFormat;\nimport org.apache.hadoop.mapred.FileOutputFormat;\nimport org.apache.hadoop.mapred.JobClient;\nimport org.apache.hadoop.mapred.JobConf;\nimport org.apache.hadoop.mapred.TextInputFormat;\nimport org.apache.hadoop.mapred.TextOutputFormat;\n\npublic class WordCount\n{\n    public static void main(String[] args) throws Exception\n    {\n        JobConf conf = new JobConf(WordCount.class);\n        conf.setJobName(\"wordcount\");\n\n        conf.setOutputKeyClass(Text.class);\n        conf.setOutputValueClass(IntWritable.class);\n\n        conf.setMapperClass(Map.class);\n        conf.setCombinerClass(Reduce.class);\n        conf.setReducerClass(Reduce.class);\n\n        conf.setInputFormat(TextInputFormat.class);\n        conf.setOutputFormat(TextOutputFormat.class);\n\n        FileInputFormat.setInputPaths(conf, new Path(args[0]));\n        FileOutputFormat.setOutputPath(conf, new Path(args[1]));\n\n        JobClient.runJob(conf);\n    }\n}<\/pre>\n\n\n\n<p>Now you need to compile the Java files.<\/p>\n\n\n\n<p>There are two ways to compile them.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>mvn clean 
install<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/OCCelcWM-2KC2wmcxltJUuELR_1JXrc_DkWYm5AkFJRy6oyQdPUq1DvQhkQEcsIkMKNzys2qO_YWqYNMMFqsbp9ffFzWI15sj4M6ggkZRAap9cKbrGoYn4Ad4wnPD06vAX_2EwhhbXcfbmz-og\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Or run the following command (note that the Hadoop jars must be on the classpath).<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>javac -cp $(hadoop classpath) -d . Map.java Reduce.java WordCount.java<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/0RkPX16dU-1ChUjsJQLc9-zGhyAG__40LhCi09fe4RXNQs7963GI-fJSKjROJDSijJlfmC-YX-rB7LeOuapRBkmJhGlXj94uZicbXOMauRCJbV3OJunHkMDxI15pPo477KrzLG9CClSs2EjYzg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>If you used the javac -d command, then also run the following command to package the classes into a jar.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>jar cfm wordcounter.jar Manifest.txt com\/impetus\/code\/examples\/hadoop\/mapred\/wordcount\/*.class<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/IvP-fQM-6dgBd8PB_rc2wOcGcbcIaSdD0DwHYwPAo0HdWMe4Pn8bnTKcZo_fH-piATerz9t3qR0xdnmEzNWP8USZIMriR67RQxxy9f2014Uz7CmA1h_hIId6-Lq2OCvacocwlKR_e4bGWWixfA\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now let\u2019s create an input folder in HDFS.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>hdfs dfs -mkdir -p ~\/wordcount\/input<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Now we are going to create two input files.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>sudo vi 
input_one<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>And put the following content inside it.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/TtWdtZcRp7FRKTJDjp1722MdqZitTdntEtVMwN-mh0jCMSHI1e90PpF_uQQtDG0X__Ldkhb2hoif_WqG504w0iqXJr0b72HOkr90G64sppeR_2KT4A9QfbdLnzX4XEIe2nerImJRMJsFphxZWg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Then create another file.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>sudo vi input_two<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/6KBjPAF27dQGZHgVWsZUmH0vZhpqxdclqOuSRmgx6GCf8sKxNNUJY4vVYSk_9-cSmWYDGqyYcLgW11a31Ly2_4yo3byROuGt09b9CuxtYTJIDujyUfYUqL913uRyuz5pM5Frr3k5Ep9lIrPpoA\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Using the command below, move the files to the <a href=\"https:\/\/hadoop.apache.org\/docs\/r1.2.1\/hdfs_design.html\" rel=\"nofollow noopener\" target=\"_blank\">HDFS file system<\/a>.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>hdfs dfs -copyFromLocal input_one ~\/wordcount\/input\/<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/kllKVqsX_-lnaCPVT6LeHS2YdQIs51W144zhQhRoBZFvkcNCDz-H9692w74yborZTAPK-WvYQ6LMCN_3blgeNgcY7Lcz3Isd6feBZfcfp4bnntUttfdpUa7pc201F8ZJeeZyPRlul50wiaO1Jw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Do the above for both input files.<\/p>\n\n\n\n<p>Now check that both files have been moved.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>hdfs dfs -ls ~\/wordcount\/input\/<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" 
src=\"https:\/\/lh4.googleusercontent.com\/I9gqRDFL8Vbm1PSw43Y4IjP8szybvOf9qgESjr-zlVgPZtGNQMjbWRfqZmyK0WVAwSRtWFkcglYPWjb2CrbLuIjB1ezcaacifFt8LrQL87m9Jnh8hB3m2jn1lLqjm00vDWrZYmVdoIASWmHGEw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now run the MapReduce job using the command given below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>$HADOOP_HOME\/bin\/hadoop jar wordcounter.jar ~\/wordcount\/input ~\/wordcount\/output<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Run the command given below to see the output.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-background\" style=\"background-color:#e7f5fe\"><tbody><tr><td>hdfs dfs -cat ~\/wordcount\/output\/part-00000<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/Di5CWtz81HIP7swHMWpAjxZ-eW-PSIgyjFXHkYGIixnGOh0PYm1LPeob8Ne3E_4BVupGW3wAeF17rALGyDHADufoDmh4VKEf6NuZ_I1fPCrH94btxa-6l1U1sDP9efLHCPS4x7GdNpLXXp8gHQ\" alt=\"\" title=\"\"><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>First of all, start the Hadoop cluster using the commands given below. $HADOOP_HOME\/sbin\/start-dfs.sh $HADOOP_HOME\/sbin\/start-yarn.sh Type jps in the terminal to check that all the nodes are running. Do you remember in the last article we looked at how a word counter works? Let\u2019s implement the above using Hadoop MapReduce. 
You need to [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4754,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[67],"tags":[1329,1332],"class_list":["post-4738","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-hadoop-big-data-skill-test","tag-hadoop-mapreduce","tag-mapreduce-examples"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4738","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=4738"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4738\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/4754"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=4738"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=4738"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=4738"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}