{"id":4368,"date":"2020-08-14T17:35:53","date_gmt":"2020-08-14T12:05:53","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=4368"},"modified":"2020-08-21T14:35:35","modified_gmt":"2020-08-21T09:05:35","slug":"what-is-hadoop-an-introduction","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/what-is-hadoop-an-introduction\/","title":{"rendered":"What is Hadoop? An Introduction"},"content":{"rendered":"\n<figure class=\"wp-block-image size-large\"><img fetchpriority=\"high\" decoding=\"async\" width=\"568\" height=\"221\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image.png\" alt=\"\" class=\"wp-image-4375\" title=\"\" srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image.png 568w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-300x117.png 300w\" sizes=\"(max-width: 568px) 100vw, 568px\" \/><\/figure>\n\n\n\n<p>In this article, you will get an introduction to Hadoop and learn how this open-source framework is used to create data processing applications using a simple programming model.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">What is Hadoop?<\/h2>\n\n\n\n<p>Apache Hadoop is an open-source framework used to create data processing applications, built on a simple programming model, that are executed in a distributed computing environment.<\/p>\n\n\n\n<p>It is an <a href=\"https:\/\/en.wikipedia.org\/wiki\/Category:Data_management_software\" rel=\"nofollow noopener\" target=\"_blank\">open-source data management<\/a> platform with distributed processing and scale-out storage.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hadoop ecosystem and components<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"624\" height=\"395\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-1.png\" alt=\"Hadoop ecosystem and components\" class=\"wp-image-4376\" title=\"\" 
srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-1.png 624w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-1-300x190.png 300w\" sizes=\"(max-width: 624px) 100vw, 624px\" \/><\/figure>\n\n\n\n<p>The Hadoop Framework\u2019s success has pushed developers to create multiple complementary tools. Hadoop, along with this set of related software, makes up the <a href=\"https:\/\/www.h2kinfosys.com\/blog\/learn-hadoop-online-training-classes-free-tutorials-or-ebooks\/\">Hadoop EcoSystem<\/a>. The primary purpose of this software is to enhance the functionality and increase the efficiency of the Hadoop Framework.<\/p>\n\n\n\n<p>The Hadoop EcoSystem comprises:<\/p>\n\n\n\n<ul class=\"wp-block-list\"><li>Apache Pig<\/li><li>Apache HBase<\/li><li>Apache Hive<\/li><li>Apache Sqoop<\/li><li>Apache Flume<\/li><li>Apache Zookeeper<\/li><li>Hadoop Distributed File System<\/li><li>MapReduce<\/li><\/ul>\n\n\n\n<h3 class=\"wp-block-heading\">Apache Pig<\/h3>\n\n\n\n<p>Apache Pig is a scripting platform used to write data analysis programs for large datasets stored in the Hadoop cluster. Its scripting language is called Pig Latin.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache HBase<\/h3>\n\n\n\n<p>Apache HBase is a column-oriented database that enables real-time reading and writing of data on top of HDFS.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache Hive<\/h3>\n\n\n\n<p>Apache Hive provides a SQL-like language that allows querying of data stored in HDFS. Hive\u2019s SQL dialect is called HiveQL.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache Sqoop<\/h3>\n\n\n\n<p>The name Sqoop is a combination of SQL and Hadoop. Apache Sqoop is an application used to transfer data between Hadoop and any relational database management system.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache Flume<\/h3>\n\n\n\n<p>Apache Flume is an application that moves streaming data into a Hadoop cluster. 
An excellent example of streaming data is the data being written to log files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Apache Zookeeper<\/h3>\n\n\n\n<p>Finally, Apache Zookeeper takes care of the coordination required for all of this software to function correctly.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Hadoop MapReduce<\/h3>\n\n\n\n<p>MapReduce is a computational model and framework for creating applications that run on Hadoop. MapReduce applications can process massive amounts of data in parallel on large clusters.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">HDFS<\/h3>\n\n\n\n<p>HDFS stands for the <a href=\"https:\/\/www.h2kinfosys.com\/blog\/hadoop-online-training-top-10-hadoop-tools-for-big-data\/\">Hadoop Distributed File system<\/a>. HDFS handles the storage side of Hadoop applications, and MapReduce applications read their data from it. HDFS creates multiple copies of each block of data and distributes them across different nodes in a cluster. This replication makes computations both fault-tolerant and remarkably fast.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Hadoop Architecture<\/h2>\n\n\n\n<figure class=\"wp-block-image size-large\"><img decoding=\"async\" width=\"510\" height=\"364\" src=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-2.png\" alt=\"Hadoop Architecture\" class=\"wp-image-4377\" title=\"\" srcset=\"https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-2.png 510w, https:\/\/www.h2kinfosys.com\/blog\/wp-content\/uploads\/2020\/08\/image-2-300x214.png 300w\" sizes=\"(max-width: 510px) 100vw, 510px\" \/><\/figure>\n\n\n\n<p>Hadoop follows a Master-Slave Architecture for distributed data processing and data storage. A Hadoop cluster is made up of a single master node and multiple slave nodes.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">NameNode:<\/h3>\n\n\n\n<p>The NameNode tracks all files and directories managed in the file system namespace. 
The NameNode maintains the file system by performing operations such as renaming, opening, and closing files.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">DataNode:<\/h3>\n\n\n\n<p>A DataNode manages the storage attached to an HDFS node and lets you interact with the data blocks. An HDFS cluster contains multiple DataNodes. A DataNode&#8217;s responsibility is to serve read and write requests from the file system&#8217;s clients.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">MasterNode:<\/h3>\n\n\n\n<p>The master node enables you to execute parallel processing of data using Hadoop MapReduce. The master node comprises a Task Tracker, a Job Tracker, a DataNode, and a NameNode.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Slave node:<\/h3>\n\n\n\n<p>Complex calculations are carried out by additional machines in the Hadoop cluster; these individual machines are called slave nodes. Each slave node consists of a Task Tracker and a DataNode, which keep its processes synchronized with the NameNode and the Job Tracker.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>In this article, you will get an introduction to Hadoop and learn how this open-source framework is used to create data processing applications using a simple programming model What is Hadoop? Apache Hadoop is an open-source framework used to create data processing applications, built on a simple programming model, that are executed in a distributed computing environment. 
It is [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":4397,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[138],"tags":[543,1226,1225,1224],"class_list":["post-4368","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-hadoop-tutorials","tag-architecture","tag-ecosystem-and-components","tag-introduction-to-hadoop","tag-what-is-hadoop"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4368","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=4368"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/4368\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/4397"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=4368"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=4368"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=4368"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}