{"id":5027,"date":"2020-09-24T19:21:04","date_gmt":"2020-09-24T13:51:04","guid":{"rendered":"https:\/\/www.h2kinfosys.com\/blog\/?p=5027"},"modified":"2020-09-24T19:21:06","modified_gmt":"2020-09-24T13:51:06","slug":"hadoop-pig-tutorial-what-is-architecture-example","status":"publish","type":"post","link":"https:\/\/www.h2kinfosys.com\/blog\/hadoop-pig-tutorial-what-is-architecture-example\/","title":{"rendered":"Hadoop Pig Tutorial: What is, Architecture, Example"},"content":{"rendered":"\n<h2 class=\"wp-block-heading\">Introduction to Apache Pig<\/h2>\n\n\n\n<p>Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce, reducing the complexity of writing MapReduce programs directly, which requires Java or Python knowledge. Apache Pig makes it quick to write data manipulation operations for Hadoop.<\/p>\n\n\n\n<h2 class=\"wp-block-heading\">Pig Architecture<\/h2>\n\n\n\n<p>Pig consists of two components:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li><strong>A runtime environment on the JVM<\/strong> for running Pig Latin programs.<\/li><li><strong>Pig Latin<\/strong>, which is a programming language.<\/li><\/ol>\n\n\n\n<p>A Pig Latin program consists of a sequence of operations, or transformations, applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation.<\/p>\n\n\n\n<p>These transformations provide a level of abstraction that hides the underlying series of MapReduce jobs. 
This abstraction allows the programmer to focus on the data rather than on lengthy code.<\/p>\n\n\n\n<p><a href=\"https:\/\/en.wikipedia.org\/wiki\/Pig_Latin#:~:text=Pig%20Latin%20is%20a%20language,to%20create%20such%20a%20suffix.\" rel=\"nofollow noopener\" target=\"_blank\">Pig Latin<\/a> is a relatively rigid language that uses familiar keywords from data processing, e.g., Join, Group, and Filter.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/r-nDaf_75_O4EbBx7lZKPKDMJ4_i9hVcARY7Ln3VUPC-V_uFAC5FIFRxdZmnirMttHaWgZC9zUlFzYITKYqN7KlRYd5M8XUEoOe2hBcAqOrElVuF9ydTUQ_Yu0hxNo8T0nt92aJUqi1JNRqw3g\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Pig has two execution modes:<\/p>\n\n\n\n<ol class=\"wp-block-list\"><li>Local mode: In local mode, Pig runs in a single JVM and uses the local file system. This mode is suitable only for testing on small datasets.<\/li><li>MapReduce mode: In <a href=\"https:\/\/www.h2kinfosys.com\/blog\/what-is-mapreduce-how-it-works\/\">MapReduce<\/a> mode, queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. 
MapReduce mode is the one to use when running Pig on large datasets.<\/li><\/ol>\n\n\n\n<h2 class=\"wp-block-heading\">How to Download and Install Pig<\/h2>\n\n\n\n<p>Download Pig from the link given below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table><tbody><tr><td>https:\/\/downloads.apache.org\/pig\/pig-0.16.0\/<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/5JQoD80Z9vPn1ghddHPxDUxvj-2C9GZh0YuEthNanAAprxCw2jgM2-TMI88J7HFOU-kY0AkUCmifx3CYBSVBa01uqGVLsHvVq3tsV__IqB1Yob__VVyEsbZAR2BL1vg6Nor0ZmXXTvv3k44SNg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now move the downloaded file to the supper_user<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/mf8WiVeU9qhsQAWC8V5JPTnWhTlfRI4u8ynZop6TCB72qi3JSPlC5YCJJN8NdLe0CffFVI65jpkLeAF3AxQgJ41Nrbl1mGr1KMvXL5GUG8IqDelufhWu8sc8-L7-CMOvofOl_fiQMmHJK57vbw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now extract the archive in that folder using the command given below.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>sudo tar -xvf pig-0.16.0.tar.gz pig-0.16.0\/<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Open the following file in a text editor.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>~\/.bashrc<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>And make the following modifications.<\/p>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh5.googleusercontent.com\/H6VVsPE7KGfwIh2vbpW0uIRH7kwdxEyr0tUQE2MRYEEx6YH_4Abb1C_fE_x4Il1meCC9PJlyNi_s97M3OVZkUS_X_6wCAhBBVSoD2ijYF8zqt0cmwyYL7h9kDchsvOShJ-reoMvh9YCAWaGzkw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now reload the file in the current shell with the following command.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table 
class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>. ~\/.bashrc<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh3.googleusercontent.com\/Ce8IM7-jyAIfr56-4jIOLYVIfhMR25UFP_IYEBSsWkyJCimVRJ07u1ZTqI5i5LHksxNLeIwUmrBiq-Q2DxJGS_nUfRwnOr6Op_OQsqk0e72KUGXYFNcqJoqlOTryYJsumRnftGZVhLhfLK3pXw\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Now we need to compile Pig. Run the following commands, starting from the Pig home directory.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>cd $PIG_HOME<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh4.googleusercontent.com\/SzXGA6wMV24e43NAkEI3hqhPsNxL02pXXFNX7Iupz1aLkWI7HDHPy3oAGrkzfZkF4NUI88WhVLFqi2yj7nhN326JaMNhs7pvqOOH-EiFfBdYsQVRl6pLwvRPh0-LC1cROcgR50yw7BgrKQdNBg\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Install Ant.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>sudo apt-get install ant<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<figure class=\"wp-block-image\"><img decoding=\"async\" src=\"https:\/\/lh6.googleusercontent.com\/qqecjVaftT15C9UphZBkIn2WptumRcdOmzk6g7ArM4VHN2HH89ihIbGCw6ygwCilGbS3C10PfzYFqOn5tSt-gS_Na39nmCBj0BXZHiZ6KSlFJ35Ak5qNcCW_VL-Ccz6rle1mFMLW1dVmDWRDhQ\" alt=\"\" title=\"\"><\/figure>\n\n\n\n<p>Recompile Pig.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>sudo ant clean jar-all -Dhadoopversion=23<\/td><\/tr><\/tbody><\/table><\/figure>\n\n\n\n<p>Check that Pig is installed using the following command.<\/p>\n\n\n\n<figure class=\"wp-block-table\"><table class=\"has-subtle-pale-pink-background-color has-background\"><tbody><tr><td>pig 
-help<\/td><\/tr><\/tbody><\/table><\/figure>\n","protected":false},"excerpt":{"rendered":"<p>Introduction to Apache Pig Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce, reducing the complexity of writing MapReduce programs directly, which requires Java or Python knowledge. Apache Pig makes it quick to write data manipulation operations for Hadoop. [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":5065,"comment_status":"open","ping_status":"open","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[138],"tags":[1409,1408],"class_list":["post-5027","post","type-post","status-publish","format-standard","has-post-thumbnail","hentry","category-bigdata-hadoop-tutorials","tag-apache-pig","tag-hadoop-pig-tutorial"],"_links":{"self":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5027","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/comments?post=5027"}],"version-history":[{"count":0,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/posts\/5027\/revisions"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media\/5065"}],"wp:attachment":[{"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/media?parent=5027"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/categories?post=5027"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/www.h2kinfosys.com\/blog\/wp-json\/wp\/v2\/tags?post=5027"}],"curies":[{"name":
"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}