Hadoop Pig Tutorial: What is, Architecture, Example

Introduction to Apache Pig

Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce, reducing the complexity of writing a MapReduce program, which otherwise requires knowledge of Java or Python. With Apache Pig, data manipulation operations on Hadoop can be written quickly.

Pig Architecture

Pig consists of two components:

  1. A runtime environment, running on the JVM, that executes Pig Latin programs.
  2. Pig Latin, which is a programming language.

A Pig Latin program consists of a sequence of operations, or transformations, that are applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation.

These transformations provide a level of abstraction that hides a series of MapReduce jobs, so the programmer can focus on the data rather than on lengthy code.

Pig Latin is a relatively rigid language that uses familiar keywords from data processing, e.g., Join, Group, and Filter.
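As a rough illustration of such a data flow, the sketch below writes a small word-count script in Pig Latin to a file from the shell. The script name wordcount.pig, the input file input.txt, and the output directory wordcount_out are hypothetical names chosen only for this example.

```shell
# Write a small Pig Latin word-count script to a temporary location.
# All file and directory names here are illustrative assumptions.
script_path="${TMPDIR:-/tmp}/wordcount.pig"

cat > "$script_path" <<'EOF'
-- Load each line of the (hypothetical) input file as a single string.
lines = LOAD 'input.txt' USING TextLoader() AS (line:chararray);
-- Split every line into words, one word per record.
words = FOREACH lines GENERATE FLATTEN(TOKENIZE(line)) AS word;
-- Drop records whose word is null.
cleaned = FILTER words BY word IS NOT NULL;
-- Group identical words together and count each group.
grouped = GROUP cleaned BY word;
counts = FOREACH grouped GENERATE group AS word, COUNT(cleaned) AS total;
-- Write the result to a (hypothetical) output directory.
STORE counts INTO 'wordcount_out';
EOF

echo "wrote $script_path"
```

Each statement names an intermediate relation, and Pig plans the underlying jobs only when it reaches a statement such as STORE that asks for output.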

Pig has two execution modes:

  1. Local mode: In local mode, Pig runs in a single JVM and uses the local file system. This mode is suitable only for testing on small datasets.
  2. MapReduce mode: In MapReduce mode, queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. This mode is used to run Pig on large datasets.
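The mode is usually selected with the -x option of the pig command. The sketch below is a small dry-run helper that only prints the command line for each of the two modes described above; the function name pig_command and the script name wordcount.pig are hypothetical.

```shell
# Print (without running) the pig command line for a given execution mode.
# pig_command and wordcount.pig are illustrative names, not part of Pig.
pig_command() {
  mode="$1"
  script="$2"
  case "$mode" in
    local|mapreduce)
      printf 'pig -x %s %s\n' "$mode" "$script"
      ;;
    *)
      echo "unknown mode: $mode" >&2
      return 1
      ;;
  esac
}

pig_command local wordcount.pig
pig_command mapreduce wordcount.pig
```

Removing the printf wrapper and running the printed command directly would start Pig in the chosen mode, assuming Pig is installed and, for MapReduce mode, a Hadoop cluster is reachable.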

How to Download and Install Pig

Download Pig from the link given below.

https://downloads.apache.org/pig/pig-0.16.0/

Now move the downloaded file to the super_user directory.

Now extract the content in the folder using the command given below.

sudo tar -xvf pig-0.16.0.tar.gz pig-0.16.0/

Open the ~/.bashrc file in a text editor, for example with the command below.

nano ~/.bashrc

Then add the Pig environment variables to the file (the later steps rely on PIG_HOME being set) and save it.
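A minimal sketch of lines that are typically added, assuming Pig was extracted to $HOME/pig-0.16.0. That path is an assumption; adjust it to wherever the pig-0.16.0 folder actually lives on your machine.

```shell
# Assumed install location; change this to the real path of the extracted folder.
export PIG_HOME="$HOME/pig-0.16.0"

# Make the pig launcher script available on the command line.
export PATH="$PATH:$PIG_HOME/bin"
```

With these two lines in place, cd $PIG_HOME and the pig command used in the following steps resolve correctly.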

Now reload the file in the current shell with the following command.

. ~/.bashrc

Now we need to compile Pig. First, change to the Pig directory.

cd $PIG_HOME

Install Ant.

sudo apt-get install ant

Recompile Pig.

sudo ant clean jar-all -Dhadoopversion=23

Check whether Pig is installed using the following command.

pig -help
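If the command is not recognized, the launcher is probably not on the PATH. The sketch below is one way to check for that before retrying; the function name check_pig is hypothetical.

```shell
# Report whether the pig launcher can be found on the PATH.
# check_pig is an illustrative helper name, not part of Pig.
check_pig() {
  if command -v pig >/dev/null 2>&1; then
    echo "pig found at $(command -v pig)"
  else
    echo "pig not found on PATH"
  fi
}

check_pig
```

When the second message appears, recheck the PIG_HOME and PATH settings in ~/.bashrc and reload the file.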
