Introduction to Apache Pig
Like MapReduce, Apache Pig is used to analyze large data sets. It provides an abstraction over MapReduce, reducing the complexity of writing a MapReduce program by hand, which normally requires Java or Python knowledge. Apache Pig makes data manipulation operations in Hadoop quick to write.
Pig consists of two components:
- A runtime environment, running on the JVM, that executes Pig Latin programs.
- Pig Latin, which is a programming language.
A Pig Latin program comprises a sequence of operations, or transformations, applied to the input data to produce output. These operations describe a data flow, which the Pig execution environment translates into an executable representation.
These transformations provide a level of abstraction that hides a series of MapReduce jobs. This abstraction lets the programmer focus on the data instead of on lengthy code.
Pig Latin is a relatively rigid language that uses familiar keywords from data processing, e.g., Join, Group, and Filter.
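To make the data-flow idea concrete, here is a minimal sketch of a Pig Latin script that loads, filters, groups, and aggregates records. The file names, field names, and sample data are made up for illustration, and the shell wrapper only runs the script if a `pig` command happens to be on the PATH.

```shell
# Create a scratch directory with a tiny, hypothetical input file.
workdir="$(mktemp -d)"
printf 'alice,120\nbob,80\nalice,300\n' > "$workdir/sales.csv"

# Write the Pig Latin script. Each statement is one step in the data flow.
cat > "$workdir/totals.pig" <<'EOF'
-- Load comma-separated records with an explicit schema.
records = LOAD 'sales.csv' USING PigStorage(',') AS (user:chararray, amount:int);
-- Keep only rows whose amount exceeds 100.
big = FILTER records BY amount > 100;
-- Group the remaining rows by user.
grouped = GROUP big BY user;
-- Sum the amounts within each group.
totals = FOREACH grouped GENERATE group AS user, SUM(big.amount) AS total;
DUMP totals;
EOF

# Run in local mode when Pig is available; otherwise just report where the script is.
if command -v pig >/dev/null 2>&1; then
  (cd "$workdir" && pig -x local totals.pig)
else
  echo "pig not found on PATH; script written to $workdir/totals.pig"
fi
```

With this sample input, only the two rows for alice pass the filter, so the result should be a single tuple for alice with a total of 420.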
Pig has two execution modes:
- Local mode: Pig runs in a single JVM on the local host and uses the local file system. This mode is suitable only for testing Pig on small datasets.
- MapReduce mode: Queries written in Pig Latin are translated into MapReduce jobs and run on a Hadoop cluster. This mode is used for running Pig on large datasets.
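The mode is chosen on the command line with Pig's `-x` option (`local` or `mapreduce`). The sketch below wraps that choice in a small helper; the function name `pig_command_for` and the `small`/`large` labels are hypothetical, introduced only for illustration.

```shell
# Hypothetical helper: print the pig command line for a given dataset size.
pig_command_for() {
  size="$1"
  script="$2"
  case "$size" in
    small) echo "pig -x local $script" ;;       # single JVM, local file system
    large) echo "pig -x mapreduce $script" ;;   # MapReduce jobs on a Hadoop cluster
    *)     echo "usage: pig_command_for small|large SCRIPT" >&2; return 1 ;;
  esac
}

# Example: show the command for testing a script on a small dataset.
pig_command_for small wordcount.pig
```

The helper only prints the command rather than running it, so it can be tried on a machine without Pig installed.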
How to Download and Install Pig
Download Pig from the link given below.
Now move the downloaded file to the supper_user
Now extract the content in the folder using the command given below.
|sudo tar -xvf pig-0.16.0.tar.gz pig-0.16.0/|
Open the bashrc file using the command below.
And do the following modifications.
Now run the following command.
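As a rough sketch of what such environment settings commonly look like, the lines below could be added to `~/.bashrc`. The install location `$HOME/pig-0.16.0` is an assumption, not taken from this guide, so adjust it to wherever the archive was actually extracted.

```shell
# Assumed install location (hypothetical); change to your actual Pig directory.
export PIG_HOME="${PIG_HOME:-$HOME/pig-0.16.0}"

# Make the pig launcher script available on the PATH.
export PATH="$PATH:$PIG_HOME/bin"

# If Hadoop is configured, let Pig find the cluster configuration files.
if [ -n "${HADOOP_HOME:-}" ]; then
  export PIG_CLASSPATH="$HADOOP_HOME/etc/hadoop"
fi
```

After saving the file, the settings take effect in the current shell once it is reloaded, for example with `source ~/.bashrc`.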
Now we need to compile Pig. Run the following commands.
|sudo apt-get install ant|
Recompile Pig
|sudo ant clean jar-all -Dhadoopversion=23|
Check whether Pig is installed using the following command
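One common way to verify the installation is to ask Pig for its version. The sketch below guards that call so it reports a clear message when `pig` is not on the PATH; the function name `check_pig` is hypothetical.

```shell
# Hypothetical helper: report the Pig version, or explain why it cannot.
check_pig() {
  if command -v pig >/dev/null 2>&1; then
    pig -version
  else
    echo "pig is not on PATH; check PIG_HOME and PATH settings" >&2
    return 1
  fi
}

# Run the check, but keep going if Pig is missing.
check_pig || true
```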