In Hadoop version 1.0, known as MRV1 (MapReduce Version 1), MapReduce handled both data processing and resource management. It had a single master called the Job Tracker. The Job Tracker allocated resources, scheduled jobs, and monitored the processing jobs. It assigned map and reduce tasks to several subordinate processes called Task Trackers. The Task Trackers periodically reported their progress to the Job Tracker.
This design resulted in a scalability bottleneck because of the single Job Tracker. In its article, IBM noted that, according to Yahoo!, the practical limits of such a design are reached with a cluster of 5,000 nodes and 40,000 tasks running concurrently. Besides this limitation, the utilization of computational resources was inefficient in MRV1. Moreover, the Hadoop framework was restricted to the MapReduce processing model alone.
To overcome all these problems, YARN was introduced in Hadoop version 2.0 in 2012 by Yahoo and Hortonworks. YARN's central idea is to relieve MapReduce by taking over the responsibilities of resource management and job scheduling. YARN gave Hadoop the ability to run non-MapReduce jobs within the Hadoop framework.
Introduction to Hadoop YARN
Now that I have explained the need for YARN, let me introduce you to the core component of Hadoop v2.0, YARN. YARN enables various data processing methods like graph processing, interactive processing, stream processing, and batch processing to run on and process data stored in HDFS. Hence, YARN opens up Hadoop to other types of distributed applications beyond MapReduce.
Besides resource management, YARN also performs job scheduling. YARN carries out all your processing activities by allocating resources and scheduling tasks. The Apache Hadoop YARN architecture consists of the following main components:
- Resource Manager: Runs as a master daemon and manages resource allocation across the cluster.
- Node Manager: Runs as a slave daemon and is responsible for executing tasks on each individual Data Node.
- Application Master: Manages the user job lifecycle and the resource needs of individual applications. It works with the Node Manager and monitors the execution of tasks.
- Container: A package of resources, including RAM, CPU, network, HDD, etc., on a single node.
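To make the container idea concrete, here is a minimal, self-contained sketch of how a scheduler might track per-node resources when granting containers. This is a toy Python model for illustration only; the class names (`Container`, `NodeResources`) and fields are assumptions, not the real Hadoop YARN API.

```python
from dataclasses import dataclass

@dataclass
class Container:
    """A requested bundle of resources on one node (toy model)."""
    memory_mb: int   # RAM to grant to the container
    vcores: int      # virtual CPU cores to grant

@dataclass
class NodeResources:
    """Free capacity that a Node Manager reports for its node (toy model)."""
    memory_mb: int
    vcores: int

    def can_fit(self, c: Container) -> bool:
        # A container can only be placed if the node still has
        # enough free memory and CPU.
        return self.memory_mb >= c.memory_mb and self.vcores >= c.vcores

    def allocate(self, c: Container) -> bool:
        # Deduct the container's share from the node's free capacity,
        # mimicking the bookkeeping a scheduler performs.
        if not self.can_fit(c):
            return False
        self.memory_mb -= c.memory_mb
        self.vcores -= c.vcores
        return True

node = NodeResources(memory_mb=8192, vcores=4)
print(node.allocate(Container(memory_mb=2048, vcores=1)))  # True
print(node.memory_mb)                                      # 6144
```

A request that exceeds the remaining capacity is simply refused, which is why, on a real cluster, oversized container requests leave applications waiting for resources.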
Components of YARN
The workflow in Hadoop YARN
Refer to the given image and observe the following steps involved in the application workflow of Apache Hadoop YARN:
- The client submits an application
- The Resource Manager allocates a container to start the Application Master
- The Application Master registers with the Resource Manager
- The Application Master requests containers from the Resource Manager
- The Application Master notifies the Node Manager to launch the containers
- The application code is executed in the container
- The client contacts the Resource Manager/Application Master to monitor the application's status
- The Application Master unregisters with the Resource Manager
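The steps above can be walked through with a small simulation. The sketch below is a toy Python model of the workflow, not the real Hadoop API; every class and method name here (`ResourceManager`, `submit_application`, and so on) is an assumption made for illustration.

```python
class ResourceManager:
    """Toy stand-in for the Resource Manager (hypothetical API)."""
    def __init__(self):
        self.status = {}          # app_id -> lifecycle state
        self.next_container = 0

    def submit_application(self, app_id):
        # Steps 1-2: the client submits; the RM grants a container
        # in which the Application Master is started.
        self.status[app_id] = "ACCEPTED"
        return ApplicationMaster(app_id, self)

    def register_am(self, app_id):
        # Step 3: the Application Master registers with the RM.
        self.status[app_id] = "RUNNING"

    def allocate_containers(self, n):
        # Step 4: the AM requests worker containers from the RM.
        ids = [f"container_{self.next_container + i}" for i in range(n)]
        self.next_container += n
        return ids

    def unregister_am(self, app_id):
        # Step 8: the AM unregisters once the application finishes.
        self.status[app_id] = "FINISHED"


class NodeManager:
    """Toy stand-in for a Node Manager."""
    def launch(self, container_id):
        # Steps 5-6: the NM launches the container and the
        # application code runs inside it.
        return f"{container_id}: done"


class ApplicationMaster:
    """Toy stand-in for a per-application Application Master."""
    def __init__(self, app_id, rm):
        self.app_id, self.rm = app_id, rm

    def register(self):
        self.rm.register_am(self.app_id)

    def request_containers(self, n):
        return self.rm.allocate_containers(n)

    def unregister(self):
        self.rm.unregister_am(self.app_id)


rm, nm = ResourceManager(), NodeManager()
am = rm.submit_application("app_0001")   # steps 1-2
am.register()                            # step 3
for c in am.request_containers(2):       # step 4
    print(nm.launch(c))                  # steps 5-6
print(rm.status["app_0001"])             # step 7: client polls -> RUNNING
am.unregister()                          # step 8
print(rm.status["app_0001"])             # FINISHED
```

Note how the Resource Manager only tracks high-level state while the per-application logic lives in the Application Master; that separation is exactly what removed the single Job Tracker bottleneck of MRV1.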