Friday, June 28, 2013

Apache Hadoop YARN: Next Generation MapReduce

MapReduce underwent a complete overhaul in the hadoop-0.x line (starting with hadoop-0.23), and the result is MapReduce v2 (MRv2), also known as YARN.

The main idea behind MapReduce v2 (YARN) is to split the two major functions of the JobTracker, resource management and job scheduling/monitoring, into separate daemons. MapReduce v2 has a global Resource-Manager (RM) and a per-application Application-Master (AM), where an application is either a single client job or a workflow of jobs.

The Resource-Manager (RM) has authority over the Node-Managers (NM), the per-node slaves, and arbitrates resources among all the applications in the system. The per-application Application-Master (AM) is framework-specific: it is responsible for negotiating resources from the Resource-Manager and for working with the Node-Managers to execute and monitor the tasks.
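The division of labour described above can be illustrated with a small toy model. This is only a sketch, not the Hadoop API: the class names mirror the YARN roles, but every method (grant_container, launch, run) and the round-robin placement are assumptions made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class NodeManager:
    """Per-node agent: runs the tasks it is asked to launch (toy model)."""
    name: str
    running: list = field(default_factory=list)

    def launch(self, app_id: str, task: str) -> str:
        self.running.append((app_id, task))
        return f"{task} running on {self.name}"


@dataclass
class ResourceManager:
    """Global arbiter: decides which node an application may use (toy model)."""
    nodes: list
    _next: int = 0

    def grant_container(self, app_id: str) -> NodeManager:
        # Round-robin placement, chosen only to keep the sketch short.
        node = self.nodes[self._next % len(self.nodes)]
        self._next += 1
        return node


@dataclass
class ApplicationMaster:
    """Per-application coordinator: asks the RM for resources, then uses the NMs."""
    app_id: str
    rm: ResourceManager

    def run(self, tasks):
        results = []
        for task in tasks:
            node = self.rm.grant_container(self.app_id)     # negotiate with the RM
            results.append(node.launch(self.app_id, task))  # execute via the NM
        return results


nodes = [NodeManager("node-1"), NodeManager("node-2")]
rm = ResourceManager(nodes)
am = ApplicationMaster("app-1", rm)
status = am.run(["map-0", "map-1", "reduce-0"])
```

Note that the Resource-Manager in this sketch never runs a task itself; it only hands out placements, while the Application-Master drives the job.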

MapReduce v2 has two core responsibilities, resource management and job scheduling/monitoring, and the Resource-Manager (RM) in turn has two core components: the Scheduler and the Applications-Manager.

The Scheduler is responsible for allocating resources to the various running applications according to their requirements and the configured constraints. It is a pure scheduler: it does not monitor or track the status of the applications. It makes its scheduling decisions based on the resource requirements of the applications, expressed through the abstract notion of a resource Container, which incorporates elements such as memory, CPU, disk and network.
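To make the Container idea concrete, here is a minimal sketch of a scheduler that hands out Containers purely on the basis of requested resources. It is not the real YARN scheduler: the names Resource, Container and ToyScheduler, the first-fit policy, and the restriction to memory and CPU (ignoring disk and network) are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Resource:
    """A bundle of resources; only memory and CPU are modelled here."""
    memory_mb: int
    vcores: int

    def fits_in(self, other: "Resource") -> bool:
        return self.memory_mb <= other.memory_mb and self.vcores <= other.vcores


@dataclass(frozen=True)
class Container:
    """A grant of a fixed amount of resources on one node."""
    container_id: int
    node: str
    resource: Resource


class ToyScheduler:
    """Pure scheduler: allocates Containers, does no monitoring or tracking."""

    def __init__(self, capacity):
        self.free = dict(capacity)  # node name -> Resource still available
        self._next_id = 1

    def allocate(self, request: Resource) -> Optional[Container]:
        # First-fit: take the first node with enough free memory and CPU.
        for node, available in self.free.items():
            if request.fits_in(available):
                self.free[node] = Resource(
                    available.memory_mb - request.memory_mb,
                    available.vcores - request.vcores,
                )
                container = Container(self._next_id, node, request)
                self._next_id += 1
                return container
        return None  # nothing fits right now; the caller has to ask again later


scheduler = ToyScheduler({"node-1": Resource(4096, 4), "node-2": Resource(2048, 2)})
first = scheduler.allocate(Resource(3072, 2))   # fits on node-1
second = scheduler.allocate(Resource(2048, 1))  # node-1 is too full, node-2 fits
third = scheduler.allocate(Resource(2048, 1))   # no node has 2048 MB left
```

The scheduler only answers the question "where can this request run?"; whether the work inside the Container succeeds is someone else's job, which matches the "pure scheduler" description.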

The Applications-Manager is responsible for accepting job submissions, negotiating the first Container for executing the application-specific Application-Master, and restarting the Application-Master Container on application failure or hardware failure. The Node-Manager is the per-machine agent responsible for Containers: it monitors their resource usage and reports it to the Resource-Manager. The per-application Application-Master is responsible for negotiating appropriate resource Containers from the Scheduler, tracking their status and monitoring their progress.
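The restart behaviour of the Applications-Manager can be sketched as a simple retry loop. This is an illustrative toy, not Hadoop code: ToyApplicationsManager, the launch_am callback, the use of RuntimeError to signal a failure, and the max_attempts limit are all hypothetical names and choices introduced for the example.

```python
class ToyApplicationsManager:
    """Accepts an application and relaunches its Application-Master on failure."""

    def __init__(self, max_attempts=2):
        self.max_attempts = max_attempts  # assumed retry limit for this sketch
        self.attempts = {}                # app id -> number of launches so far

    def submit(self, app_id, launch_am):
        # launch_am stands in for "start the Application-Master in its first
        # Container"; it raises RuntimeError to simulate a crash.
        for attempt in range(1, self.max_attempts + 1):
            self.attempts[app_id] = attempt
            try:
                return launch_am(attempt)
            except RuntimeError:
                continue  # restart the Application-Master Container
        return "FAILED"


def flaky_am(attempt):
    """Fails on the first launch, succeeds on the second."""
    if attempt == 1:
        raise RuntimeError("simulated node failure")
    return "FINISHED"


def broken_am(attempt):
    """Fails on every launch."""
    raise RuntimeError("simulated application failure")


asm = ToyApplicationsManager(max_attempts=2)
outcome_flaky = asm.submit("app-1", flaky_am)
outcome_broken = asm.submit("app-2", broken_am)
```

In this sketch "app-1" ends as FINISHED after one restart, while "app-2" is given up on once the assumed retry limit is reached.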

MapReduce v2 maintains API compatibility with the previous stable release (hadoop-1.x), which means existing MapReduce jobs should run on MapReduce v2 after just a recompile.