Scheduling and types of scheduler in YARN
In an ideal world, the requests that a YARN application makes would be granted immediately. In the real world, however, resources are limited, and on a busy cluster, an application will often need to wait to have some of its requests fulfilled. It is the job of the YARN scheduler to allocate resources to applications according to some defined policy. Scheduling in YARN enables multiple applications running concurrently to efficiently and effectively share the cluster’s distributed compute and memory resources.
Scheduling in general is a difficult problem and there is no best policy, which is why YARN provides a choice of schedulers and configurable policies.
Three types of schedulers are available in YARN:
1) FIFO Scheduler (first in, first out)
2) Capacity Scheduler
3) Fair Scheduler
FIFO Scheduler (first in, first out)
The FIFO Scheduler places applications in a queue and runs them in the order of submission (first in, first out). Requests for the first application in the queue are allocated first; once the requests have been satisfied, the next application in the queue is served, and so on.
The FIFO Scheduler has the merit of being simple to understand and of needing no configuration, but it is not suitable for shared clusters, because large applications tend to use all the resources in the cluster, so every other application has to wait its turn. The figure shows the small job blocked until the large job completes.
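If FIFO scheduling is nevertheless wanted (for example on a single-user test cluster), the scheduler is selected in yarn-site.xml; a minimal sketch:

```xml
<!-- yarn-site.xml: select the FIFO scheduler explicitly -->
<property>
  <name>yarn.resourcemanager.scheduler.class</name>
  <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fifo.FifoScheduler</value>
</property>
```

The same property selects the Capacity or Fair Scheduler by naming the corresponding scheduler class instead.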
Capacity scheduler: The Capacity Scheduler allows sharing of a Hadoop cluster along organizational lines, whereby each organization is allocated a certain capacity of the overall cluster. Each organization is set up with a dedicated queue that is configured to use a given fraction of the cluster capacity. Having a dedicated queue of its own allows a small job to start as soon as it is submitted.
Queues can be further divided in hierarchical fashion, allowing each organization to share its cluster allowance between different groups of users within the organization. As the figure shows, a single job does not use more resources than its queue’s capacity. However, if there is more than one job in the queue and there are idle resources available, then the Capacity Scheduler may allocate the spare resources to jobs in the queue, even if that causes the queue’s capacity to be exceeded. This behavior is known as queue elasticity.
Within a queue, applications are scheduled using FIFO scheduling by default, but they can also be prioritized. A comprehensive set of limits is provided to prevent a single application or user from monopolizing the resources of the queue or the cluster. This feature is known as multi-tenancy. The goal of the Capacity Scheduler is to ensure fairness and stability of the cluster.

The Capacity Scheduler is a much simpler scheduler than the Fair Scheduler: you simply define your queues, including a default queue, and assign a percentage of the available cluster resources to each queue. You can also place hard limits on the vcores and memory allocated to a queue. The Capacity Scheduler is the default scheduler in YARN. Its default configuration file, capacity-scheduler.xml, is located in the Hadoop configuration directory. If you make changes to capacity-scheduler.xml on a running cluster, you will need to execute the yarn command to refresh the scheduler information on the ResourceManager.
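A minimal sketch of a capacity-scheduler.xml, assuming two hypothetical queues named prod and dev (the queue names and percentages here are illustrative, not taken from the source):

```xml
<!-- capacity-scheduler.xml: split the cluster 70/30 between two queues,
     and cap dev's elasticity at 50% of the cluster -->
<property>
  <name>yarn.scheduler.capacity.root.queues</name>
  <value>prod,dev</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.prod.capacity</name>
  <value>70</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.capacity</name>
  <value>30</value>
</property>
<property>
  <name>yarn.scheduler.capacity.root.dev.maximum-capacity</name>
  <value>50</value>
</property>
```

After editing this file on a running cluster, the queue configuration can be reloaded with `yarn rmadmin -refreshQueues`.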
Fair scheduler: With the Fair Scheduler, there is no need to reserve a set amount of capacity, since it dynamically balances resources between all running jobs. As shown in the figure, just after the first (large) job starts, it is the only job running, so it gets all the resources in the cluster. When the second (small) job starts, it is allocated half of the cluster resources, so that each job uses its fair share.
Note that there is a lag between the time the second job starts and when it receives its fair share, since it has to wait for resources to free up as containers used by the first job complete. After the small job completes and no longer requires resources, the large job goes back to using the full cluster capacity. The overall effect is both high cluster utilization and timely small-job completion.

Fair sharing can also work with job priorities: the priorities are used as weights to determine the fraction of total compute time that each job gets. The Fair Scheduler organizes jobs into pools and divides resources fairly between these pools. By default, there is a separate pool for each user, so that each user gets an equal share of the cluster. It is also possible to set a job’s pool based on the user’s Unix group or any jobconf property. Within each pool, jobs can be scheduled using either fair sharing or first-in, first-out (FIFO) scheduling.
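As a toy illustration (not YARN code), weighted fair sharing amounts to dividing cluster capacity in proportion to each job's weight; a minimal sketch:

```python
def fair_shares(capacity, weights):
    """Split `capacity` among jobs in proportion to their weights."""
    total = sum(weights.values())
    return {job: capacity * w / total for job, w in weights.items()}

# Hypothetical example: a production job weighted 3x against a research job.
shares = fair_shares(100, {"production": 3, "research": 1})
print(shares)  # production gets 75.0, research gets 25.0
```

When all weights are equal, this degenerates to an even split, which is the default per-user behavior described above.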
In addition to providing fair sharing, the Fair Scheduler allows assigning guaranteed minimum shares to pools, which is useful for ensuring that certain users or production applications always get sufficient resources. When a pool contains jobs, it gets at least its minimum share, but when the pool does not need its full guaranteed share, the excess is split between other pools.
If a pool’s minimum share is not met for some period of time, the scheduler optionally supports preemption of jobs in other pools: the pool is allowed to kill tasks from other pools to make room for its own tasks to run. Preemption can be used to guarantee that “production” jobs are not starved while still allowing the Hadoop cluster to be used for experimental and research jobs. When choosing tasks to kill, the Fair Scheduler picks the most recently launched tasks from over-allocated jobs, to minimize wasted computation. Preemption does not cause the preempted jobs to fail, because Hadoop jobs tolerate losing tasks; it only makes them take longer to finish.
The Fair Scheduler can limit the number of concurrent running jobs per user. This can be useful when a user must submit hundreds of jobs at once, or for ensuring that intermediate data does not fill up disk space on a cluster when too many concurrent jobs are running. Finally, the Fair Scheduler can limit the number of concurrent running tasks per pool. This can be useful when jobs have a dependency on an external service like a database or web service that could be overloaded if too many map or reduce tasks are run at once. Setting job limits causes jobs submitted beyond the limit to wait until some of the user/pool’s earlier jobs finish.
The Fair Scheduler is configured in two places: algorithm parameters are set in HADOOP_CONF_DIR/mapred-site.xml, while a separate XML file called the allocation file, located by default at HADOOP_CONF_DIR/fair-scheduler.xml, is used to configure pools, minimum shares, running-job limits, and preemption timeouts. The allocation file is reloaded periodically at runtime, allowing you to change pool settings without restarting your Hadoop cluster.
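A minimal sketch of an allocation file, assuming a hypothetical pool named production (the pool name and all values are illustrative):

```xml
<!-- fair-scheduler.xml (allocation file): one pool with a guaranteed
     minimum share, a running-job limit, and a preemption timeout -->
<allocations>
  <pool name="production">
    <minMaps>10</minMaps>
    <minReduces>5</minReduces>
    <maxRunningJobs>20</maxRunningJobs>
    <weight>2.0</weight>
    <minSharePreemptionTimeout>60</minSharePreemptionTimeout>
  </pool>
  <userMaxJobsDefault>5</userMaxJobsDefault>
</allocations>
```

Here minMaps/minReduces express the pool's guaranteed minimum share, maxRunningJobs caps concurrency in the pool, and userMaxJobsDefault limits concurrent jobs per user, matching the limits described above.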