June 26, 2022

YARN Fair Scheduler With Example

This post covers the Fair Scheduler, a pluggable scheduler provided in the Hadoop framework. FairScheduler allows YARN applications to share resources in large clusters fairly.

Overview of Fair Scheduler in YARN

Fair scheduling is a method of assigning resources to applications so that all applications running on a cluster get, on average, an equal share of resources over time.

Since resources are shared among all the running applications, the Fair Scheduler lets short apps finish in a reasonable time without starving long-lived apps. It is also a reasonable way to share a cluster among a number of users.

Two things to note about the Fair Scheduler in YARN are:

  1. By default, the Fair Scheduler bases scheduling fairness decisions only on memory. It can be configured to schedule with both memory and CPU.
  2. The scheduler organizes apps further into “queues”, and shares resources fairly between these queues.

As an example, suppose there are two queues, sales and finance. A job is submitted to the sales queue; being the sole running job, it gets all the resources. A job is then submitted to the finance queue, and this new job gradually receives half of the resources, so the jobs in the two queues have 50% of the resources each. Next, a second job is submitted to the finance queue, and it takes half of the resources allocated to that queue. The two jobs in the finance queue now share the queue's allocation (50% of the total resources) equally, 25% each, whereas the job in the sales queue keeps the whole 50% allocated to the sales queue.
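The sharing in this example can be sketched as an allocation file with two equal-weight queues. The 100 GB cluster size mentioned in the comments is an assumed figure for illustration only, and the weights are written out even though 1 is the default.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Assumption: a cluster with 100 GB of memory, scheduled on memory only. -->
  <!-- Both queues have weight 1, so each is entitled to about 50 GB
       while both have running apps. -->
  <queue name="sales">
    <!-- One running job: it uses the whole sales share, about 50 GB. -->
    <weight>1.0</weight>
  </queue>
  <queue name="finance">
    <!-- Two running jobs: each gets about half of the finance share, 25 GB. -->
    <weight>1.0</weight>
  </queue>
</allocations>
```

While only the sales queue has a running job, that job can use the full 100 GB; the split happens only once the finance queue has work of its own.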

Hierarchical queues support

The Fair Scheduler in YARN supports hierarchical queues, which means an organization can create sub-queues within its dedicated queue.

All queues descend from a queue named "root". Available resources are distributed among the children of the root queue in the typical fair scheduling fashion. Then, the children distribute the resources assigned to them to their children in the same fashion.

Configuration for Fair Scheduler

To use the Fair Scheduler in YARN, first set the scheduler class, org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler, in yarn-site.xml.
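A minimal sketch of that setting, using the standard yarn.resourcemanager.scheduler.class property. The second property is optional and is shown only as an assumption about how a non-default allocation file location could be given; its path is a placeholder, not a value from this post.

```xml
<?xml version="1.0"?>
<configuration>
  <!-- Switch the ResourceManager to the Fair Scheduler. -->
  <property>
    <name>yarn.resourcemanager.scheduler.class</name>
    <value>org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler</value>
  </property>
  <!-- Optional: location of the allocation file (placeholder path). -->
  <property>
    <name>yarn.scheduler.fair.allocation.file</name>
    <value>/path/to/etc/hadoop/fair-scheduler.xml</value>
  </property>
</configuration>
```

In an existing yarn-site.xml, only the property elements need to be added inside the configuration element that is already there.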


Setting up queues

Queues are set up in the allocation file etc/hadoop/fair-scheduler.xml. The main elements of that file are as follows.

<queue> element: Represents a queue. Some of its important properties are as follows.

  • minResources: minimum resources the queue is entitled to, in the form “X mb, Y vcores”. If a queue’s minimum share is not satisfied, it will be offered available resources before any other queue under the same parent.
  • maxResources: maximum resources a queue is allocated, expressed either in absolute values (X mb, Y vcores) or as a percentage of the cluster resources (X% memory, Y% CPU).
  • weight: to share the cluster non-proportionally with other queues. Weights default to 1, and a queue with weight 2 should receive approximately twice as many resources as a queue with the default weight.
  • schedulingPolicy: to set the scheduling policy of any queue. The allowed values are “fifo”, “fair”, “drf” or any class that extends org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.SchedulingPolicy. Defaults to “fair”.
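The properties above could be combined in a single queue definition roughly as follows. The queue name "analytics" and all numeric values are hypothetical, chosen only to illustrate the syntax.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- "analytics" is a hypothetical queue name used only for illustration. -->
  <queue name="analytics">
    <!-- Minimum share: offered before sibling queues while it is unmet. -->
    <minResources>20000 mb,10vcores</minResources>
    <!-- Upper limit on what the queue can be allocated. -->
    <maxResources>60000 mb,30vcores</maxResources>
    <!-- Roughly twice the share of a sibling with the default weight of 1. -->
    <weight>2.0</weight>
    <!-- Use both memory and CPU when computing fair shares in this queue. -->
    <schedulingPolicy>drf</schedulingPolicy>
  </queue>
</allocations>
```

Choosing "drf" here is one way to have the scheduler take CPU into account as well as memory; leaving the element out keeps the memory-only "fair" default.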

<defaultQueueSchedulingPolicy> element: Sets the default scheduling policy for queues; overridden by the schedulingPolicy element in each queue if specified. Defaults to "fair".

<queueMaxAppsDefault> element: Sets the default running app limit for queues; overridden by the maxRunningApps element in each queue.
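As a sketch of how these two defaults interact with per-queue settings, the fragment below sets both and then overrides them in one queue. The limits of 20 and 50 are illustrative values, not recommendations.

```xml
<?xml version="1.0"?>
<allocations>
  <!-- Queues without their own schedulingPolicy use DRF (memory and CPU). -->
  <defaultQueueSchedulingPolicy>drf</defaultQueueSchedulingPolicy>
  <!-- Queues without their own maxRunningApps run at most 20 apps at once. -->
  <queueMaxAppsDefault>20</queueMaxAppsDefault>
  <queue name="sales">
    <!-- Overrides both defaults for this queue only. -->
    <schedulingPolicy>fair</schedulingPolicy>
    <maxRunningApps>50</maxRunningApps>
  </queue>
  <!-- No overrides: finance inherits drf and the limit of 20. -->
  <queue name="finance" />
</allocations>
```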

<queuePlacementPolicy> element: Contains a list of rule elements that tell the scheduler how to place incoming apps into queues. Rules are applied in the order in which they are listed. All rules accept the "create" argument, which indicates whether the rule can create a new queue. "create" defaults to true; if it is set to false and the rule would place the app in a queue that is not configured in the allocations file, the scheduler continues on to the next rule.

Valid rules are as follows:

  • specified: The application is placed into the queue it requested.
  • user: The application is placed into a queue with the name of the user who submitted it.
  • primaryGroup: The application is placed into a queue with the name of the primary group of the user who submitted it.
  • secondaryGroupExistingQueue: The application is placed into a queue with a name that matches a secondary group of the user who submitted it.
  • nestedUserQueue: The application is placed into a queue with the name of the user under the queue suggested by the nested rule.
  • default: The application is placed into the queue specified in the ‘queue’ attribute of the default rule. If ‘queue’ attribute is not specified, the app is placed into ‘root.default’ queue.
  • reject: The application is rejected.
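A placement policy that chains several of these rules might look like the sketch below. The particular combination is an assumption made for illustration; it shows how create="false" lets an app fall through to the next rule and how a nested rule supplies the parent queue for nestedUserQueue.

```xml
<?xml version="1.0"?>
<allocations>
  <queuePlacementPolicy>
    <!-- 1. Use the queue named at submission, but only if it is configured. -->
    <rule name="specified" create="false" />
    <!-- 2. Otherwise use a per-user queue under the queue of the user's
         primary group, provided that group queue is configured. -->
    <rule name="nestedUserQueue">
      <rule name="primaryGroup" create="false" />
    </rule>
    <!-- 3. Reject anything the earlier rules did not place. -->
    <rule name="reject" />
  </queuePlacementPolicy>
</allocations>
```

Ending the list with "reject" or "default" gives every app a defined outcome, since those rules do not fall through to a later rule.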

Queue configuration example

Suppose there are two top-level child queues, sales and finance, descending from root, and the sales queue has two sub-queues, apac and emea. The queues can then be set up for the Fair Scheduler as given below:

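  <allocations>
    <!-- Top-level queue root.sales with two sub-queues -->
    <queue name="sales">
      <minResources>10000 mb,0vcores</minResources>
      <maxResources>50000 mb,0vcores</maxResources>
      <queue name="emea" />
      <queue name="apac" />
    </queue>
    <!-- Top-level queue root.finance -->
    <queue name="finance">
      <minResources>10000 mb,0vcores</minResources>
      <maxResources>70000 mb,0vcores</maxResources>
    </queue>
    <!-- Rules are tried in order; apps that match none go to finance -->
    <queuePlacementPolicy>
      <rule name="specified" />
      <rule name="primaryGroup" create="false" />
      <rule name="default" queue="finance" />
    </queuePlacementPolicy>
  </allocations>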

That's all for the topic YARN Fair Scheduler With Example. If something is missing or you have something to share about the topic please write a comment.
