December 20, 2023

Speculative Execution in Hadoop Framework

Speculative execution in the Hadoop framework is an optimization technique that helps ensure a submitted job finishes in a time-bound manner.

Need for speculative execution in Hadoop

When a MapReduce job is submitted, several map tasks run in parallel, each working on a portion of the data (an input split). Similarly, several reduce tasks are spawned to work in parallel and produce the final output.

These map and reduce tasks are started on different nodes across the cluster. You may have a scenario where a few map or reduce tasks run slower than the others in the cluster, perhaps because of a hardware or network problem on the node where those tasks are running.

These slower tasks can affect the overall job execution: reduce tasks can start only when all the map tasks finish, so a slow map task becomes a bottleneck. Similarly, a slow reduce task increases the overall time needed to finish the job. To mitigate these bottlenecks, the Hadoop framework uses speculative execution.

How speculative execution in Hadoop works

After starting the map and reduce tasks and monitoring their progress for some time, the Hadoop framework knows which map or reduce tasks are taking longer than usual. For those slow-running tasks, Hadoop starts a duplicate of the same task on another node. The framework is speculating that the same task, operating on the same data but started on another node, will finish faster; hence the name speculative execution.

Note that both the original task and the speculative task run; the output of whichever finishes first is used and the other is killed.

For example, if the Hadoop framework detects that a map task for a given job is executing slower than the other map tasks for the same job, another instance of that map task, operating on the same input split, is started on another node. Whichever instance finishes first has its output used; the other is killed.
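To make the decision concrete, here is a hypothetical, stripped-down sketch of the kind of estimate-and-compare logic a speculator can apply. This is not the actual Hadoop source; the RunningTask class, its fields, and chooseTaskToSpeculate are invented for illustration. The idea: estimate each running task's finish time from its progress, compare it with the estimated finish time of a fresh attempt started now, and speculate the task with the largest expected gain.

import java.util.List;

public class SpeculationSketch {

  // Minimal stand-in for a running task attempt (not a Hadoop class).
  static class RunningTask {
    final String id;
    final long startTime;   // when the attempt started (ms)
    final double progress;  // fraction complete, 0.0 - 1.0

    RunningTask(String id, long startTime, double progress) {
      this.id = id;
      this.startTime = startTime;
      this.progress = progress;
    }

    // Naive estimate: extrapolate total runtime from the current progress rate.
    long estimatedEndTime(long now) {
      if (progress <= 0.0) return Long.MAX_VALUE;
      long elapsed = now - startTime;
      return startTime + (long) (elapsed / progress);
    }
  }

  // Returns the task worth speculating, or null if no task's estimated
  // end time beats a fresh attempt's estimated end time (now + mean runtime).
  static RunningTask chooseTaskToSpeculate(List<RunningTask> tasks,
                                           long now, long meanTaskRuntime) {
    RunningTask best = null;
    long bestGain = 0;
    long replacementEnd = now + meanTaskRuntime; // a new attempt starting now
    for (RunningTask t : tasks) {
      long gain = t.estimatedEndTime(now) - replacementEnd;
      if (gain > bestGain) {   // only speculate if the duplicate should win
        bestGain = gain;
        best = t;
      }
    }
    return best;
  }

  public static void main(String[] args) {
    long now = 100_000;
    List<RunningTask> running = List.of(
        new RunningTask("map_0", 0, 0.9),   // healthy: ~111s estimated total
        new RunningTask("map_1", 0, 0.2));  // straggler: ~500s estimated total
    RunningTask slow = chooseTaskToSpeculate(running, now, 110_000);
    System.out.println(slow == null ? "no speculation" : "speculate " + slow.id);
  }
}

The real speculator in Hadoop is more elaborate (its runtime estimates come from a pluggable estimator and it honors the speculative caps described below), but the compare-two-estimates shape is the same.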

Configuration for Speculative execution

In Hadoop, speculative execution is enabled (set to true) by default for both map and reduce tasks. The properties controlling it are set in mapred-site.xml.

  • mapreduce.map.speculative - If true, then multiple instances of some map tasks may be executed in parallel. Default is true.
  • mapreduce.reduce.speculative - If true, then multiple instances of some reduce tasks may be executed in parallel. Default is true.
  • mapreduce.job.speculative.speculative-cap-running-tasks - The maximum percent (0-1) of running tasks that can be speculatively re-executed at any time. Default value is 0.1.

The Speculator implementation that the Hadoop framework uses for these calculations is specified by the yarn.app.mapreduce.am.job.speculator.class property; the default is org.apache.hadoop.mapreduce.v2.app.speculate.DefaultSpeculator. The speculator class is instantiated in MRAppMaster.
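As a quick illustration, speculative execution can also be configured per job from the driver code. Below is a minimal sketch using the property names listed above together with the convenience setters on org.apache.hadoop.mapreduce.Job; the job name and the chosen values are arbitrary examples.

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class SpeculationConfig {
  public static void main(String[] args) throws Exception {
    Configuration conf = new Configuration();

    // Same keys as in mapred-site.xml, set programmatically.
    conf.setBoolean("mapreduce.map.speculative", true);
    conf.setBoolean("mapreduce.reduce.speculative", false);
    // Allow at most 10% of running tasks to be speculated at any time.
    conf.setFloat("mapreduce.job.speculative.speculative-cap-running-tasks", 0.1f);

    Job job = Job.getInstance(conf, "speculation-config-example");

    // Equivalent per-job convenience setters.
    job.setMapSpeculativeExecution(true);
    job.setReduceSpeculativeExecution(false);
  }
}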

Speculative execution in Hadoop - Drawbacks

Though the idea of speculative execution is to reduce job execution time, it involves running duplicate tasks, and this duplicate execution increases the load on the cluster. In the case of a very busy cluster, or a cluster with limited resources, the administrator may consider turning off speculative execution using the properties shown above.

This cost of running duplicate tasks is more pronounced in the case of reduce tasks. A reduce task gets its input from more than one map task running on different nodes, so fetching that input already involves data transfer across the cluster. Running the same reduce task again as part of speculative execution means the same data transfer happens more than once, increasing the network load.

That's all for the topic Speculative Execution in Hadoop Framework. If something is missing or you have something to share about the topic, please write a comment.

