Map Reduce Shuffle Sort Phase (Part 2)

This post is in the continuation with the Map Reduce Shuffle Sot Phase (Part 1),so it is clear that the mapper passes (K,V) key/value pair to reducer since sorting,grouping and
partitioning took place in Sort Shuffle Phase the basic requirement is
to have a way to compare the keys in order to sort,partition and group
them for this purpose the key class must implement
WritableComparable interface and value class should
be of a Writable type.

The method that is being called is

public int compareTo(WritableComparable o){}

Based on the result of this method sorting is done by default the sorting is done on compareTo method of the key class

Other than sorting, shuffle-sort phase is also used to distribute the mapper values amongst different reducers. This is done by partitioning the key values. Partitioner decides which {K, V} pair should be assigned to which reducer for processing. It is implemented by using a subclass of the Partitioner class. This class has a method:

public int getPartitions(K key, V value,int numReduceTasks) {}

While partitioning decides the reduce task, grouping decides which {K, V} pairs should be clubbed up in one reduce call. Each reduce task’s data is divided into various groups and reduce () method is called for each group (multiple reduce method calls may be made within one reduce task. Number of groups = number of reduce calls)

To make surethat the same keys are grouped in one group,The group comparator class should extend from WritableComparator class and override its compareTo() method. While grouping, {K, V} pairs are considered to be in the same group till the compareTo() call returns a 0. A new group is formed once a non-zero value is returned from the group comparator. This also implies that the keys should be sorted prior to grouping (which is already taken care in the sort phase); else grouping may emit incorrect results. By default, the key class’ compareTo() that was used for sorting is also used for group comparison.