March 8, 2022

HashSet Internal Implementation in Java

HashSet internal implementation in Java or how does HashSet work internally in Java is a very important interview question. Some of the important points that you should know are-

  1. What is the backing data structure for HashSet or where does HashSet stores its element?
  2. How does add() method work in HashSet?
  3. How does remove() method work in HashSet?
  4. How elements are retrieved from HashSet?

In this post we’ll go through the internal implementation of HashSet in Java and try to explain the above mentioned points. Note that all the code snippet of the HashSet class provided in this post are from JDK 10.

Since HashSet internally uses HashMap for its operations so knowing HashMap Internal Implementation in Java will help a lot in understanding internal implementation of HashSet.

Where does HashSet store its element

Internally HashSet in Java uses HashMap to store its elements. With in HashSet class a HashMap is defined that is used to store its elements.

private transient HashMap<E,Object> map;

If you see all the defined constructors for HashSet, all of those constructors create a HashMap.

public HashSet() {
  map = new HashMap<>();
}

public HashSet(Collection<? extends E> c) {
  map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
  addAll(c);
}

public HashSet(int initialCapacity, float loadFactor) {
  map = new HashMap<>(initialCapacity, loadFactor);
}

public HashSet(int initialCapacity) {
  map = new HashMap<>(initialCapacity);
}

Initial capacity, load factor and buckets in HashSet

You should have clear understanding of the terms initial capacity, load factor and buckets to understand internal implementation of HashSet better.

As already mentioned HashSet uses HashMap to store its elements and HashMap in turn internally uses an array of type Node to store elements where Node<K, V> is an inner class with in HashMap class.

  • Capacity- If you don’t specify any capacity while creating HashSet then the array will have default initial capacity of 16. If you use the constructor where initial capacity is also passed then the array will have the specified initial capacity.
  • Bucket- In HashMap concept of bucket is used for storing elements where each index of array is conceptualized as one bucket. So, total there are 16 (in default case) buckets. For every (value) that is added to HashSet a hash is calculated using the key, based on that hash value one of these buckets is chosen to store the element.
  • Load factor- Load factor is the threshold for the HashSet storage. Once the threshold is reached the capacity of the HashSet is doubled. Default load factor is 0.75 which means if the 75% of the capacity is reached the HashSet is resized.

How does add method work in Java HashSet

You must be thinking if internally HashSet uses HashMap for adding elements then how come add(E e) method in HashSet takes only value as argument not a (key, value) pair. After all HashMap stores its element as (key, value) pair.

In Java HashSet implementation; from the add(E e) method, put() method of HashMap is called for adding the element and a (key, value) pair is sent too from HashSet. What internally happens is that the value passed for adding to the HashSet becomes the key for HashMap and a dummy object “PRESENT” is always added as value.

Dummy object PRESENT is defined in HashSet implementation as follows-

// Dummy value to associate with an Object in the backing Map
private static final Object PRESENT = new Object();

The implementation of add(E e) method is as follows-

public boolean add(E e) {
  return map.put(e, PRESENT)==null;
}

Here you can see that value passed for storing in HashSet becomes the key in the HashMap. Actually that's how it's ensured that only unique values are stored in HashSet. In HashMap value may be duplicate but Key should be unique. As we have seen that the value becomes key in HashMap which remains unique.

How values are retrieved from HashSet

There is no method in HashSet to get an individual value. You can iterate over the HashSet and get all the values though. The iterator method of the HashSet returns the keySet of the backing HashMap. We have already seen the values added to the HashSet becomes key in the HashMap so what you actually get is the values of the HashSet.

keySet()- Returns a Set view of the keys contained in this map.

The implementation of iterator() method is as follows-

public Iterator<E> iterator() {
  return map.keySet().iterator();
}

How values are removed from HashSet

For removing the value same interchange happens. What you provide as value for removing in the remove() method of the HashSet becomes the key while making a call to backing HashMap’s remove() method.

public boolean remove(Object o) {
  return map.remove(o)==PRESENT;
}

Here note that the remove method of the HashMap returns the value associated with key. Now we know that the value is always passed as "PRESENT" while adding to HashMap, that’s why the comparison map.remove(o)==PRESENT;

Important points

  1. HashSet is backed by a HashMap instance.
  2. In the internal implementation of the HashSet a dummy object “PRESENT” is always added a value to the backing HashMap. The value passed to add to HashSet becomes key in the HashMap.
  3. When the hash is calculated for HashSet it is calculated using the value itself as value has become in the HashMap.

That's all for the topic HashSet Internal Implementation in Java. If something is missing or you have something to share about the topic please write a comment.


You may also like

No comments:

Post a Comment