March 21, 2024

HashMap Internal Implementation in Java

HashMap internal implementation in Java or how does HashMap work internally in Java is a very important interview question. Some of the important points that you should know are-

  1. Where does HashMap store its elements internally?
  2. What is the term "bucket" in HashMap?
  3. What is hashing concept and how does it relate to HashMap?
  4. How does put() method work in HashMap?
  5. What happens if same hash is calculated for the keys or how are elements stored in case of hash collision?
  6. What happens if the null key is added.
  7. How does get() method work in HashMap?
  8. How does remove() method work in HashMap?

In this post we’ll go through the HashMap internal implementation in Java and try to explain the above mentioned points. Note that all the code snippet of the HashMap class provided in this post are from JDK 17.

Where does HashMap store its elements

Internally HashMap class in Java uses an Array (named table) of type Node to store its elements. Where Node<K, V> is an inner class with in HashMap class. Array is defined as follows in the HashMap class.

transient Node<K,V>[] table;
In a HashMap elements are stored as (key, value) pair and this (key, value) pair is represented by an interface Map.Entry. The Node class is an implementation of Map.Entry interface.

What is the term bucket in HashMap

When a (key, value) pair is added to a HashMap, using that key a hash is calculated which gives the index in the array where that (key, value) pair will be added.

The term bucket used here is actually each index of the array. By default HashMap array is of length 16 so there are 16 buckets in a HashMap. Since the array is named table so table[0] is bucket0, table[1] is bucket1 and so on till bucket15.

When an element is added to the HashMap it is not added to that index in the array directly. Every bucket of the HashMap has an associated linked list and each array index holds reference to that linked list. Once the bucket to which the element has to be added is decided based on the calculated hash, a new node is created in the linked list which will have the (key, value) pair.

Following image shows how buckets and stored elements in the linked list will look like in the internal implementation of HashMap.

HashMap internal implementation in Java

hashCode() and equals() method

For calculating hash, hashCode() method is called. equals() method is used to compare objects for equality.

Both of these methods are defined in the Object class in Java so available to all the classes, as Object class is super class for all the classes in Java. If you are using any custom object as key, ensure that these two methods hashCode() and equals() are implemented.

In HashMap hash is calculated using the key so it is very important that hashCode() is properly implemented for the fair distribution of the keys among all the buckets and there are less hash collisions. For example suppose you are using a custom object as key and the hashCode() implementation is not good. If you add 50 elements to the HashMap and same hash is calculated for 30 of them then the linked list associated with that bucket will have 30 elements where as other buckets will be relatively empty affecting the overall performance of HashMap.

equals() method implementation is used to verify if the inserted key is equal to any of the already inserted keys. Thus it is important to implement equals() method properly to ensure that an object is uniquely identified.

How does put() method work in HashMap

With all the groundwork done till now by going through buckets, hashing and hashCode() and equals() method it will be easy for you now to understand the internal implementation of HashMap in Java.

When you add a new (key,value) pair using put() method, first of all using key a hash will be calculated which will determine the bucket the (key, value) pair will go to.

If that bucket is empty a new linked list will be created, where first node of the linked list will be your (key, value) pair and the bucket (that array index) will hold the reference to that linked list.

If the bucket is not empty, that means linked list is already there. In that case equals() method is used to verify if such a key already exists in that bucket, if not found then a new node is created in the already existing linked list. In case equals() method returns true, that means key already exists in the bucket. In that case, the new value for the matching key will overwrite the old value.

In HashMap class implementation put() method is written as follows-

public V put(K key, V value) {
  return putVal(hash(key), key, value, false, true);
}

As you can see first thing it is doing is to calculate hash by passing key.

This explanation of put() method also covers the scenario when same hash is calculated for more than one key (Hash Collision scenario).

What happens when the null key is added

In HashMap adding one null key is permitted. When a (key, value) pair is added where key is null, hash calculation is not done and that (key, value) pair is always added to bucket 0.

You can see it from the internal implementation of the hash() method.

static final int hash(Object key) {
  int h;
  return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}

How does get() method work in HashMap

In Java HashMap get() method is called with key as argument. Using that key, hash is calculated to determine the bucket where that element is stored. If the linked list associated with that bucket has more than one node then iteration of the linked list is done to match the stored keys with the passed key using equals method. When the matching key is found the associated value is returned.

In HashMap class implementation get() method is implemented as follows-

public V get(Object key) {
  Node<K,V> e;
  return (e = getNode(hash(key), key)) == null ? null : e.value;
}

How does remove() method work in HashMap

Implementation of remove() method is similar to get() method. Using the passed key, hash is calculated to determine the bucket where that element is stored. If the linked list associated with that bucket has more than one node then iteration of the linked list is done to match the stored keys with the passed key using equals method. When the matching key is found that node of the linked list is dereferenced.

HashMap changes in Java 8

HashMap implementation is designed to provide constant-time performance for the basic operations (get and put). But the performance of the HashMap may degrade if hashCode() is not properly implemented and there are lots of hash collisions.

As we have already seen in case of hash collisions one of the bucket will have more load and more (key, value) pairs will be added to the linked list associated with that bucket. For searching (get() method) in a linked list linear iteration of the linked list is done which will mean a worst case performance of O(n) if the searched key is the last node of the linked list.

To counter that problem of a particular linked list having more elements HashMap implementation is changed in Java 8. After a certain threshold is reached linked list is replaced by balanced tree to store elements. This change ensures performance of O(log(n)) in worst case scenarios rather than O(n) in the case of linked list.

That's all for the topic HashMap Internal Implementation in Java. If something is missing or you have something to share about the topic please write a comment.


You may also like

No comments:

Post a Comment