[Leetcode] – Group Anagrams Solution

nikoo28

4 years ago

Question: Given an array of strings strs, you need to group all the anagrams together. You can return the answer in any order.
Input: strs = ["eat", "tea", "tan", "ate", "nat", "bat"]
Output: [["bat"], ["nat", "tan"], ["ate", "eat", "tea"]]

Let us try to understand the problem statement and its test cases first.

You are provided with an array that has some strings. You need to group all the anagrams together. It could be possible that some strings don’t find anagrams, and that is perfectly fine. It can exist as an individual group. Return this group in any order.

Let us look at a sample test case:

You must also understand, what is an Anagram? Two strings are said to be anagrams of each other if they are made up of the same characters with same frequency. For example: the word LISTEN and SILENT are anagrams. They are composed of the letters E, I, L, N, S, T.

With this definition in mind, in the below image you can see all the highlighted anagrams. So, tan and nat are said to be anagrams and they form a group. Just like this, we need to group all the anagrams together, and form a resultant output.

A brute force method to solve this problem would be very obvious.

Iterate over each string in the input array.
For each string, check with each element of the array if they are anagrams.
If it is an anagram, add it to a group.
Else, move the string to a different group.

This method would work but you will run out of time trying to execute it for large test cases.

Method 1: Group by Sorting

One way to solve this problem is to group via sorting. Think about the properties of anagrams. If two words are anagrams of each other, they contain exactly the same characters. So, if we sort both of them, we should theoretically get the exact same string.

Let us look at a sample input.

Now, as a next step, try to sort all of them and look at the sorted strings. You will find some strings that look exactly alike. That is because they were anagrams. You can now just compare these sorted strings.

Once you have the sorted strings, you can create a hashmap, that will store all the groups. The key will be the sorted string, and the value would be the list of all the strings that are anagrams.

After you iterate over all the strings, the values in the hashmap will give you the required groups. The time complexity of this approach depends on the sorting technique you use to sort the strings.

Implementation:

  public List<List<String>> groupAnagramsCategorizeBySorting(String[] strs) {

    if (strs == null || strs.length == 0)
      return new ArrayList<>();

    Map<String, List<String>> stringAnagramsMap = new HashMap<>();
    for (String s : strs) {
      char[] arr = s.toCharArray();
      Arrays.sort(arr);
      String key = String.valueOf(arr);

      if (!stringAnagramsMap.containsKey(key))
        stringAnagramsMap.put(key, new ArrayList<>());

      stringAnagramsMap.get(key).add(s);
    }

    List<List<String>> resultList = new ArrayList<>();
    for (Map.Entry<String, List<String>> stringAnagrams : stringAnagramsMap.entrySet()) {
      resultList.add(stringAnagrams.getValue());
    }
    return resultList;
  }
Code language: Java (java)

Time Complexity: $O(n * \log k)$ (k is the length of largest string)
Space Complexity: $O(n)$

Method 2: Group by Frequency

It is however possible to improve the above approach a little bit. We can also take advantage of the fact that two anagrams have the same frequency of characters as well. So, instead of sorting the strings, we can also generate a new string to exhibit this property.

Example: The word KNEE and KEEN are anagrams. You can also say that both these words have K - 1 times, E - 2 times and N - 1 time. Using this method we can create some kind of a frequency string which can be something like E2K1N1. Notice that this string would be same for both the words.

Let us take up an example once again.

We can now create frequency strings for each of these strings. They will look like:

aab = a2b1
aba = a2b1
baa = a2b1
abbccc = a1b2c3
abb = a1b2
bcbcca = a1b2c3

Looking closely at the frequency strings, you can conclude that all anagrams will generate the same frequency strings. This gives you some key to group all the anagrams. Once again, you can create a hashmap that will have the key as the frequency string, and the value would be a list of all the anagrams

After you iterate over all the strings, you should have all your groups in the values of the hashmap. The time complexity of this approach can be better as frequency strings can be generated in linear time.

Implementation:

  public List<List<String>> groupAnagramsCategorizeByFrequency(String[] strs) {

    // Check for empty inputs
    if (strs == null || strs.length == 0)
      return new ArrayList<>();

    Map<String, List<String>> frequencyStringsMap = new HashMap<>();
    for (String str : strs) {

      String frequencyString = getFrequencyString(str);

      // If the frequency string is present, add the string to the list
      if (frequencyStringsMap.containsKey(frequencyString)) {
        frequencyStringsMap.get(frequencyString).add(str);
      }
      else {
        // else create a new list
        List<String> strList = new ArrayList<>();
        strList.add(str);
        frequencyStringsMap.put(frequencyString, strList);
      }
    }

    return new ArrayList<>(frequencyStringsMap.values());
  }

  private String getFrequencyString(String str) {

    // Frequency buckets
    int[] freq = new int[26];

    // Iterate over each character
    for (char c : str.toCharArray()) {
      freq[c - 'a']++;
    }

    // Start creating the frequency string
    StringBuilder frequencyString = new StringBuilder("");
    char c = 'a';
    for (int i : freq) {
      frequencyString.append(c);
      frequencyString.append(i);
      c++;
    }

    return frequencyString.toString();
  }
Code language: Java (java)

Time Complexity: $O(n * k)$ (k is the length of the largest string)
Space Complexity: $O(n)$

As always, the complete code and its test cases can be found on Github as well.

Method 1: Group by Sorting

Implementation:

Method 2: Group by Frequency

Implementation:

Video Explanation:

Sharing is caring: