Question: Given a two strings, find the minimum number of characters that must be deleted to make them anagrams.
Input: str1 = “a b c”, str2 = “a m n o p”
Output: 6
Let us try to understand this problem statement and its test case first.
We have two strings of english alphabet which may or may not have the same length. Anagrams are two words with same characters and frequencies. It is not necessary that the order of the characters is same. We have to determine, how many total number of characters must be deleted from these 2
strings, such that they become anagrams.
Let us look at test cases:
If you look at the word rat
and the word tars
, notice that if we delete the last character s
from str2
, it would become tar
. This word has the exact same characters as the word rat
. Hence, we can say that by deleting a total of 1
character(s), the words can be made anagrams.
Similarly, in the second test case, if we delete de
from str1 and ab
from str2, we delete a total of 4
characters. The image below gives a better idea.
Method 1:
You need to take advantage of the fact, that two words would be anagrams iff they have the exact same characters and frequencies. A brute force method to solve this problem is quite straight forward.
- Start with the first character in
str1
. Iterate through the second string and search if the character is present. - If we find this character, mark it as visited. Otherwise, this character becomes a candidate for deletion. Increase the delete count of
str1
by1
. - Do this process for each character of
str1
. - Similarly, now start with each character of
str2
. Try to find the character instr1
. - If found, mark it as visited, else increase the delete count of
str2
by1
. - Now count the number of characters we need to delete from both the strings.
Method 2 (Efficient Solution):
The above method works perfectly, but you end up wasting a lot of time iterating over both the strings multiple number of times. We need an optimized approach.
One way to approach the problem is by creating a bucket of each character. Counting sort also works on a similar approach. You initialize an array of 26
integers. Each index holds the frequency of a single character.
arr[0] holds frequency of a arr[1] holds frequency of b . . arr[25] holds frequency of z
We can now take advantage of this.
- Iterate through the first string and increment the indexes corresponding to each character.
- Next, go over the second string and decrement the indexes corresponding to each character.
The characters that occurred same number of times will have their frequency changed to 0
. The extra characters would have some frequency remaining. The sum of all these frequencies will determine how many characters need to be deleted in order to make these 2
strings anagrams.
Code:
int makingAnagrams(String s1, String s2) {
int[] c = new int[26];
s1 = s1.toLowerCase();
s2 = s2.toLowerCase();
for (int i = 0; i < s1.length(); i++) {
c[s1.charAt(i) - 'a']++;
}
for (int i = 0; i < s2.length(); i++) {
c[s2.charAt(i) - 'a']--;
}
int total = 0;
for (int i : c) {
total += Math.abs(i);
}
return total;
}
Code language: Java (java)
Time Complexity: O(n)
Space Complexity: O(1)