Bucket Sort is a sorting technique which puts limitations on the input set to get an improved performance. But, before we start learning about this, let us take a quick recap. This is essential so that you understand what are the use cases of Bucket Sort.
The sorting techniques that work in an average time complexity of O(n^2) are:
Some sorting techniques that have an average time complexity of O(n*\log n) are:
Counting Sort is a special sorting technique that does not work based on comparison, rather it sacrifices on space for an improved time. So after all these cases, what can we improve?
Basic Idea:
Bucket Sort algorithm works a little bit on the divide and conquer strategy. We divide the given elements into a finite number of buckets, and then sort those buckets individually. Going ahead, we use these sorted buckets to rebuild the final sorted array.
How does it work?
In the below example, we have an array with some elements that belong to a finite set. The values range from 0-25
.
Based upon the range of values, create some buckets that hold a range of values. For example: In this example, we can create 5 buckets that hold 5 values each. These 5 buckets can have the range as:
- #1.
1 - 5
- #2.
6 - 10
- #3.
11 - 15
- #4.
16 - 20
- #5.
21 - 25
Now, iterate over each of the element in the original array and store it in the respective bucket. For instance, the element 11
would go in bucket #3
. Similarly, element 9
would go in bucket 2
. Once all the elements have been categorized we will have some buckets of elements with us.
We have now reached the point where we take advantage of this algorithm. The elements have been divided into smaller problems which are now easy to solve 😄. Now you can sort these buckets individually and we will have several buckets which are sorted.
Start iterating over each of these buckets and populate the final resultant array. This will give you a sorted array.
Time Complexity Analysis:
For a moment, it may seem that this is the best sorting technique if we are not constrained by space. But there are a certain factors you should keep in mind:
- Best case scenario O(n + k): This will only happen when elements are uniformly distributed in the buckets with a nearly equal number of elements in each bucket. This ensures that we are not spending a lot of time to sort the bucket and each of the bucket almost takes similar time to sort.
k
is the number of elements in each bucket. - Worst case scenario O(n^2): Assume that the distribution is not uniform. The total range of elements is [
1-50
], but you have only5
elements between1-20
,100
elements between21-30
and again3
elements between31-50
. In this scenario, some of the buckets would be flooded, and it could take a lot of time to sort them. Buckets also have a fixed size, and hence you need to be careful when applying this technique.
Implementation:
static void bucketSort(int[] arr, int noOfBuckets) {
int globalMax = Integer.MIN_VALUE;
int globalMin = Integer.MAX_VALUE;
for (int i = 0; i < arr.length; i++) {
globalMin = Math.min(arr[i], globalMin);
globalMax = Math.max(arr[i], globalMax);
}
int range = globalMax - globalMin;
int bucketRange = (int) Math.ceil((double) range / noOfBuckets);
// Create bucket array
List<Integer>[] buckets = new List[noOfBuckets];
// Associate a list with each index in the bucket array
for (int i = 0; i < noOfBuckets; i++) {
buckets[i] = new LinkedList<>();
}
// Assign numbers from array to the proper bucket
// by using hashing function
for (int num : arr) {
buckets[hash(num, bucketRange, noOfBuckets)].add(num);
}
// sort buckets
for (List<Integer> bucket : buckets) Collections.sort(bucket);
int idx = 0;
// Merge buckets to get sorted array
for (List<Integer> bucket : buckets) {
for (int num : bucket) {
arr[idx++] = num;
}
}
}
// A very simple hash function
private static int hash(int num, int hashValue, int numberOfBuckets) {
int bucketNumber = num / hashValue;
if (bucketNumber == numberOfBuckets)
bucketNumber--;
return bucketNumber;
}
Code language: Java (java)
The implementation and test cases can also be found on GitHub.