Quick look at Quicksort
You need some order in this mess
Recently I stumbled upon a blog post where an author was trying to describe Quicksort. I was surprised by how incomplete and rather useless the article was describing a sorting algorithm, given the technical authority the author was purporting to have. So many words, yet little substance. And then I looked at other sources to see if this is an oddity, and it looked like almost all of them does a poor job at describing it.
So here we go:
History
Quicksort was created by CAR Hoare. Read about more here on Wikipedia. Unfortunately Wikipedia doesn't describe the algorithm in an accessible way.
Quicksort is a divide and conquer algorithm, Worst case algorithmic complexity is $O(n^2)$, average case complexity is $O(n log_n)$. Complexity analysis is at the end of post.
Quicksort's basic intuition is that, you select an element called pivot
and move the elements less than the pivot
to the left of the pivot, and the others to the right. Then apply the same process to the left segment and right segment.
Let's look at the simplest implementation which requires additional memory. And then look at an in-place implementation.
# list is the sequence to be sorted
def qs(list):
if len(list)<=1:
return list
lower = []
higher = []
pivot = list.pop()
for item in list:
if item < pivot:
lower.append(item)
else:
higher.append(item)
return qs(lower)+ [pivot] + qs(higher)
Example:
8, 3, 6, 4, 2, 1, 9, 5, 7
Select last element as the pivot and scan the array, moving items as described above.
Here are some steps unrolled:
- compare 8 with pivot value, (7), and it is larger, so it should be on the right of pivot
[] 7 [8]
- compare 3 with pivot value, (7), and it is smaller, so it should be on the left of pivot
[3] 7 [8]
- compare 6 with pivot value, (7), and it is smaller, so it should be on the left of pivot
[3,6] 7 [8]
...
and then after we've fully scanned the list it would look like this.
(1) [3,6,4,2,1,5] 7 [8,9]
Now we've done $n$ comparisons. Notice that we preserve the order, this means if there were equal comparable items, they keep the original order. This becomes useful when we want to sort items by several critera (sort by price, and then by date). But it is important to note that in general Quicksort is not considered to be a stable sort depending on the partition implementation.
Also note that in this case the partitioning is not balanced. This lead to a worse case scenario. We approach the best case scenario when the partitions are balanced.
Next recursion is on the left segment, and then the right segment
(2) source = [3,6,4,2,1,5]
pivot = 5
leads to
[3,4,2,1] 5 [6]
Now we look at the segment [3,4,2,1]
(3) source = [3,4,2,1]
pivot = 1
leads to
[] 1 [3,4,2]
Now this leads to working on segment [3,4,2]
(4)source = [3,4,2]
pivot = 2
leads to
[] 2 [3,4]
left side returns, which leads to working on segment [3,4]
(5) source = [3,4]
pivot = 4
leads to
[3] 4 []
this returns [3,4]
on the right side of step 4 above.
Step 4 returns with []+[2]+[3,4] -> [2,3,4]
Step 3 return with []+[1]+[2,3,4] -> [1,2,3,4]
Step 2 returns with [1,2,3,4] + [5] + [6] -> [1,2,3,4,5,6]
(right side recurse but immediately retursn as it has only one element)
At Step 1 we work on the right segment after returning from lefr recursion:
source = [8.9]
pivot = 9
leads to [8] 9 []
Step 1 returns with [1,2,3,4,5,6]+[7]+[8,9] -> [1,2,3,4,5,6,7,8,9]
Now all the recurssions return, with the sorted array.
In-place Implementation
Here's the in-place implementation as created by C.A.R. Hoare:
# list is the sequence to be sorted
# left = left index to start from
# right = right index to finish at
def qs(list, left, right):
if left>=right: return list
p = partition(list, left, right)
qs(list, left, p)
qs(list, p + 1, right)
def partition(list, left, right):
k = (left+right)//2
pivot = list[k]
i = left; j = right
while True:
while list[i] < pivot:
i+=1
while list[j] > pivot:
j-=1
if i>=j :
return j
swap(list, i, j)
def swap(list, i, j):
temp = list[i]
list[i] = list[j]
list[j] = temp
Hoare's implementation selects a middle point of the array as the pivot. Intuition is that for a ascending sort order the left side items should be less than pivot, and right side should be higher. So we scan the segment and change it so this criteria is met.
We iterate on the left until we find something that is not lower. At the same time, iterate from right (backward) until we find sometihng that is not higher.
Then these two elements are swapped.
We return when the left and right cursors are met. What this means is that we scanned the whole segment and validated that, left is lower and the right is higher than the pivot.
Let's consider the same list as before:
8, 3, 6, 4, 2, 1, 9, 5, 7
so here the pivot would be 2
and have our two cursers at 8
and 7
(at each end).
- compare
8
with2
, and it shouldn't be on this side. so we break from the loop. - compare
7
with2
, and it should be on this side. so we continue scanning until we find1
, which shouldn't be on this side. - swap
8
with1
resulting1,3,6,4,2,8,9,5,7
- cursors are now at index of
1
and index of2
- compare
1
with2
, and it should be on this side. so we continue scanning until we find 3, which shouldn't be on this side. - compare
2
with8
, and it should be on this side. so we continue scanning until we find2
, - swap
3
with2
resulting1,2,6,4,3,8,9,5,7
- after scanning and not finding anything else to swap, our cursors meet at index 1
- sorting
1,2
would result the same. - sorting
6,4,3,8,9,5,7
is the next - pivot is
8
- left scanning breaks the loop on
8
and right scanning breaks loop on7
- after swapping it looks
6,4,3,7,9,5,8
- continue scanning from
7
and we reach9
on the left, we break at8
- after swapping it lools like
6,4,3,7,8,5,9
- continue scanning from
8
we break on the left, and on the right break at5
- after swapping it looks like
6,4,3,7,5,8,9
. since the cursors meet at index of8
we return - looking at the left split, we operator on
6,4,3,7,5
- pivot would be
3
- swap
6
with3
resulting3,4,6,7,5
- continue on, from
3
and6
cursors meet at index of3
, and we return - looking at the right split
4,6,7,5
- pivot would be
6
- swap
6
with5
resulting4,5,7,6
- continue on, from
5
and6
, we swap7
with6
- resulting
4,5,6,7
. cursors meet at the index of7
- right side
4,5,6
would result the same - now there aren't any more to operate on so the recursive calls return
Complexity Analysis
In the best case scenario, the segments would be equal in length, resulting in a balanced binary tree of computations. At each level segment length halves.
That means a tree with n elements have $log_2 n$ layers.
Explanation:
Segment length at each layer would be n/2, n/4, n/8 ... Which is $n/2^1, n/2^2, n/2^3 ... n/2^x$and last layer with segment length 1 would be
$n/2^x = 1$
$n = 2^x$
taking $log_2$
$log_2 n = x$
Also note that at each level, we perform $n$ comparisons.
Therefor number of total comparisons = $n * number of levels$
where $number of levels = log_2 n $
So the overall complexity of the algorithm is $n log_2 n$
Worse case scenarios is when an array is already sorted, or almost sorted.
This degenerates into each element being compared to every other element.
total number of comparisons $ = n + (n-1) + (n-2) + ... + 2$
$ = \sum_{i=2}^{n} = n(n+1)/2 - 1 $
$ = O(n^2)$
Note:
Sum of an infinite series of natural numbers is given by