Indy's Weblog
2020 Jul 13

Quick look at Quicksort

You need some order in this mess

Recently I stumbled upon a blog post in which the author tried to describe Quicksort. I was surprised by how incomplete and unhelpful the description of the sorting algorithm was, given the technical authority the author claimed to have. So many words, yet so little substance. I then looked at other sources to see whether this was an oddity, and it turned out that almost all of them do a poor job of describing it.

So here we go:

History

Quicksort was created by C.A.R. Hoare. You can read more about it on Wikipedia, though unfortunately Wikipedia doesn't describe the algorithm in an accessible way.

Quicksort is a divide-and-conquer algorithm. Its worst-case complexity is $O(n^2)$, and its average-case complexity is $O(n \log n)$. A complexity analysis is at the end of the post.

Quicksort's basic intuition is this: you select an element called the pivot, move the elements less than the pivot to its left and the rest to its right, then apply the same process to the left segment and the right segment.

Let's look at the simplest implementation, which requires additional memory, and then at an in-place implementation.

# list is the sequence to be sorted
# note: pop() removes the pivot, so the input list is mutated
def qs(list):
    if len(list)<=1:
        return list

    lower = []
    higher = []
    pivot = list.pop()
    for item in list:
        if item < pivot:
            lower.append(item)
        else:
            higher.append(item)

    return qs(lower) + [pivot] + qs(higher)
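Here's a quick, hypothetical usage check (the function is reproduced so the snippet runs standalone). One caveat worth flagging: `pop()` removes the pivot, so the function mutates the list you pass in.

```python
# the simple quicksort from above, reproduced so this snippet is self-contained
def qs(list):
    if len(list) <= 1:
        return list
    lower = []
    higher = []
    pivot = list.pop()          # mutates the caller's list
    for item in list:
        if item < pivot:
            lower.append(item)
        else:
            higher.append(item)
    return qs(lower) + [pivot] + qs(higher)

data = [8, 3, 6, 4, 2, 1, 9, 5, 7]
print(qs(data))  # → [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(data)      # the top-level pop() has removed the pivot from the original
```

If you need to keep the input intact, pass a copy: `qs(data[:])`.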

Example: 8, 3, 6, 4, 2, 1, 9, 5, 7

Select the last element as the pivot and scan the array, moving items as described above.

Here are some steps unrolled:

  1. compare 8 with the pivot value (7); it is larger, so it belongs on the right of the pivot
[]   7   [8]
  2. compare 3 with the pivot value (7); it is smaller, so it belongs on the left of the pivot
[3]   7   [8]
  3. compare 6 with the pivot value (7); it is smaller, so it belongs on the left of the pivot
[3,6]   7   [8]

...

and after we've fully scanned the list, it looks like this:

(1)  [3,6,4,2,1,5]  7  [8,9]

Now we've done $n$ comparisons. Notice that the scan preserves relative order within each partition: equal comparable items keep their original order with respect to each other. This becomes useful when we want to sort items by several criteria (sort by price, and then by date). But note that the popped pivot is placed between the partitions, ahead of any equal items that came before it, so even this version is not strictly stable — and in general Quicksort is not considered a stable sort; it depends on the partition implementation.
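The caveat about the popped pivot is easy to see with duplicate keys. Here's a tiny hypothetical sketch (not from the original post) that sorts (key, tag) pairs by key only, using the same partition idea, and tags the equal keys to track their original order:

```python
# sorts (key, tag) pairs by key only, using the same last-element-pivot scheme
def qs(pairs):
    if len(pairs) <= 1:
        return pairs
    lower, higher = [], []
    pivot = pairs.pop()               # pivot comes from the end of the list
    for item in pairs:
        if item[0] < pivot[0]:        # compare keys only
            lower.append(item)
        else:
            higher.append(item)
    return qs(lower) + [pivot] + qs(higher)

print(qs([(5, 'first'), (3, 'x'), (5, 'second')]))
# → [(3, 'x'), (5, 'second'), (5, 'first')] — the two 5s swapped order
```

The pivot `(5, 'second')` lands between the partitions, ahead of the equal-keyed `(5, 'first')` that originally preceded it.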

Also note that in this case the partitioning is not balanced. Unbalanced partitions lead toward the worst-case scenario; we approach the best case when the partitions are balanced.

Next we recurse on the left segment, and then on the right segment:

(2) source = [3,6,4,2,1,5]
pivot = 5
leads to 
[3,4,2,1] 5 [6]

Now we look at the segment [3,4,2,1]

(3) source = [3,4,2,1]
pivot = 1
leads to
[] 1 [3,4,2]

Now this leads to working on segment [3,4,2]

(4) source = [3,4,2]
pivot = 2
leads to
[] 2 [3,4]

left side returns, which leads to working on segment [3,4]

(5) source = [3,4]
pivot = 4
leads to
[3] 4 []

this returns [3,4] on the right side of step 4 above.

Step 4 returns with []+[2]+[3,4] -> [2,3,4]

Step 3 returns with []+[1]+[2,3,4] -> [1,2,3,4]

Step 2 returns with [1,2,3,4] + [5] + [6] -> [1,2,3,4,5,6] (the right side recursion returns immediately, as [6] has only one element)

At Step 1 we work on the right segment after returning from the left recursion:

source = [8,9]
pivot = 9
leads to [8] 9 []

Step 1 returns with [1,2,3,4,5,6]+[7]+[8,9] -> [1,2,3,4,5,6,7,8,9]. Now all the recursions have returned, and the array is sorted.

In-place Implementation

Here's an in-place implementation using the partition scheme created by C.A.R. Hoare:

# list is the sequence to be sorted
# left = left index to start from
# right = right index to finish at

def qs(list, left, right):
    if left >= right: return
    p = partition(list, left, right)
    qs(list, left, p)
    qs(list, p + 1, right)

def partition(list, left, right):
    k = (left + right) // 2
    pivot = list[k]
    # start the cursors just outside the segment; each pass advances
    # them before comparing, which also prevents the scan from getting
    # stuck on elements equal to the pivot
    i = left - 1
    j = right + 1
    while True:
        i += 1
        while list[i] < pivot:
            i += 1
        j -= 1
        while list[j] > pivot:
            j -= 1
        if i >= j:
            return j
        swap(list, i, j)

def swap(list, i, j):
    temp = list[i]
    list[i] = list[j]
    list[j] = temp

Hoare's implementation selects the middle element of the segment as the pivot. The intuition is that for an ascending sort order, the items to the left of the pivot should be less than it, and the items to the right should be greater. So we scan the segment and rearrange it until this criterion is met.

We advance the left cursor until we find an element that is not lower than the pivot. At the same time, we move the right cursor backward until we find an element that is not higher.

Then these two elements are swapped.

We return when the cursors meet or cross. At that point we have scanned the whole segment and ensured that nothing to the left of the split is greater than the pivot, and nothing to the right of it is smaller.

Let's consider the same list as before: 8, 3, 6, 4, 2, 1, 9, 5, 7. Here the pivot would be 2 (the middle element), with our two cursors starting just outside each end, next to 8 and 7.

  1. advance the left cursor: 8 is not less than 2, so it stops immediately
  2. move the right cursor: 7, 5 and 9 are all greater than 2, so it keeps going and stops at 1
  3. swap 8 with 1, resulting in 1,3,6,4,2,8,9,5,7
  4. continue: the left cursor stops at 3, the right cursor stops at 2
  5. swap 3 with 2, resulting in 1,2,6,4,3,8,9,5,7
  6. continue: the left cursor stops at 6; the right cursor passes 4 and 6 and stops at 2. The cursors have crossed, so we return, splitting into [1,2] and [6,4,3,8,9,5,7]
  7. sorting 1,2 leaves it unchanged
  8. sorting 6,4,3,8,9,5,7 is next; the pivot is 8
  9. the left cursor stops at 8, the right cursor stops at 7
  10. after swapping it looks like 6,4,3,7,9,5,8
  11. continue: the left cursor stops at 9, the right cursor stops at 5
  12. after swapping it looks like 6,4,3,7,5,9,8
  13. continue: the left cursor stops at 9 again, the right cursor stops at 5; the cursors have crossed, so we split into 6,4,3,7,5 and 9,8
  14. looking at the left split, we operate on 6,4,3,7,5; the pivot is 3
  15. the left cursor stops at 6, the right cursor stops at 3; swapping gives 3,4,6,7,5
  16. continue: the left cursor stops at 4, the right cursor crosses it and stops at 3, so we split into [3] and [4,6,7,5]
  17. on 4,6,7,5 the pivot is 6; the left cursor stops at 6, the right at 5; swapping gives 4,5,7,6
  18. continue: the left cursor stops at 7, the right at 5; the cursors have crossed, so we split into [4,5] and [7,6]
  19. sorting 4,5 leaves it unchanged; on 7,6 the pivot is 7, and swapping 7 with 6 gives 6,7
  20. finally, the right split 9,8 from step 13 has pivot 9; swapping 9 with 8 gives 8,9
  21. now there is nothing left to operate on, the recursive calls return, and the array reads 1,2,3,4,5,6,7,8,9
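The steps above can be exercised with a compact, self-contained sketch of the in-place sort (the cursors are advanced before each comparison, a detail that also keeps runs of equal elements from stalling the scan), checked against Python's built-in sorted on random data:

```python
import random

def partition(a, left, right):
    # Hoare-style partition with a middle-element pivot
    pivot = a[(left + right) // 2]
    i, j = left - 1, right + 1
    while True:
        i += 1
        while a[i] < pivot:        # scan right for an element >= pivot
            i += 1
        j -= 1
        while a[j] > pivot:        # scan left for an element <= pivot
            j -= 1
        if i >= j:
            return j               # cursors met or crossed: split point
        a[i], a[j] = a[j], a[i]    # both elements are on the wrong side

def qs(a, left, right):
    if left >= right:
        return
    p = partition(a, left, right)
    qs(a, left, p)                 # note: p, not p - 1, with Hoare's scheme
    qs(a, p + 1, right)

random.seed(1)
for _ in range(100):
    data = [random.randint(0, 50) for _ in range(random.randint(0, 40))]
    expected = sorted(data)
    qs(data, 0, len(data) - 1)
    assert data == expected        # duplicates and empty lists included
print("all random tests passed")
```

The recursion into `qs(a, left, p)` (including `p` itself) is specific to Hoare's partition; a Lomuto-style partition would exclude the returned index instead.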

Complexity Analysis

In the best case scenario, the segments would be equal in length, resulting in a balanced binary tree of computations. At each level segment length halves.

Quicksort avg/best case

That means a tree over $n$ elements has $\log_2 n$ layers.

Explanation:

Segment length at each layer would be $n/2, n/4, n/8, \ldots$, which is $n/2^1, n/2^2, n/2^3, \ldots, n/2^x$

and last layer with segment length 1 would be

$n/2^x = 1$

$n = 2^x$

taking $log_2$

$log_2 n = x$

Also note that at each level, we perform $n$ comparisons.

Therefore the total number of comparisons $= n \times \text{number of levels}$

where $\text{number of levels} = \log_2 n$

So the overall complexity of the algorithm is $O(n \log_2 n)$.
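We can sanity-check the $n \log_2 n$ figure empirically. The sketch below (a hypothetical instrumentation, not from the original post) counts comparisons made by a middle-pivot, in-place quicksort on an already-sorted array of 1024 elements — a middle pivot splits a sorted array perfectly in half, so this is the balanced case:

```python
import math

comparisons = 0

def less(a, b):
    # counted comparison, so we can tally the total work
    global comparisons
    comparisons += 1
    return a < b

def partition(a, left, right):
    pivot = a[(left + right) // 2]
    i, j = left - 1, right + 1
    while True:
        i += 1
        while less(a[i], pivot):
            i += 1
        j -= 1
        while less(pivot, a[j]):   # i.e. a[j] > pivot
            j -= 1
        if i >= j:
            return j
        a[i], a[j] = a[j], a[i]

def qs(a, left, right):
    if left >= right:
        return
    p = partition(a, left, right)
    qs(a, left, p)
    qs(a, p + 1, right)

n = 1024
data = list(range(n))              # middle pivot keeps this perfectly balanced
qs(data, 0, n - 1)
print(comparisons, round(n * math.log2(n)))
# the measured count stays close to n * log2(n) = 10240
```

The measured count is slightly above $n \log_2 n$ because each partition pass also pays a couple of extra boundary comparisons, but the growth rate matches.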

The worst-case scenario arises when the chosen pivot is repeatedly the smallest or largest element of its segment — for the last-element pivot used in the simple implementation above, this happens when the array is already sorted, or almost sorted.

Quicksort worst case

This degenerates into each element being compared to every other element.

total number of comparisons $ = n + (n-1) + (n-2) + ... + 2$

$ = \sum_{i=2}^{n} i = n(n+1)/2 - 1 $

$ = O(n^2)$
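The quadratic behaviour is easy to observe with the simple last-element-pivot version from the first half of the post: on an already-sorted input every pivot is the maximum, so each level peels off only one element. A hypothetical instrumented re-run, counting comparisons:

```python
comparisons = 0

# the simple last-element-pivot quicksort from earlier, instrumented
def qs(items):
    global comparisons
    if len(items) <= 1:
        return items
    lower, higher = [], []
    pivot = items.pop()
    for item in items:
        comparisons += 1
        if item < pivot:
            lower.append(item)
        else:
            higher.append(item)
    return qs(lower) + [pivot] + qs(higher)

n = 200
qs(list(range(n)))   # already sorted: every pivot is the largest element
print(comparisons)   # → 19900, which is n(n-1)/2
```

Each level performs one fewer comparison than the last — $(n-1) + (n-2) + \cdots + 1 = n(n-1)/2$ — which is $O(n^2)$, matching the sum above.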

Note:

The sum of the first $n$ natural numbers is given by

$\sum_{i=1}^{n} i = \frac{n(n+1)}{2}$