Sorting Intro: Selection and Insertion

These notes span 2 class meetings: Friday (Nov 10) and Monday (Nov 13).

The livecode for insertion sort is here. The tests are here.

Sorting

Let’s start with an exercise. Suppose you were given a list of numbers and asked to put them in sorted order. In class, we’ll use notecards for this, but in these notes, I’ll just write the list:

7 4 2 8 5 3

For now, let’s not worry about how the list is represented: it’s not a Python list or a linked list or anything else like that. It’s just a bunch of numbers on paper.

How would you approach sorting this list by hand? Think in basic, tactile terms like swapping cards or moving numbers around.

Two possible approaches of many

There are lots of ways to proceed. Here’s one sequence, where we keep moving the lowest element in the list up, building a sorted prefix as we go along:

7 4 2 8 5 3

2 7 4 8 5 3

2 3 7 4 8 5

2 3 4 7 8 5

2 3 4 5 7 8

2 3 4 5 7 8

2 3 4 5 7 8

I’ve used bold red to mark the sorted prefix of numbers that have already been shuffled into their proper place, and italic blue to mark the element being moved next. So after the 2 has been moved, the 3 is put into position after it and then gets bolded and colored red.

Here’s another, where we always take the first unsorted element and move it into its proper place in the sorted prefix:

7 4 2 8 5 3

7 4 2 8 5 3

4 7 2 8 5 3

2 4 7 8 5 3

2 4 7 8 5 3

2 4 5 7 8 3

2 3 4 5 7 8

What do you notice in common between these two approaches?

They both move one number at a time, and implicitly divide the list into “sorted” and “unsorted” sections.

What do you notice that’s different?

The first approach always finds and moves the smallest element remaining in the unsorted list suffix. The second always moves the first element in the unsorted list suffix.

The first method is called selection sort because it selects the smallest element to move every time. The second method is called insertion sort, because it inserts the next element into its proper place.

These are the two sorting algorithms that people most often use naturally when sorting things like cards or paper records. They both have more or less the same worst-case efficiency, but more on that later.

Insertion Sort: Runtime

Let’s focus on insertion sort. Can we write out the program, or at least specify the steps? Here’s a description we might use:

loop over everything in the list from left to right:
    move this element into its proper place in the sorted prefix

We haven’t written Python yet, but we can still reason informally about the worst-case runtime. You might see opportunities to optimize, but for now let’s stay with the above rough idea.

On a list of length N, what will the big-O runtime of insertion sort be?

Think, then click!

$O(N^2)$. This is because the first line in the above description loops $N$ times, and the second line loops from $1$ to $N-1$ times, depending on how far into the list we are and how unsorted it is.

This is the same situation as we saw when writing distinct a few weeks back. In the worst, case, the list is sorted in reverse order, and we need to move each element to the beginning of the list. But we won’t discover that until we’ve actually examined the entire sorted prefix. We’ll check $1$, then $2$, then $3$, and so on.

$\sum_{i=1}^{n}(i) = \dfrac{n(n-1)}{2}$, which is in $O(N^2)$.

How about the best case?

Remember that we’re talking about the best case for an arbitrary list length. Of course a list of only 1 element is easy to sort, but that’s not what we mean by “best case” here.

Think, then click!

In the best case, the list is already in order and so no elements need to be moved at all; they’re already in the proper insertion position. Thus, insertion sort’s best case performance is $O(N)$. That’s great!

Maybe we can do better than worst-case $O(n^2)$ However, if your list is mostly sorted, then insertion sort can perform very well.

</br>

We’ll come back to selection sort later.

Implementing Insertion Sort

Let’s write a version of insertion sort that works over Python lists. We’ll make the design decision to modify the list itself, rather than returning a new list; this is sometimes called an in place sort. Because of this, the function won’t return anything.

def insertion_sort(lst: list):
    """
    loop over everything in the list from left to right:
         move it into its proper place in the sorted prefix
    """
    pass

But before we write the code…

Some tests

Let’s write some tests as examples to make sure we understand the problem and some of the most likely ways we could get it wrong. Here are some (non-exhaustive) examples:

def test_insertion_sort():
    list1 = []
    list2 = [1]
    list3 = [1, 2, 3]
    list4 = [-1, 17, 12, 4, 2, 14, 1, 1, 17]
    # ...
    insertion_sort(list1)
    insertion_sort(list2)
    insertion_sort(list3)
    insertion_sort(list4)
    assert list1 == []
    assert list2 == [1]
    assert list3 == [1, 2, 3]
    assert list4 == [-1, 1, 1, 2, 4, 12, 14, 17, 17]

Writing the code

At first, we might want to start like this. After all, that’s what the 2-line summary said: “loop over everything in the list”.

    for item in lst:
        # ???

But there’s a problem. All we have in the loop body is the element itself, not the context we need to move the element around in the list. We don’t know where the previous element is, or if there even is one. We’re going to need a different kind a loop.

A while loop in Python is a loop that keeps running so long as its condition is true. We’ll set one of those up here, starting at index 1 (since the first element is already “sorted” on its own):

    index = 1
    while index < len(lst):
        # ???        

Note that a while loop is more dangerous than a for loop. We need to stop it somehow, or else it will continue running forever. We’ll want to make sure that the index variable keeps increasing or we’ll be in trouble.

(If you ever find yourself in an infinite loop in Python, you can often terminate the program by hitting CTRL-C.)

Moving on, we might proceed like this, using explicit swapping to move elements upward:

    index = 1
    while index < len(lst):
        element = lst[index]
        if element < lst[index-1]: # safe because index starts at 1
            lst[index] = lst[index-1]
            lst[index-1] = element            
        index = index + 1

This will work great on lists of 1 or 2 elements, but on larger lists we might need to move an element further than just 1 position. To see why, look at the example from before:

7 4 2 8 5 3

If index == 1 (pointing to the 4), the above works. But after that step:

4 7 2 8 5 3

when index == 2 (pointing to the 2), we would produce:

4 2 7 8 5 3

which doesn’t move the 2 far enough.

Now what? We need to keep moving the element as long as needed. That sounds like another loop. But we need to be careful to make sure it stops:

    index = 1
    while index < len(lst):
        element = lst[index]
        while index > 0 and element < lst[index-1]: # safe because index starts at 1
            lst[index] = lst[index-1]
            lst[index-1] = element            
            index = index - 1
        index = index + 1

That’s better. Maybe it even works! We can try some examples, and watch it running in the debugger.

In fact, let’s watch it work on our big test in the debugger. We’ll see that we’re inefficiently conflating the index of the element we’re inserting and the index we’re considering for insertion at. Let’s give these two ideas different names:

    index = 1
    while index < len(lst):
        element = lst[index]
        insertion_index = index
        while insertion_index > 0 and element < lst[insertion_index-1]: # safe because index starts at 1
            lst[insertion_index] = lst[insertion_index-1]
            lst[insertion_index-1] = element            
            insertion_index = insertion_index - 1
        index = index + 1

Insertion sort runtime (more formally)

Now let’s convince ourselves that the worst-case runtime is indeed in $O(n^2)$. Here’s a worst-case input:

[5, 4, 3, 2, 1]

The outer loop will perform $N-1 = 4$ operations (i.e., we’ll move 4 elements). The inner loop runs once, then twice, and so on. So our intuition above seems to fit. We can confirm this by examining the program and labeling how many times each operation runs:

    index = 1                      # 1 operation
    while index < len(lst):        # N-1 operations (comparisons)
        element = lst[index]       # N-1 operations
        insertion_index = index    # N-1 operations
        while insertion_index > 0 and element < lst[insertion_index-1]:    # N(N-1)/2 times
            lst[insertion_index] = lst[insertion_index-1] # N(N-1)/2 times
            lst[insertion_index-1] = element     # N(N-1)/2 times
            insertion_index = insertion_index - 1 # N(N-1)/2 times
        index = index + 1          # N-1 operations

In total, this is $1 + 4(N-1) + 4(N(N-1)/2)$, which is in O(N^2)$.

Selection sort runtime

What about selection sort? Let’s look at a run; recall that selection sort works by finding the smallest element that isn’t in the sorted prefix, and putting it at the end of the sorted prefix.

3 1 4 7 2 1 3 4 7 2 1 2 3 4 7 # can it tell we’re done yet? (this is important) 1 2 3 4 7
1 2 3 4 7

We might write this as:

for a list of length N, do N times:
  find the smallest element of the unsorted list suffix
  move it to the end of the sorted list prefix

Selection sort doesn’t know where that smallest element is. It has to look for it, no matter what. As a result, there’s no distinction between the best and worst cases: it always proceeds in the same way. It needs $O(n)$ steps to find the smallest unprocessed element every time.

Looking ahead

It turns out that we can be very clever and get better worst-case performance from a sorting algorithm than $O(n^2)$. More on this next time! But for now, I wonder if there’s a better way to find the right spot to insert elements than a linear search?

It’s also worth thinking about when one sort might be better than another. Worst-case and best-case runtimes are important, but not the only criteria. What else do you think we should care about when picking an approach to sorting?