Today we’ll actually implement a sorting function that uses the merge function we wrote last time. We’ll also learn a new way to test our sort. The livecode for merge sort is here, and the expanded testing code is here.

We’re unlikely to have time to talk about the analysis or recurrence relations in class; please read these notes!

From merging to sorting

It turns out that we can use merge to implement a more efficient kind of sorting program.

Suppose we have a list of length $N$ that we want to sort. For simplicity, let’s suppose it takes exactly $N^2$ steps to sort via our existing algorithms. But let’s try playing a computer science trick: divide the data and recur. How about we cut the list in half? (We’ll ignore odd length lists for now.)

Note that this is roughly the same idea as we saw in Binary Search Trees. But we don’t have any guarantees about the smaller lists; they aren’t sorted. At least, not yet.

Now let’s use the same algorithm as before to sort these two sublists. Since the algorithm takes $N^2$ steps to sort a list of length $N$ (remember, we’re simplifying out constants and so on), each sublist takes $(\dfrac{N}{2})^2 = \dfrac{N^2}{4}$ to sort. If we add both times together, we get $\dfrac{N^2}{2}$.

Somehow, we cut the work in half. Merging the sorted halves doesn’t eat up the savings: that costs only $N$ steps, so we spend $\dfrac{N^2}{2} + N$ time to sort the big list. We’ve made a tradeoff here: we paid $N$ work to cut the $N^2$ in half. Here’s a Wolfram Alpha plot illustrating the difference.
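To make the savings concrete with our simplified step counts: for $N = 1000$, sorting directly costs $1000^2 = 1{,}000{,}000$ steps, while sorting two halves and merging costs $\dfrac{1000^2}{2} + 1000 = 501{,}000$ steps.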

Let’s try again

If it worked once, it’s worth trying again. Let’s divide the list into quarters.

Now, to sort these 4 sublists we pay $4(\dfrac{N}{4})^2 = 4(\dfrac{N^2}{16})=\dfrac{N^2}{4}$ operations. We have to merge the quarters into halves, and halves into the whole, and so the total cost after merging is: $\dfrac{N^2}{4} + 2(\dfrac{N}{2}) + N$.
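Running the same $N = 1000$ check as before: $\dfrac{1000^2}{4} + 2(\dfrac{1000}{2}) + 1000 = 250{,}000 + 1000 + 1000 = 252{,}000$ steps, roughly half of the $501{,}000$ we got with one division.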

Here’s Wolfram Alpha again:

Dividing the list seems to have paid off again. What if we keep doing this division and merging, until we can’t divide the list anymore (i.e., we’ve got a bunch of 1-element lists)?

If we keep doing the division, the quadratic term ends up going away! We get a long chain of $O(N)$ terms instead. Now the key is: how many $O(N)$ terms are there? If there are $O(N)$ of them, we’re back where we started. But if there are fewer…

The key question is: how many times must we divide the list in half?

How many levels are there in this tree? $\log_2(N)$. We have no worries here about whether the tree is balanced, because we’re splitting the list evenly every time; the tree can’t help but be balanced.
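To see why (with our even-splitting simplification): after $k$ halvings, each sublist has length $\dfrac{N}{2^k}$, and we stop dividing when $\dfrac{N}{2^k} = 1$, which happens when $k = \log_2(N)$.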

Every row does $N$ work, and there are $\log_2(N)$ rows. So the total work done is $N \log_2(N)$. Even if we drop the simplification, the big-O notation works out to be $O(N \log_2(N))$, which is pretty great compared to insertion sort. Here’s one final Wolfram Alpha plot:

A correction (but it works out OK)

I cheated and didn’t count splitting the list. Fortunately, we’ll be able to do that in $O(N)$ time, so the linear work per level doubles: once to split, once to merge back together. And that constant factor drops out, leaving us with $O(N \log_2(N))$ still.

Aside: Python’s Sorting Algorithm

Timsort is a hybrid of merge and insertion sort that’s built to take advantage of small sorted sublists in real-world data. Merge sort turns out to be best case $O(N \log_2(N))$; combining the two ideas leads to best case $O(N)$. We won’t talk about the details in 0112, but notice the power of combining multiple ideas.

Implementing Merge Sort

Let’s write this sort. We’ll have an easier time if we learn a new trick in Python.

Slicing Lists

If we have a list l in Python:

l = [1, 2, 3, 4, 5]

Suppose we want to obtain a new list containing a 2-element sub-list of l, starting at element 1. We can do this in Python by slicing the list:

l2 = l[1:3]

The new list l2 will contain [2, 3]. Here’s how it works: a_list[A:B] creates a copy of a_list starting at (and including) element A and ending just before (i.e., not including) element B. If A is omitted, the new list starts at the beginning of a_list. If B is omitted, the new list ends at the end of a_list. This means that you can very conveniently do things like the examples below.
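For instance (the variable names here are just for illustration):

l = [1, 2, 3, 4, 5]
first_two = l[:2]  # [1, 2]: omitting A starts at the beginning
rest = l[2:]       # [3, 4, 5]: omitting B runs to the end
copy_of_l = l[:]   # omitting both gives a copy of the whole list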

Beware

List slicing creates a new list; modifying the new list won’t affect the old list!

There is also a cost associated with copying data from the old list into the new list, although this won’t be much of an issue here.

Code

We’ll re-use the tests we’ve written for other sorts. Let’s start coding. We know that we want to split the input list in half, recur, and then merge the results:

def merge_sort(lst):
    left = ...   # ???
    right = ...  # ???
    sorted_left = merge_sort(left)
    sorted_right = merge_sort(right)
    return merge(sorted_left, sorted_right)

Can we use list slicing to get left and right? Yes:

def merge_sort(lst):
    mid = ...  # ???
    left = lst[:mid]
    right = lst[mid:]
    sorted_left = merge_sort(left)
    sorted_right = merge_sort(right)
    return merge(sorted_left, sorted_right)

Notice how the initially strange decision to make slicing exclusive of the value to the right of : makes for very clean code here.

Now we just need to compute mid, the index to divide the list at. We might start with len(lst)/2, but if the list length is odd this will not be an integer. Instead, we could either convert to an integer (int(len(lst)/2)) or tell Python we want integer division (len(lst)//2).
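For example, len([1, 2, 3, 4, 5]) // 2 evaluates to 2, so an odd-length list splits into [1, 2] and [3, 4, 5]; the “extra” element simply lands in the right half.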

def merge_sort(lst):
    mid = len(lst) // 2  # integer division
    left = lst[:mid]
    right = lst[mid:]
    sorted_left = merge_sort(left)
    sorted_right = merge_sort(right)
    return merge(sorted_left, sorted_right)

This is starting to look right, or nearly. We’ve written merge_sort to be recursive, but we haven’t given it a base case. That is, there’s no place where the recursion can stop!

We might initially write:

    if len(lst) <= 1: return lst

But there’s something wrong here. What is it?

Think, then click!

We said that we wanted merge sort to return a different list. If we just return lst, we’re breaking that contract and future trouble might occur. Imagine if the programmer calling our merge_sort function expects us to have returned a copy! Then they might feel free to change the returned list, expecting the original to be unmodified.
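Here’s a sketch of the danger (hypothetical variable names, assuming a version of merge_sort whose base case returned lst directly):

original = [7]
result = merge_sort(original)
# If the base case returned lst itself, result and original would be the
# SAME list object, so this append would silently modify original too:
result.append(99)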

Instead, we’ll finish the function like this:

def merge_sort(lst):
    if len(lst) <= 1: return lst[:] # copy the very small list
    mid = len(lst) // 2  # integer division
    left = lst[:mid]
    right = lst[mid:]
    sorted_left = merge_sort(left)
    sorted_right = merge_sort(right)
    return merge(sorted_left, sorted_right)

For more on this subtlety, see future discussion of what “equal” means in programming.

Where are we actually sorting anything?

The work is done in merge. To see this on a small example, consider a list of length 2.
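For example, tracing merge_sort([2, 1]) (a sketch of the call structure):

merge_sort([2, 1])    # mid = 1, left = [2], right = [1]
  merge_sort([2])     # base case: returns the copy [2]
  merge_sort([1])     # base case: returns the copy [1]
  merge([2], [1])     # compares 2 with 1; returns [1, 2]

The only comparison between elements happens inside merge.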

We’ve written a recursive function whose recursive structure isn’t echoed in the data. Compare this against a function where the recursion does follow the data’s shape, like the example below.
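Here’s such a sketch (the Node class is a hypothetical stand-in for the BST nodes we saw earlier):

class Node:
    def __init__(self, value, left=None, right=None):
        self.value = value
        self.left = left    # left subtree, or None
        self.right = right  # right subtree, or None

def tree_size(node) -> int:
    # The recursion branches exactly where the data branches:
    # one recursive call per subtree.
    if node is None:
        return 0
    return 1 + tree_size(node.left) + tree_size(node.right)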

In contrast, merge_sort recurs on slices of a list. We aren’t following any recursive structure in the data itself. We’ll say that merge sort is a divide-and-conquer algorithm, but without the division being explicit in the shape of the data.

Performance of Merge Sort

How long does this sorting algorithm take to run? Do we expect the worst and best cases to be different (like in insertion sort) or the same (like in selection sort)?

Let’s label each line of the code with a comment, like we’ve done before.

def merge_sort(lst):
    if len(lst) <= 1: return lst[:]  # 1 operation
    mid = len(lst) // 2              # 1 operation
    left = lst[:mid]                 # around N/2 operations
    right = lst[mid:]                # around N/2 operations
    sorted_left = merge_sort(left)   # ???
    sorted_right = merge_sort(right) # ???
    return merge(sorted_left, sorted_right) # N operations

The question is: what do we do for the recursive calls?

To handle this, we’ll use another classic computer-science trick: introducing a new name, and using it for the quantity we’re unsure about. Suppose that we call “the number of operations that merge_sort uses on a list of length N” by the name $T(N)$. Then, we can plug in $T(N/2)$ for those two ??? comments. (I am being somewhat imprecise here; if you take future CS classes that cover the material, you will see the full development.)

Then, $T(N) = 1 + 1 + \dfrac{N}{2} + \dfrac{N}{2} + T(\dfrac{N}{2}) + T(\dfrac{N}{2}) + N = 2T(\dfrac{N}{2}) + 2N + 2$.

This sort of equation is called a recurrence relation, and there are standard techniques for solving them. The end result is a more formal justification for what we drew out in pictures before: merge sort runs in $O(N \log_2(N))$ time, in both the worst and best cases.
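As a taste of the standard technique, we can unroll the recurrence by substituting it into itself (a sketch, keeping our simplified step counts):

$T(N) = 2T(\dfrac{N}{2}) + 2N + 2 = 4T(\dfrac{N}{4}) + 4N + 6 = 8T(\dfrac{N}{8}) + 6N + 14 = \dots$

Each substitution adds roughly $2N$ more non-recursive work, and after $\log_2(N)$ substitutions we hit the base case. That’s about $2N\log_2(N)$ work overall, which is $O(N \log_2(N))$.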

An Experiment: Random Testing

To close today, I want to show you an idea to try out on homework 5. We’ll come back next time and discuss this idea and its implications, along with why it might be useful (or not).

Fact: we can build random lists

We can write a function that produces random lists (in this case, of integers) using Python’s randint function (which we can import via from random import randint):

def random_list(max_length: int, min_value: int, max_value: int) -> list:
    length = randint(0, max_length)
    return [randint(min_value, max_value) for _ in range(length)]
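For example, random_list(5, 0, 9) might produce [3, 7, 2], or [], or [9, 0, 0, 1, 4]; both the length and the contents vary from call to call.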

Fact: we can run our program on a random list

MAX_LENGTH = 100
MIN_VALUE = -100000
MAX_VALUE =  100000
NUM_TRIALS = 100
def test_mergesort_random():
    for i in range(NUM_TRIALS):
        test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE)
        merge_sort(test_list)

This may look like it’s not doing anything. While it’s true that there’s no assert statement yet, we are testing something: that the merge_sort function doesn’t crash!

This sort of output-agnostic testing is called fuzzing or fuzz testing and it’s used a lot in industry. After all, there’s nothing that says we need to stop at 100 trials! I could run this millions of times overnight, and gain some confidence that my program doesn’t crash (even if it might not produce a correct result overall).

Fact: we can check the output

Of course, we’d still like to actually test the outputs on these randomly generated inputs. It wouldn’t really be feasible to produce a million inputs automatically and then manually figure out the corresponding outputs. Instead, we should ask: do we have another source of the correct answer? Indeed we do: Python’s built-in sorting function.

MAX_LENGTH = 100
MIN_VALUE = -100000
MAX_VALUE =  100000
NUM_TRIALS = 100
def test_mergesort_random():
    for i in range(NUM_TRIALS):
        test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE)
        assert merge_sort(test_list) == sorted(test_list)

What do you think is going on here?

Note on the homework

Homework 5 includes wording like this:

Inspired by the code from lecture, create a random testing function for your tree sort implementation.

We’ll return to this idea of random testing soon. For now, do what we just did in class on the homework—and expect to discuss it when we return!