Random Testing
On Homework 5, you briefly experimented with creating a random list and using it to test your implementation of tree sort. Let’s bring that idea more into focus.
Today we’ll:
- talk about sorting arbitrary records, rather than just numbers;
- quickly introduce one last sorting algorithm; and
- continue to learn how to use a random-number generator as an assistive device to help us test better.
The presentation of the new sort, Quicksort, is deliberately rushed to showcase the value of random testing.
A final sorting algorithm
Suppose we liked the overall structure of merge sort, but we didn’t like the fact that the actual sorting is done in `merge`. We could create a variant that splits the list in a way that lets us combine sublists with `+` instead.
def quick_sort(l: list) -> list:
    if len(l) <= 1:
        return l[:]
    # ???
    side1_sorted = quick_sort(side1)
    side2_sorted = quick_sort(side2)
    return side1_sorted + side2_sorted
If we want to avoid a more complicated merge, however, we need to split the list by some more elaborate method than just slicing it in half. How about we pick some arbitrary element in the list and use that value as a dividing line? Concretely, we’ll separate out the elements that are less than this element (which we’ll call the pivot) and those greater than it.
def quick_sort(l: list) -> list:
    if len(l) <= 1:
        return l[:]
    pivot = l[0]  # many other choices possible
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    smaller_sorted = quick_sort(smaller)
    larger_sorted = quick_sort(larger)
    return smaller_sorted + [pivot] + larger_sorted
How confident are we that this implementation works?
Random Testing (Continued)
We very briefly saw random testing last time. Let’s talk more about it.
Motivation: Humans
Think of a country. What country are you thinking of?
Think, then click!
Chances are, the country you thought of was:
- close to home;
- large; or
- in the news often.
I'd bet that the country you thought of was also in existence today. You probably didn't say [the USSR](https://en.wikipedia.org/wiki/Soviet_Union) or [Austria-Hungary](https://en.wikipedia.org/wiki/Austria-Hungary). And note that my choices there were all limited by my own historical knowledge. I went and [looked up more](https://en.wikipedia.org/wiki/List_of_former_sovereign_states) after writing that sentence. Even if we only count nations that existed after the U.N. was created, there are many: the [Republic of Egypt (1953-1958)](https://en.wikipedia.org/wiki/History_of_Egypt_under_Gamal_Abdel_Nasser#Republic_of_Egypt_(1953–1958)), the [Fourth Brazilian Republic (1946-1964)](https://en.wikipedia.org/wiki/Fourth_Brazilian_Republic), etc.
Why does that impact software testing?
Think, then click!
You can only test what you imagine needing to test. This is heavily influenced by what you have loaded into your "mental cache"; this is sometimes called availability bias or the [availability heuristic](https://en.wikipedia.org/wiki/Availability_heuristic). If you haven't been thinking of it recently, you likely won't test it. Although it's somewhat mitigated by careful and disciplined thought, the problem persists.
But that’s dangerous. If humans are innately poor at testing (and even those with training aren’t always great at it) then is testing just doomed in general?
Computers as assistive devices
When confronted with our limitations, humans create assistive devices. These have included (e.g.) the pole lathe, the slide rule, and so on.
Maybe we can use our computers as an assistive device to help us with testing?
What’s the hard part of writing a test case? Usually it’s coming up with a creative input that exercises something special about the program. What if we let a computer come up with inputs for us? We could do that in a variety of different ways; random generators are just one. (For more on this topic, see CSCI 0320 or especially CSCI 1710.)
(1) Building random lists
We’d write a function that produces random lists (in this case, of integers):
from random import randint

def random_list(max_length: int, min_value: int, max_value: int) -> list:
    length = randint(0, max_length)
    return [randint(min_value, max_value) for _ in range(length)]
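Random tests are easier to debug when they can be replayed. As a small sketch (the seed value and the bounds below are arbitrary choices, not something these notes prescribe), seeding Python's generator before calling `random_list` makes the same list come out each time:

```python
import random
from random import randint


def random_list(max_length: int, min_value: int, max_value: int) -> list:
    # randint is inclusive at both ends, so the empty list is a possible result
    length = randint(0, max_length)
    return [randint(min_value, max_value) for _ in range(length)]


random.seed(2024)  # arbitrary seed, chosen only for illustration
first = random_list(5, -10, 10)

random.seed(2024)  # re-seeding replays the same sequence of random choices
second = random_list(5, -10, 10)

print(first == second)  # True: same seed, same list
```

Logging the seed alongside a failing run is one way to reproduce that failure later.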
(2) Running our sort on those random lists
MAX_LENGTH = 100
MIN_VALUE = -100000
MAX_VALUE = 100000
NUM_TRIALS = 100

def test_quicksort():
    for i in range(NUM_TRIALS):
        test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE)
        quick_sort(test_list)
This may look like it’s not doing anything. While it’s true that there’s no `assert` statement yet, we are testing something: that the `quick_sort` function doesn’t crash!
This sort of output-less testing is called fuzzing or fuzz testing, and it’s used a lot in industry. After all, there’s nothing that says we need to stop at 100 trials! I could run this millions of times overnight, and gain some confidence that my program doesn’t crash (even if it might not produce a correct result overall).
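One practical wrinkle with fuzzing: when a crash does happen, we want to know which input caused it. A minimal sketch of that idea follows. The helper name `fuzz` and the deliberately fragile `first_element` function are hypothetical, introduced only for illustration, and the small bounds are arbitrary.

```python
from random import randint


def random_list(max_length: int, min_value: int, max_value: int) -> list:
    length = randint(0, max_length)
    return [randint(min_value, max_value) for _ in range(length)]


def first_element(l: list):
    # Deliberately fragile stand-in for the code under test:
    # it raises IndexError on the empty list.
    return l[0]


def fuzz(function, num_trials: int):
    """Call function on random lists; return the first input that crashes it."""
    for _ in range(num_trials):
        test_list = random_list(3, -5, 5)
        try:
            function(test_list[:])  # pass a copy so the original stays intact
        except Exception:
            return test_list
    return None  # no crash observed, which is not a proof of correctness


crashing_input = fuzz(first_element, 1000)
print(crashing_input)  # very likely []: about a quarter of these lists are empty
```

Returning the offending input, rather than only noting that something crashed, gives us a concrete case to debug.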
(3) Dealing with the output question
Of course, we’d still like to actually test the outputs on these randomly generated inputs. It wouldn’t really be feasible to produce a million inputs automatically, and then manually go and figure out the corresponding outputs. Instead, we should ask: do we have another source of the correct answer? Indeed we do: our own merge sort, or better yet, Python’s built-in sorting function.
MAX_LENGTH = 100
MIN_VALUE = -100000
MAX_VALUE = 100000
NUM_TRIALS = 100

def test_quicksort():
    for i in range(NUM_TRIALS):
        test_list = random_list(MAX_LENGTH, MIN_VALUE, MAX_VALUE)
        # We could also compare the result of Python's sort here...
        assert quick_sort(test_list) == merge_sort(test_list)
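The comment in that loop mentions Python's own sort as a second source of correct answers. A sketch of that variant is below. `find_counterexample` is a hypothetical helper name, and the trial count is an arbitrary choice. The helper hands back the failing input rather than stopping at a bare assertion error, which makes a failure easier to study.

```python
from random import randint


def random_list(max_length: int, min_value: int, max_value: int) -> list:
    length = randint(0, max_length)
    return [randint(min_value, max_value) for _ in range(length)]


def quick_sort(l: list) -> list:
    # Same logic as the first version of quick_sort in these notes.
    if len(l) <= 1:
        return l[:]
    pivot = l[0]
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    return quick_sort(smaller) + [pivot] + quick_sort(larger)


def find_counterexample(sort_function, num_trials: int):
    """Return a random list that sort_function gets wrong, or None if none is found."""
    for _ in range(num_trials):
        test_list = random_list(100, -100000, 100000)
        # sorted() is Python's built-in sort; it returns a new sorted list
        if sort_function(test_list[:]) != sorted(test_list):
            return test_list
    return None


counterexample = find_counterexample(quick_sort, 2000)  # arbitrary trial count
print(counterexample)
```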
As it turns out, this procedure quickly finds a bug in our `quick_sort` implementation: if a list has duplicate elements, those duplicates will be deleted! Sadly, we’re only retaining one copy of the pivot in the above code. Instead, we’ll collect every element equal to the pivot into its own sublist:
def quick_sort(l: list) -> list:
    if len(l) <= 1:
        return l[:]
    pivot = l[0]  # many other choices possible
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    same = [x for x in l if x == pivot]
    smaller_sorted = quick_sort(smaller)
    larger_sorted = quick_sort(larger)
    return smaller_sorted + same + larger_sorted
Perspective
It doesn’t always make sense to use random testing, but when it does it’s a powerful technique: it literally allows you to mine for bugs while you sleep!
Hopefully it goes without saying, but there’s nothing specific to sorting about this technique: in fact, most of you could use it to help test your final projects, if you wanted! The key is not to rely on it completely: always use random testing to augment an existing, cleverly-chosen set of manual test cases. And if your random testing finds a new bug, as we did today, just add that input and output to your manual suite: you’ll never let that bug sneak by you again!
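As a sketch of that last step, a failing input can be pinned down as an ordinary manual test. The lists below are small hand-picked examples with repeated values, not ones the random tester necessarily produced, and the test function name is hypothetical.

```python
def quick_sort(l: list) -> list:
    # The corrected version from above, which keeps every copy of the pivot.
    if len(l) <= 1:
        return l[:]
    pivot = l[0]
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    same = [x for x in l if x == pivot]
    return quick_sort(smaller) + same + quick_sort(larger)


def test_quick_sort_keeps_duplicates():
    # Regression tests: inputs with repeated values, paired with known outputs.
    assert quick_sort([2, 1, 2]) == [1, 2, 2]
    assert quick_sort([3, 3, 3]) == [3, 3, 3]
    assert quick_sort([5, 1, 5, 1]) == [1, 1, 5, 5]


test_quick_sort_keeps_duplicates()
```

Small inputs like these are easier to reason about than a 100-element random list that happens to contain a repeat.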
There’s a lot more to this story. If you want to explore it more, try using this approach on randomly generated lists of records, rather than randomly generated lists of numbers.
Sorting more than numbers
Let’s define a class whose job is to represent a record that combines someone’s age and name. We could use dataclasses for this (or tuples, or lists, or many other options) but we’ll use a plain class for two reasons.
class Record:
    def __init__(self, age, name):
        self.age = age
        self.name = name
    def __eq__(self, other):
        return self.age == other.age and self.name == other.name
    def __ne__(self, other):
        return self.age != other.age or self.name != other.name
    def __lt__(self, other):
        return self.age < other.age
    def __gt__(self, other):
        return self.age > other.age
    def __ge__(self, other):
        return self.age >= other.age
    def __le__(self, other):
        return self.age <= other.age
    def __repr__(self):
        return f'Record({self.age}, {self.name})'
The first is that dataclasses won’t let you use different subsets of fields for `==` and `<`. Here, we want the records to be sorted by age alone, but equality should use both fields. This means that `not (x < y)` and `not (x > y)` together don’t imply `x == y`.
The second reason is that I’d like to demonstrate how to define `<`, `==`, etc. in Python manually, which is what the above code does.
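To see that distinction concretely, here is a small check using a trimmed-down copy of the class. Only the three comparison methods needed for the demonstration are kept, and the names and ages are made up.

```python
class Record:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        # equality looks at both fields
        return self.age == other.age and self.name == other.name

    def __lt__(self, other):
        # ordering looks at age only
        return self.age < other.age

    def __gt__(self, other):
        return self.age > other.age


tim = Record(41, "Tim")
nim = Record(41, "Nim")

print(tim < nim)   # False: the ages are equal
print(tim > nim)   # False: the ages are equal
print(tim == nim)  # False: the names differ
```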
Alternative: via a dataclass
We could also do something like this via a dataclass, but then we’ve got to either include or exclude each field from all comparisons:
from dataclasses import dataclass, field

# Immutable, but also auto-generate <
@dataclass(frozen=True, order=True)
class Record:
    age: int
    # exclude "name" from all comparisons (==, <, >, etc.)
    name: str = field(compare=False)
The `order=True` parameter tells Python to automatically create ordering functions (`<`, `>`, etc.) that work over instances of this dataclass. The `field` declaration excludes the `name` field from these comparisons. As a result, `Record`s will be ordered entirely by their `age` field.
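One consequence worth checking: per the `dataclasses` documentation, `compare=False` removes a field from the generated `==` as well as from `<`, so two records with the same age compare as equal even when their names differ. A short sketch (names and ages are made up):

```python
from dataclasses import dataclass, field


@dataclass(frozen=True, order=True)
class Record:
    age: int
    # exclude "name" from ==, <, >, etc.
    name: str = field(compare=False)


tim = Record(41, "Tim")
nim = Record(41, "Nim")
ann = Record(30, "Ann")

print(tim == nim)  # True: only age is compared
print(ann < tim)   # True: 30 < 41
print(sorted([tim, ann]))
```

This is the trade-off mentioned earlier: with the dataclass version, a field is either in every comparison or in none of them.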
Sorting records
If we wanted to sort lists of `Record`s, how would we need to change our merge sort function from last time?
Think, then click!
We won't need to make any changes at all. Because the comparison functions are defined for `Record` (whether written by hand, as in the plain class, or generated by the `dataclass` decorator), our existing code will handle `Record`s just fine. That's the power of polymorphism! Of course, if we tried to sort a list that contained both numbers and records, we'd get an error, since `<` is only defined for comparing a number to a number, a `Record` to a `Record`, a string to a string, and so on.
Do different correct sorting algorithms always agree perfectly on their output, now?
Think, then click!
No. Not all sorting algorithms will produce the same ordering for elements with identical keys. If the values being sorted are just numbers, this is immaterial. If the values are more complex, we may see disagreement.
I say “may” because it depends on the low-level specifics of the sorting code. For example, this list may or may not be sorted differently by two different functions, even though those functions are correct, because only the `age` field matters for sorting:
[Record(41, "Tim"), Record(41, "Nim")]
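As a concrete sketch of such a disagreement, the snippet below sorts the same two records by age, once in the order given and once reversed. Python's built-in `sorted` is stable (it keeps elements with equal keys in their original relative order), so the two results differ even though both are correctly ordered by age. Plain tuples stand in for `Record` here to keep the example short, and `by_age` is a hypothetical helper name.

```python
records = [(41, "Tim"), (41, "Nim")]


def by_age(record):
    return record[0]  # the age is the first component of the tuple


# Two outputs, both correctly ordered by age:
one_answer = sorted(records, key=by_age)
another_answer = sorted(reversed(records), key=by_age)

print(one_answer)      # [(41, 'Tim'), (41, 'Nim')]
print(another_answer)  # [(41, 'Nim'), (41, 'Tim')]
print(one_answer == another_answer)  # False
```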
There are ways to work around this: we can do more than just compare results. What if we wrote a function that recognized what correctness meant? It might look something like this:
def verify_sorted_correctly(input: list, output: list) -> bool:
    # code to check that the output is in order
    # code to check that the output contains a permutation of the input
    # ...
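One way to fill in that outline is sketched below. It is a minimal version under stated assumptions, not the only reasonable design: it relies only on `<` and `==` (so it also works for values that cannot be hashed, such as instances of the hand-written `Record` class), and its permutation check is quadratic, which is acceptable for small test lists. The parameters are renamed to `original` and `result` to avoid shadowing Python's built-in `input`.

```python
def verify_sorted_correctly(original: list, result: list) -> bool:
    # 1. In order: no element is strictly smaller than the one before it.
    for i in range(1, len(result)):
        if result[i] < result[i - 1]:
            return False

    # 2. Permutation: every element of the result uses up one equal element
    #    of the original, and nothing is left over at the end.
    remaining = original[:]
    for x in result:
        if x not in remaining:
            return False
        remaining.remove(x)
    return len(remaining) == 0


print(verify_sorted_correctly([3, 1, 2], [1, 2, 3]))  # True
print(verify_sorted_correctly([3, 1, 2], [3, 1, 2]))  # False: not in order
print(verify_sorted_correctly([2, 1, 2], [1, 2]))     # False: an element was dropped
```

Because this check asks only whether the output is ordered and complete, it accepts any correct ordering of elements with equal keys.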
A final bug
There is a subtle error in our `quick_sort` function that only appears when we’re testing with records, not integers. Above, we said that we’d built the `Record` class so that `not (x < y)` and `not (x > y)` together don’t imply `x == y`.
Given this, why are we using `==` to produce the pivot sublist? If two records are generated with the same age, but different names, won’t one of them be dropped?
Here’s the fixed code:
def quick_sort(l: list) -> list:
    if len(l) <= 1:
        return l[:]
    pivot = l[0]  # many other choices possible
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    # don't assume that not > and not < implies ==:
    same = [x for x in l if not (x < pivot) and not (x > pivot)]
    smaller_sorted = quick_sort(smaller)
    larger_sorted = quick_sort(larger)
    return smaller_sorted + same + larger_sorted
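A quick check of the difference, using a trimmed-down copy of `Record` (only the methods this example needs) and made-up names and ages. `quick_sort_with_eq` is a hypothetical name for the previous version of the function, kept here for comparison.

```python
class Record:
    def __init__(self, age, name):
        self.age = age
        self.name = name

    def __eq__(self, other):
        return self.age == other.age and self.name == other.name

    def __lt__(self, other):
        return self.age < other.age

    def __gt__(self, other):
        return self.age > other.age


def quick_sort_with_eq(l: list) -> list:
    # Previous version: builds the middle sublist with ==
    if len(l) <= 1:
        return l[:]
    pivot = l[0]
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    same = [x for x in l if x == pivot]
    return quick_sort_with_eq(smaller) + same + quick_sort_with_eq(larger)


def quick_sort(l: list) -> list:
    # Fixed version: "neither smaller nor larger" instead of ==
    if len(l) <= 1:
        return l[:]
    pivot = l[0]
    smaller = [x for x in l if x < pivot]
    larger = [x for x in l if x > pivot]
    same = [x for x in l if not (x < pivot) and not (x > pivot)]
    return quick_sort(smaller) + same + quick_sort(larger)


records = [Record(41, "Tim"), Record(41, "Nim"), Record(30, "Ann")]

print(len(quick_sort_with_eq(records)))  # 2: Record(41, "Nim") was dropped
print(len(quick_sort(records)))          # 3: every record is kept
```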
Equality is challenging.