HW 5: BSTs and Sorting
Due: Tue, Nov 19 2024 at 9:00 PM EST
Released: Wed, Nov 13 2024
The goal of this assignment is to practice working with the Binary Search Tree data structure and with sorting algorithms. In particular, this assignment will reinforce the intertwined nature of algorithms with data structures by having you implement a sorting algorithm with a tree traversal.
You will also get some practice with random testing, and set up for a discussion after November break.
Setup
The Assignment
Over the course of his career, Tim has given tens of thousands of talks. He tasks you with the responsibility of looking for a way to sort and search for any one of his talks. Immediately, you think of the insertion sorting algorithm and Binary Search Trees (BSTs) that you just learned about in CS0112!
Part 1: Binary Search Trees
We’ve provided you with a more cleaned up version of the BST implementation we discussed in class. Note that this implementation is more robust in two ways:
- It handles duplicate keys (e.g., the integer
3
might be stored multiple times in the BST); and - It returns the value found (if any) rather than just a boolean (to better support the extension we talked about in class, where the BST stores records, rather than just raw values).
Part 2: Tree Sort
Task 1: In tree_sort.py
, fill in the tree_sort
function which takes in a list and returns a sorted list.
Your function must use the tree sort algorithm in two stages:
convert_to_tree
: Create an empty BST and then iterate through the input list, inserting each element one-by-one into the new BST.traversal
: Perform an in-order traversal of the BST (following the algorithm below), inserting every element of the BST one-by-one into a new list.- Return the new list.
Note: Do not use any pre-defined sorting functions. Do not worry about balancing the tree; it may very well be that our implementation will produce an unbalanced tree; that isn’t the focus of this assignment.
Hint: Algorithm for in-order traversal
To convert your binary search tree into a sorted list, you should perform an in-order traversal of it. This means that when you traverse a node, you first recursively traverse its left child, then visit the node itself, then recursively traverse its right child. Here is pseudocode to follow, based on what we covered in class:- Create a new empty list.
- If there is a left child, recursively call
traversal
on the left child, and append its result to the list. - If the current node is non-empty, append its value to the list.
- If there is a right child, recursively call
traversal
on it, and append its result to the list. - Return the list.
Part 3: Unit testing
Task 2: Write tests for your tree_sort
function in test_tree_sort.py
.
What are some edge cases you can think of? Here are some examples to get you started:
- An empty list
- A list with one element
- A list with multiple elements
- A list with duplicate elements
- A list with negative elements
Part 4: Random Testing
Task 3: Inspired by the code from Monday’s lecture, create a random testing function for your treesort implementation. Use the code of any sorting algorithm as a baseline for correctness (in class, we used merge sort, but e.g. insertion sort would also be OK). Invoke this random tester from a separate function called test_tree_sort_randomly
in test_tree_sort.py
.
Part 5: README
In your README.txt
, please include answers to the following questions.
Suppose that you were sorting records (i.e., instances of a dataclass with multiple fields, only one of which is the key to be sorted over), rather than just integer keys. What issues might arise in your random testing approach?
For the third question, think carefully about what new complexities records, rather than just values, add. Suppose you were sorting (say) a list of records about people by their ages in ascending order. Does this affect your random testing in any way?
Note: Be prepared to follow up with a discussion after break.
Submission
Hand in the following files to Gradescope when submitting the assignment:
bst.py
tree_sort.py
test_tree_sort.py
README.txt
Please don’t put your name in your code files, as we grade anonymously. If you have any questions about the assignment, please post on Ed.
You can only use a maximum of 3 late days per assignment. If the assignment is late (and you do NOT have anymore late days) no credit will be given.