Design Recipe Example

In 0112, we’ll follow a structured approach to solving coding problems. Other courses call this sort of thing a “design recipe”, and we’ll follow suit. Of course, there are many different possible recipes! Think of this as guidance on getting started—we’ll say more in class, and add new ingredients to the recipe as they become necessary.

This document outlines the steps of our recipe, and gives an example of each. The starting point is always a problem statement. Here’s an example:

Given a table that describes the number of annual views for a set of sports, return which sports have viewership exceeding a given threshold.

Step 1: What shape of data do you have to work with?

Our input data arrives in tabular form:

Example Data: The Annual Number of Views for Common Sports

Sport Annual Number of Views
Basketball 23
Football 51
Golf 5
Soccer 64

Looking at the table, we see four rows corresponding to four data entries, with the first column giving the name of a sport and the second column saying what the annual viewership was.

There are many ways to represent this sort of data. It might be given to us in a file, or appear on a webpage, or some other raw form. For now, suppose that the table would be given to us as a list of entries, and that each entry would itself be another list. In this case, each entry list contains the name of the sport and its associated annual number of views. Thus, the above table would be given to us in the following form:

```python
sports_table = [entry1, entry2, entry3, entry4]
```
where
```python
entry1 = ["Basketball", 23]
entry2 = ["Football", 51]
entry3 = ["Golf", 5]
entry4 = ["Soccer", 64]
```

At this point it is helpful to write down a few examples of input data. How can different inputs vary, and in what ways does the difference matter?

Step 2: What queries do you need to answer? I.e., what shapes of data do you need to produce?

We are asked to identify which sports have more than a certain number of views. We weren’t told exactly how to deliver the results, but a list of sports names might make sense.

In general, there might be many different queries to answer. Here we just have one.

Like with the input data, it can be helpful to record an example of corresponding output data. E.g.,

result = ["Football", "Soccer"]

might be an output for particular threshold on the above input.

Step 3: Write a specification for the function

In words, not code, state the relationship between the function’s input and output. For example, you might write the following for our example function:

def find_sports(sports_table, threshold):
    """
    Input: 
      - sports_table: a list of data entries, where each data entry is a 2-element
        list containing a sport and its associated annual number of views
      - threshold: an integer representing the number of annual views that a
        particular sport must exceed for it to be included in the output
    Output: 
      - A list containing sports with annual number of views that each exceed
        the threshold
    """

Putting a multi-line string like this at the start of a Python function is standard, and often called a docstring for the function. We might now also add some type hints to the function header, to help us keep track of what to expect:

def find_sports(sports_table: list, threshold: int) -> list:
  ...

Step 4: What assumptions does the function need to make about the data?

We often make assumptions when we sketch a function. For instance, we’ve probably assumed so far that the value of threshold is positive. If this is important for the operation of the function, we should explicitly check.

def find_sports(sports_table, threshold):
    """
    Input: 
      - sports_table: a list of data entries, where each data entry is a list
        containing a sport and its associated annual number of views
      - threshold: an integer representing the number of annual views that a
        particular sport must exceed for it to be included in the output
    Output: 
      - A list containing sports with annual number of views that each exceed
        the threshold
      - If threshold is negative, throws a ValueError.
    """
    if threshold < 0:
        raise ValueError("ERROR: threshold must be non-negative.")

Python’s ValueError nicely expresses the problem of a bad input value. Notice that we’ve added a new line to the specification, also, to communicate to anyone using this function what to expect if they give an invalid threshold.

There may be other unstated assumptions here, too, but this check illustrates the idea.

Step 5: Additional Data Structures

Consider whether your queries might be better answered if, in advance, you built up some helper data structures. When doing this, consider whether the cost of building the helper data is worth the expected gain.

Here’s an example. We could pre-compute, for any given threshold value, the list of sports our function should output. But without knowledge in advance of which thresholds we likely to see, it would be challenging to build this data in advance—at least, in a cost-effective way.

We’ll see some examples of good helper data structures in class soon.

Step 6: Code (but not all at once)

It’s OK (and normal) to not immediately know how to tackle a problem. Instead, start from the top down, and slowly work your way toward a solution.

In this case, we know that we should return a list, and so it’s natural to create one as a helper data structure. We also need to check every row in the sports table we’ve been given, so we’ll add a for loop.

def find_sports(sports_table: list, threshold: int) -> list:
    """
    Input: 
      - sports_table: a list of data entries, where each data entry is a list
        containing a sport and its associated annual number of views
      - threshold: an integer representing the number of annual views that a
        particular sport must exceed for it to be included in the output
    Output: 
      - A list containing sports with annual number of views that each exceed
        the threshold
      - If threshold is negative, throws a ValueError.
    """
    if threshold < 0:
        raise ValueError("ERROR: threshold must be non-negative.")

    sports = []  # we need to return a list

    for entry in sports_table:
        # ??? what next?
        print(entry) # let's at least look at the data

    return sports # return the list

Approach the complete solution gradually, without worrying about knowing how to do every little thing all at once. Eventually, you’ll get there:

def find_sports(sports_table: list, threshold: int) -> list:
    """
    Input: 
      - sports_table: a list of data entries, where each data entry is a list
        containing a sport and its associated annual number of views
      - threshold: an integer representing the number of annual views that a
        particular sport must exceed for it to be included in the output
    Output: 
      - A list containing sports with annual number of views that each exceed
        the threshold
      - If threshold is negative, throws a ValueError.
    """
    if threshold < 0:
        raise ValueError("ERROR: threshold must be non-negative.")

    sports = []  # we need to return a list

    for entry in sports_table:
        sport = entry[0]
        views = entry[1]
            if views > threshold:
                sports.append(sport)

    return sports 

Step 7: Write tests, and run them

How do you know your function works? Only if you test it out! Fortunately, in step 1 you probably wrote some input and output data that naturally translate into test cases. But you probably need to write a few more. Like some other things, we’ll take more about this in lecture.