Polymorphism

2022 Livecode (pre-class, for Tim’s prep)

Motivation

What type does the following function return?

def add(x, y):
  return x + y

Depending on the type of x and y, x+y could produce an integer, a float, a string, or something else. Python has different behavior depending on what we’re trying to add with +. Here are some examples:

1 + 2 produces 3, and type(3) is int;
'hello ' + 'world' produces 'hello world' and type('hello world') is str;
1.2 + 3.4 produces 4.6 and type(4.6) is float; and
[1,2,3] + [4,5,6] produces [1, 2, 3, 4, 5, 6], and type([1, 2, 3, 4, 5, 6]) is list.

Python will let us add values with different types if they are compatible:

1 + 1.1 (int and float) produces 2.1, which is a float; but
1 + '1' (int and string) raises an error: TypeError: unsupported operand type(s) for +: 'int' and 'str'.
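As a quick check of the examples above, here is a minimal sketch that calls the same add function on each pair of values and records the type of each result. The names results, result_types, and raised are made up for illustration.

```python
def add(x, y):
    return x + y

# The same function, called with four different kinds of arguments.
results = [
    add(1, 2),
    add('hello ', 'world'),
    add(1.2, 3.4),
    add([1, 2, 3], [4, 5, 6]),
]
result_types = [type(r).__name__ for r in results]
print(result_types)  # ['int', 'str', 'float', 'list']

# Incompatible types raise a TypeError instead of producing a value.
try:
    add(1, '1')
    raised = False
except TypeError:
    raised = True
print(raised)  # True
```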

This behavior is convenient: it keeps us from having to use special type-preserving or type-modifying versions of +. It’s also powerful for other reasons, which is why we’re covering it today.

But it can also be confusing for someone trying to read or debug our code. If we only plan on calling our example add function with a specific type, we might want to add annotations:

def add(x: str, y: str) -> str:
  return x + y

Polymorphism Defined

Polymorphism (“poly-”: multiple, “morph-”: shape) is the ability to write one function or operator that works for many different data types. The + operator, as well as the built-in len function, str function, and many others are polymorphic. But Python also lets us write our own polymorphic functions, like add(x, y) above.

Polymorphism In Practice

Let’s add a __len__ method to the DJData class. This is what Python looks for in an object when someone calls the built-in len function on that object. The double-underscore convention isn’t really important right now; it’s used to label methods that shouldn’t be accessed directly and that usually interact with some Python built-in, like len. E.g., if you created a DJData object called data, you’d use len(data) to get the length of its queue, and not data.__len__().

To do this, we’ll just add:

  # Here's how to provide the length of the object, as if it were a list...
  def __len__(self):
    return len(self.queue)

Notice that we can call len on many different things. Lists. Dictionaries. And now, DJData objects! Python lets each object define its length; Python doesn’t need to care about the details itself. This is the power of polymorphism.
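To see this in action, here is a minimal sketch. The DJData class below is a hypothetical stand-in with only a queue field (the real class from earlier in the course has more in it), and the song names are made up.

```python
class DJData:
    # Hypothetical, stripped-down stand-in for the course's DJData class.
    def __init__(self):
        self.queue = []

    # Here's how to provide the length of the object, as if it were a list...
    def __len__(self):
        return len(self.queue)

data = DJData()
data.queue.append("song A")
data.queue.append("song B")

# len works on a list, a dictionary, a string, and now a DJData object.
lengths = [len(thing) for thing in [[1, 2, 3], {"a": 1}, "hi", data]]
print(lengths)  # [3, 1, 2, 2]
```

The built-in len never looks inside the queue itself; it just asks each object for its length.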

Example: Polymorphism at the Library

Let’s revisit an example you might have seen in 0111: tracking items at a library. The library has different types of media: books, movies, and so on. Because these media are different, they may have different kinds of data: a movie might have a list of actors, but a book wouldn’t. So we’ll make separate classes for all of them. For simplicity, I’ll leave out a lot of fields we might have in reality:

class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author

class TVSeries:
    def __init__(self, title: str, num_episodes: int, actors: list):
        self.title = title
        self.num_episodes = num_episodes
        self.actors = actors

We can represent a library as a list of these items. Let’s pick one of each:

library = [
    Book("The Calculating Stars", "Mary Robinette Kowal"),    
    TVSeries("Guardian", 40, ["Bai Yu", "Zhu Yilong"])
]

Once we have a library, it’s natural to want to be able to search it. An advanced search might allow users to enter a string to search for, and for that to apply to all media—authors, directors, titles, etc. We could implement this by making books and movies the same datatype, and disambiguate using a field in that type. We’d probably need to use something like a dictionary to store arbitrary data fields, and write a lot of code to check that those fields were what we expected. That’s a lot of work, and not as true to the nature of these objects: movies have directors, books have authors, and so on.

Polymorphism is useful here. We’ll add a matches method to each class:

class Book:
    ...
    def matches(self, query: str) -> bool:
        return query in self.title or query in self.author

class TVSeries:
    ... 
    def matches(self, query: str) -> bool:
        # Parentheses let the expression continue onto the next line.
        return (query in self.title or
                any([query in actor for actor in self.actors]))

Notice that I’ve used a built-in function called any here, which you might not have seen before. It takes in a collection of values (here, a list of booleans) and produces True if and only if at least one of them is true. It’s convenient for implementing this sort of method via a list comprehension.
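Here is a small sketch of any on its own, using the actor list from the library example. The variable name flags is just for illustration.

```python
actors = ["Bai Yu", "Zhu Yilong"]

# One boolean per actor: does the query appear in that actor's name?
flags = ["Zhu" in actor for actor in actors]
print(flags)       # [False, True]

# any is True as soon as at least one element is true.
print(any(flags))  # True

# With nothing to check, there is nothing true, so any produces False.
print(any([]))     # False
```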

Now we can implement our library search function, which works with lists of these objects:

def search(library: list, query: str) -> list:
    return [item for item in library if item.matches(query)]

Let’s work through what happens when we call search(library, "Zhu"). Here’s what the state of the program dictionary and memory look like:

The list comprehension in search is going to look through everything in library: the book at index 0 and the TV series at index 1. In both cases, it will call the matches method of the object at hand, but let’s be a bit more concrete about what each of those calls has to work with.

When the program is executing the matches method of the book, the program dictionary is expanded: self and query are known now, too! The program dictionary has 3 identifiers in it.

Once the program returns from the matches method, those entries get removed. And every time a matches method is called, the value of self changes.
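Putting the pieces of this example together gives a sketch you can run end to end. The classes, library, and search function are the ones from above; the extra queries ("Stars" and "ar") and the variable names holding the results are additions for illustration.

```python
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author

    def matches(self, query: str) -> bool:
        return query in self.title or query in self.author

class TVSeries:
    def __init__(self, title: str, num_episodes: int, actors: list):
        self.title = title
        self.num_episodes = num_episodes
        self.actors = actors

    def matches(self, query: str) -> bool:
        return (query in self.title or
                any([query in actor for actor in self.actors]))

def search(library: list, query: str) -> list:
    # search never checks what kind of item it has; it only calls matches.
    return [item for item in library if item.matches(query)]

library = [
    Book("The Calculating Stars", "Mary Robinette Kowal"),
    TVSeries("Guardian", 40, ["Bai Yu", "Zhu Yilong"])
]

zhu_titles = [item.title for item in search(library, "Zhu")]
print(zhu_titles)    # ['Guardian']

stars_titles = [item.title for item in search(library, "Stars")]
print(stars_titles)  # ['The Calculating Stars']

# "ar" appears in both "Stars" and "Guardian", so both items match.
ar_titles = [item.title for item in search(library, "ar")]
print(ar_titles)     # ['The Calculating Stars', 'Guardian']
```

Each call to item.matches(query) runs the version of matches that belongs to that item's class, with self bound to that particular item.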

Preview

We’ll come back to polymorphism on Friday, and you’ll see some of the ways that Python uses it.

Project 2

Your next project is a very basic climate-change simulator.

Real climate simulation is challenging: there are many factors, both technical and social, that make modeling the problem complex. A very high quality simulation is therefore outside the scope of 112. But, much like we wanted you to get some practice with the ideas of machine learning, we also want you to get experience with building simulations.

When building this assignment, we consulted with Baylor Fox-Kemper, an environmental sciences professor at Brown, to produce something that, in spite of its simplicity, was still based in real climate science. You can find out more about this in the resources linked from the handout. (We’re grateful to Professor Fox-Kemper for taking the time to talk with us!)

The simulation focuses on a geopolitical challenge. Different nations might implement different climate-change mitigation strategies: keeping emissions constant, scaling them back in specific ways, and so on.

Exercise: what other kinds of strategies like these can you think of? Do they all use the same data?

You’ll be representing each of these strategies as its own class that informs the simulation via an emit method. Defining that method for each strategy class (What arguments does it take?) will be good practice at using both classes and polymorphism in Python. The simulator shouldn’t need to know what kind of strategy class it’s working with; all it needs to know is that an emit method exists, and what to pass to it.
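To make the idea concrete, here is a rough sketch of the shape this could take. Everything in it is an assumption for illustration only: the class names, the fields, the choice of a single year argument to emit, and the total_emissions helper are all hypothetical, and the actual interface is the one given in the project handout.

```python
class KeepConstant:
    # Hypothetical strategy: emit the same amount every year.
    def __init__(self, amount: float):
        self.amount = amount

    def emit(self, year: int) -> float:
        return self.amount

class LinearCutback:
    # Hypothetical strategy: cut a fixed amount each year, never going below zero.
    def __init__(self, start_year: int, start_amount: float, cut_per_year: float):
        self.start_year = start_year
        self.start_amount = start_amount
        self.cut_per_year = cut_per_year

    def emit(self, year: int) -> float:
        years_elapsed = max(0, year - self.start_year)
        return max(0.0, self.start_amount - self.cut_per_year * years_elapsed)

def total_emissions(strategies: list, year: int) -> float:
    # The caller only relies on each strategy having an emit method.
    return sum(strategy.emit(year) for strategy in strategies)

strategies = [KeepConstant(10.0), LinearCutback(2020, 10.0, 1.0)]
total_2025 = total_emissions(strategies, 2025)
print(total_2025)  # 15.0
```

Note that the two classes store different data, yet the code that uses them treats them the same way.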

So, in spite of the simplistic nature of the model, this project will give you valuable experience with:

basic (but real) climate science in a computing context;
a new application for computing—simulation; and
object-oriented programming in Python.

You’re allowed to partner with the same person on Project 2 if you’d like to, although we encourage branching out!

Note on Testing Floats

Here’s a quick example of what can go wrong when you test non-whole numbers in Python (and many other languages). Suppose we want to assert that the base-10 logarithm of 2 is equal to the value that you get when you Google for log of 2:

from math import log10
assert log10(2) == 0.30102999566

This fails! But why? Let’s print out what log10(2) is in Python:

0.3010299956639812

Those are indeed two different numbers, although the difference is quite small. Google and Python returned the value with different precision—with a different number of decimal digits. Other, related issues can occur when you’re testing non-whole number values. For example, error can propagate and be amplified—recall why we use significant digits in science!

In general, if you can write your expected output values in ways that don’t decimal-expand, you should. In Project 1, you may have written your weights as logarithms of constants, rather than as decimals, to avoid this problem. The pytest library also provides a useful helper here:

from math import log10
from pytest import approx
assert log10(2) == approx(0.30102999566)

The default relative tolerance of approx is 1e-6 (one part per million). See the docs for more info.
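If pytest isn’t available, the standard library’s math.isclose offers a similar comparison. This is a sketch of an alternative, not the approach the course requires; the tolerance of 1e-6 is chosen here only to mirror approx’s default, and the variable names are made up.

```python
from math import isclose, log10

expected = 0.30102999566

# Exact equality fails because the two values differ past the 11th decimal digit.
exact_match = (log10(2) == expected)
print(exact_match)  # False

# A relative tolerance of one part per million accepts the tiny difference.
close_match = isclose(log10(2), expected, rel_tol=1e-6)
print(close_match)  # True
```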