Testing Classes, Inheritance
Some quick notes:
- A “field” of a class (also called an attribute) is a variable in the scope of the class. For instance, the author and title variables were fields of Book last time.
- Fields can be either class-specific or instance-specific. That is, you might have one variable that is shared across all instances of a class, and another that is different for every instance (like the title of a book).
To tell whether a field is class-specific or instance-specific, see where it’s defined: a field assigned on self inside the __init__ method is instance-specific, while a field assigned directly in the class body, outside any method, is class-specific.
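As a minimal sketch of the difference, here is a small version of Book with one field of each kind. The media_type field is a hypothetical name invented for this illustration; it is not part of the library program from class.

```python
class Book:
    # Class-specific field: one value shared by every Book.
    # (media_type is a hypothetical name used only for this illustration.)
    media_type = "book"

    def __init__(self, title: str, author: str):
        # Instance-specific fields: each Book gets its own values.
        self.title = title
        self.author = author


first = Book("The Nickel Boys", "Colson Whitehead")
second = Book("The Calculating Stars", "Mary Robinette Kowal")

print(first.title)       # instance-specific: differs per book
print(second.title)
print(first.media_type)  # class-specific: also reachable as Book.media_type
```

Both books report the same media_type, but each has its own title and author.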
Another Example of Polymorphism: str and repr
Python uses polymorphism ubiquitously. For example, recall that dataclass instances printed out in a nice way. How can we get similar behavior with our own classes? We’ve seen __str__ already, but it turns out there’s some nuance here: there’s also a __repr__ method.
class Book:
    # "human readable" string representation
    def __str__(self):
        return self.title + " by " + self.author

    # "unambiguous" string representation
    def __repr__(self):
        return 'Book("' + self.title + '", "' + self.author + '")'
These two methods sound similar, but they get called in different places internally by Python. By convention, __str__ is for end-user focused output, and you want it to be human readable. In contrast, __repr__ should be precise and unambiguous (so I wrote it to produce a string that is, itself, Python code).
If I run print(library[0]), it prints the book’s title and author, like: A Book Title by A Book Author. But if I’m in the Python console and I just type library[0], it will produce Book("A Book Title", "A Book Author").
Note that this only affects the way Python displays the object; it doesn’t change anything about the object’s internal representation. It’s generally good practice to define at least the __repr__ method in all of your classes. It makes debugging a lot easier when you get an understandable string instead of an object type and ID!
Methods that use the double-underscore notation are usually understood to be special, and interact with some internal Python functionality. By convention we don’t call them directly, but via the syntactic sugar that goes with them. We don’t invoke __init__ to make a new object, we use the ClassName(...) syntax instead. Likewise, we’ll use (e.g.) repr(an_object) and str(an_object).
If you don’t define a __str__, Python will fall back to using __repr__ instead. And some kinds of collections, like lists, use __repr__ to display their elements, even when you call str on the collection itself.
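To see these rules side by side, here is a small sketch. The Book class repeats the two methods from above with an __init__ added so the example runs on its own. Magazine is a hypothetical class, invented here, that defines only __repr__.

```python
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author

    def __str__(self):
        return self.title + " by " + self.author

    def __repr__(self):
        return 'Book("' + self.title + '", "' + self.author + '")'


class Magazine:
    # Hypothetical class for illustration: it defines __repr__ but not __str__.
    def __init__(self, title: str):
        self.title = title

    def __repr__(self):
        return 'Magazine("' + self.title + '")'


b = Book("A Book Title", "A Book Author")
print(str(b))    # A Book Title by A Book Author
print(repr(b))   # Book("A Book Title", "A Book Author")
print([b])       # [Book("A Book Title", "A Book Author")]

m = Magazine("A Magazine Title")
print(str(m))    # Magazine("A Magazine Title")  (falls back to __repr__)
```

Printing the list shows the __repr__ form of the book, even though print normally uses __str__.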
Python classes have quite a few of these standard method names. Another is __eq__, which lets you define how an object should decide whether it’s equal to another. Let’s talk about that, but from the perspective of testing.
Testing Classes
How should we test our library program? We should write tests for each of the methods. Let’s test the Book class:
from library import *

def test_init():
    b = Book("The Nickel Boys", "Colson Whitehead")
    assert b.title == "The Nickel Boys"
    assert b.author == "Colson Whitehead"

def test_matches():
    b = Book("The Nickel Boys", "Colson Whitehead")
    assert b.matches("Nickel")
    assert not b.matches("Parasite")
    assert b.matches("Colson")
Notice the pattern here: we’re creating objects in our test functions in order to test the objects’ methods.
Equality
It’s worth noting, again, that equality is complicated. It’s also used in many different places. For example:
library = [
    Book("The Calculating Stars", "Mary Robinette Kowal"),
    TVSeries("Guardian", 40, ["Bai Yu", "Zhu Yilong"])
]
print(Book("The Calculating Stars", "Mary Robinette Kowal") in library)
What does this produce? We might expect True (because the two books are identical) or False (because the two books are different objects). When we voted on this question earlier in the semester for lists, we discovered that two different list objects that contain the same entries in the same order are equal (==) in Python. Here, though, the print statement produces False. So why is this producing something different from what we’d get with lists?
The problem is that Python has no idea which fields are important to us when deciding equality, so by default two objects are equal only if they are the very same object. Dataclasses automatically define equality to use all fields, but here we need to tell Python exactly what it means for two books to be the same. To tell Python how to interpret equality for Book objects, we’ll define the __eq__ method:
# inside the Book class
def __eq__(self, other):
    return self.author == other.author and self.title == other.title
Now the above print statement produces True like we’d expect.
Important note: There’s more to this story that we’ll get to later in the semester. For now, defining equality like this should be OK. Pay special attention to equality if you use your objects as keys in data structures that depend on it. If you use objects as keys in something like a dictionary, which uses the hash table idea, you’ll also need to define the __hash__ method, which says how to generate a hash for the object. When we built hash tables in class, we just treated keys as integers and took the remainder of the key after dividing it by the table size. But for arbitrary objects, we need to describe how to turn their field values into a single number.
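Here is one possible sketch of a Book that can be used as a dictionary key. It hashes a tuple of the same fields that __eq__ compares, which is a common approach but not the only one. The isinstance check and the due_dates dictionary are additions made for this illustration. Note that once a class defines __eq__ without __hash__, Python treats its instances as unhashable, so they can’t be dictionary keys at all.

```python
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author

    def __eq__(self, other):
        # Only compare fields against other Books; for any other type,
        # let Python fall back to its default handling.
        if not isinstance(other, Book):
            return NotImplemented
        return self.author == other.author and self.title == other.title

    def __hash__(self):
        # Hash the same fields that __eq__ compares, so that equal books
        # always produce equal hashes.
        return hash((self.title, self.author))


# due_dates is a hypothetical dictionary used only for this example.
due_dates = {Book("The Calculating Stars", "Mary Robinette Kowal"): "March 3"}

same_book = Book("The Calculating Stars", "Mary Robinette Kowal")
print(same_book in due_dates)   # True
print(due_dates[same_book])     # March 3
```

The lookup succeeds even though same_book is a different object from the key, because the two are equal and hash to the same value.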
Adding checkout
Let’s say we want to add “checkout” behavior to our library: we want to be able to record that certain items have been checked out or returned, and take this into account when searching. Let’s add this behavior to our tests first:
def test_init():
    b = Book("The Nickel Boys", "Colson Whitehead")
    assert b.title == "The Nickel Boys"
    assert b.author == "Colson Whitehead"
    assert not b.checked_out

def test_checkout():
    b = Book("The Nickel Boys", "Colson Whitehead")
    b.library_checkout()
    assert b.checked_out

def test_return():
    b = Book("The Nickel Boys", "Colson Whitehead")
    b.library_checkout()
    b.library_return()
    assert not b.checked_out

def test_matches():
    b = Book("The Nickel Boys", "Colson Whitehead")
    assert b.matches("Nickel")
    assert not b.matches("Parasite")
    assert b.matches("Colson")
    b.library_checkout()
    # once checked out, the book no longer matches any query
    assert not b.matches("Nickel")
    assert not b.matches("Parasite")
    assert not b.matches("Colson")
Then we can add the behavior to our Book class:
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author
        self.checked_out = False

    def library_checkout(self):
        self.checked_out = True

    def library_return(self):
        self.checked_out = False

    def matches(self, query: str) -> bool:
        return (not self.checked_out) and (query in self.title or query in self.author)
Note that this is a design choice: we could put the check in the class, or we could put the check in the search function. I’d argue that putting the check here, in the class, is actually the worse choice: what if a librarian needs to search the library for books, so that they can track down what’s missing? Better to allow the library functions to make that distinction, and not force it in the matches method.
So, if I were writing this again, I’d design it differently.
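A rough sketch of that alternative design follows. The signature of search and the include_checked_out parameter are assumptions made for this illustration, since the notes don’t show the real search function. Here matches only looks at the text, and search decides whether availability matters.

```python
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author
        self.checked_out = False

    def library_checkout(self):
        self.checked_out = True

    def matches(self, query: str) -> bool:
        # No availability check here: matching is only about the text.
        return query in self.title or query in self.author


# Hypothetical search function; include_checked_out is an assumed parameter.
def search(library: list, query: str, include_checked_out: bool = False) -> list:
    results = []
    for item in library:
        available = not item.checked_out
        if item.matches(query) and (available or include_checked_out):
            results.append(item)
    return results


library = [
    Book("The Nickel Boys", "Colson Whitehead"),
    Book("The Calculating Stars", "Mary Robinette Kowal"),
]
library[0].library_checkout()

print(len(search(library, "Nickel")))                            # 0
print(len(search(library, "Nickel", include_checked_out=True)))  # 1
```

A patron’s search skips the checked-out book, while a librarian can pass include_checked_out=True to find it.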
Inheritance
Right now, we’re considering each class to be totally separate, with no shared code or data. For instance, even though Books and Movies both implement a matches method, the two methods have completely different implementations. Inheritance gives us a way to share code between classes. Classes can inherit from other classes, like this:
class A:
    pass  # fill in "parent"

class B(A):  # note the parameter (A)
    pass  # fill in "child"
When there’s an inheritance relationship like this between two classes, we say that A is the superclass of B and B is a subclass of A.
Let’s see an example of inheritance in action. Suppose that our library wants to track which items have been checked out, and only return items in a search if they are actually available. First, we’ll implement a LibraryItem class, which we’ll use as the superclass for both books and movies:
class LibraryItem:
    def __init__(self):
        self.checked_out = False

    def library_checkout(self):
        self.checked_out = True

    def library_return(self):
        self.checked_out = False
The LibraryItem class only handles checking items out and back in; it doesn’t know anything about what the items actually are.
In order to use our new parent class, we’ll make some changes to the Book and Movie classes:
class Book(LibraryItem):
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author
        super().__init__()

class Movie(LibraryItem):
    def __init__(self, title: str, director: str, actors: list):
        self.title = title
        self.director = director
        self.actors = actors
        super().__init__()
Note well: this is one of the few places where the convention above (don’t call double-underscore methods directly) is routinely broken: we call __init__ directly here, from within a subclass’s __init__. This is because the object already exists; if we instead called the constructor of the class we’re inheriting from, Python would create a different object. We want to finish initializing this one.
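To illustrate why the super().__init__() call matters, here is a small sketch. Pamphlet is a hypothetical subclass, invented for this example, that forgets the call.

```python
class LibraryItem:
    def __init__(self):
        self.checked_out = False

    def library_checkout(self):
        self.checked_out = True


class Book(LibraryItem):
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author
        super().__init__()  # runs LibraryItem.__init__ on this same object


class Pamphlet(LibraryItem):
    # Hypothetical subclass that forgets to call super().__init__().
    def __init__(self, title: str):
        self.title = title


b = Book("The Nickel Boys", "Colson Whitehead")
print(b.checked_out)               # False: set by LibraryItem.__init__
print(isinstance(b, LibraryItem))  # True: a Book is also a LibraryItem

p = Pamphlet("Library Hours")
print(hasattr(p, "checked_out"))   # False: LibraryItem.__init__ never ran
```

The Pamphlet still inherits the library_checkout method, but its checked_out field doesn’t exist until something assigns it.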
Both Book and Movie objects will now automatically have a checked_out field, as well as the library_checkout and library_return methods. We can use the field in each class’s matches method:
class Book(LibraryItem):
    # other methods elided...
    def matches(self, query: str) -> bool:
        return (not self.checked_out) and (query in self.title or query in self.author)

class Movie(LibraryItem):
    # other methods elided...
    def matches(self, query: str) -> bool:
        # Parenthesize the "or" chain: "and" binds more tightly than "or".
        return (not self.checked_out) and (
            query in self.title
            or query in self.director
            or any(query in actor for actor in self.actors)
        )
…although, again, if I were re-writing these, I’d probably put the check in the search function. Better yet, I’d probably create a Library class to handle the idea of checking books in and out, rather than making the actual item keep track itself.
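One way such a Library class might look is sketched below. Everything about it (the method names, the include_checked_out parameter, and tracking checkouts by object identity) is an assumption made for illustration, not the design from class.

```python
class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author

    def matches(self, query: str) -> bool:
        return query in self.title or query in self.author


class Library:
    # Hypothetical class: the library, not the item, tracks checkouts.
    def __init__(self, items: list):
        self.items = items
        # ids of the items that are currently checked out
        self._checked_out_ids = set()

    def checkout_item(self, item):
        self._checked_out_ids.add(id(item))

    def return_item(self, item):
        self._checked_out_ids.discard(id(item))

    def is_available(self, item) -> bool:
        return id(item) not in self._checked_out_ids

    def search(self, query: str, include_checked_out: bool = False) -> list:
        return [
            item for item in self.items
            if item.matches(query)
            and (include_checked_out or self.is_available(item))
        ]


nickel = Book("The Nickel Boys", "Colson Whitehead")
stars = Book("The Calculating Stars", "Mary Robinette Kowal")
library = Library([nickel, stars])

library.checkout_item(nickel)
print(len(library.search("Nickel")))                            # 0
print(len(library.search("Nickel", include_checked_out=True)))  # 1
library.return_item(nickel)
print(len(library.search("Nickel")))                            # 1
```

With this design the items need no checked_out field at all, so Book goes back to knowing only about its own title and author.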
Inheritance and Exceptions
Inheritance is also how you define your own kinds of exceptions. You just say something like this:
class LibraryError(Exception):
    ...
Since LibraryError extends Exception, it’s possible to raise one in your code.
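As a small sketch of raising and catching one, suppose (purely as an assumption for this example) that checking out a book that is already checked out should count as an error:

```python
class LibraryError(Exception):
    ...


class Book:
    def __init__(self, title: str, author: str):
        self.title = title
        self.author = author
        self.checked_out = False

    def library_checkout(self):
        # Assumed rule for this example: a double checkout is an error.
        if self.checked_out:
            raise LibraryError(self.title + " is already checked out")
        self.checked_out = True


b = Book("The Nickel Boys", "Colson Whitehead")
b.library_checkout()

try:
    b.library_checkout()
except LibraryError as err:
    print("Could not check out:", err)
```

Because LibraryError is its own class, the except clause catches only library problems and lets unrelated exceptions pass through.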