These notes do not exactly match the lecture capture, because we merged some OOP and recursive functions content.
More trees, introduction to objects
Today’s notes contain another recursion example that wasn’t discussed in class but might be useful to you. Please read it! In class, we ran through the all_text
function from last time (see live/lecture13.py), and introduced objects in Python.
Comments on the drill
Recall this example from Monday’s class:
In this visualization of the tree, every HTMLTree
object is a node, and the edges between nodes represent the contents of the list in each node’s children field. A leaf of the tree is a node without any children (here, the two text nodes).
It’s both common and reasonable to ask: “Why did you name the dataclass HTMLTree
, when it only contains the data for one node of the tree?” It’s true that other names would have worked. We chose to include the word “Tree
” in the name to reinforce the idea of delegating work for each child tree to a new computation for each child object.
Example 3
Let’s build one more function on trees
Challenge
Write a function replace_text
that:
- accepts as input: an
HTMLTree
, a text string to search for, and a replacement string; and - changes the input tree so that instances of the search string have been replaced with the replacement string.
This function is a little bit more complicated than those we wrote on Wednesday, so let’s definitely write some example tests first. And while we do so, it’s worth asking: what might somebody writing this function get wrong?
Think, then click!
Here are some starter tests I came up with:
def test_replace_text():
# Wrapping long lines outside parens:
# backslash (watch out for any blank space after the backslash)
# don't replace in tags
assert replace_text(parse('<html></html>'), 'html', 'foo') == \
parse('<html></html>')
# replace at nested depths
assert replace_text(parse('<html><p>hello</p><p><strong>world</strong></p></html>'), 'hello', 'greetings') == \
parse('<html><p>greetings</p><p><strong>world</strong></p></html>')
assert replace_text(parse('<html><p>greetings</p><p><strong>world</strong></p></html>'), 'world', 'fellow students') == \
parse('<html><p>greetings</p><p><strong>fellow students</strong></p></html>')
# replace partial words
# (you don't NEED the backslash, but I find having it on one line is less readable)
assert replace_text(parse('<html>chocolate cake</html>'), 'chocolate', 'dog-safe') == parse('<html>dog-safe cake</html>')
We still can start in the same way as before: first we’ll solve the “base case” (where no recursion is needed) and then the recursive case. So: if we only had one node in the tree, what would the function look like?
Probably something like this:
def replace_text(tree: HTMLTree, find: str, replace: str):
if tree.tag == 'text':
if tree.text == find:
tree.text = replace
But that’s not entirely right. One of the above tests is failing (why?)
Think, then click!
That function would only work if the entire text field were the search string. It fails on our “chocolate cake” to “dog-safe cake” example. Instead, let’s use a built-in Python function:
def replace_text(tree: HTMLTree, find: str, replace: str):
if tree.tag == 'text':
tree.text = tree.text.replace(find, replace)
Where should we go from here? Just like before, we need to handle the children somehow. And since we don’t know how deep the tree will go, we’ll use recursion.
def replace_text(tree: HTMLTree, find: str, replace: str):
for child in tree.children:
replace_text(child, find, replace)
if tree.tag == "text":
tree.text = tree.text.replace(find, replace)
What does this function return?
Think, then click!
Just None
. We’ve built this function to modify the existing tree, rather than manufacturing any new objects. We can do this because HTMLTree
isn’t frozen
, but as a result we won’t be able to use HTMLTree
as a key in sets or dictionaries.
I wonder…
How would we change this function so that, instead of changing the input tree, it returned an entirely new tree structure with the needed changes?
Introduction to Objects
Imagine you’re a DJ at a radio station. You only play songs that your listeners call in and request. In addition, every thousandth listener who calls in gets a prize! You want to keep track of the queue of songs you’ve been asked to play, as well as enough information to give out prizes.
To do this, we need a list of songs and a counter that says how many callers we’ve had so far. (Why can’t we get by with just the list?)
We could implement this with a custom data type like this:
@dataclass
class DJData:
num_callers: int
queue: list
We can implement a function to update our data and to figure out what we’re going to say to a listener:
def request(data: DJData, caller: str, song: str) -> str:
data.queue.append(song)
data.num_callers += 1
if data.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
So here we’ve got a datatype and a function that reads and modifies that datatype’s contents. We can see how it works:
the_dj = DJData(0, [])
request(the_dj, "Tim", "French Fries w/Pepper")
"Cool, Tim"
We could have written this slightly differently:
@dataclass
class DJData:
num_callers: int
queue: list
# This is now *part* of the DJData class; a "method" of the class
# The convention is that the object being operated on is the first argument (self)
def request(self, caller: str, song: str) -> str:
self.queue.append(song)
self.num_callers += 1
if self.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
To do this, we put the request function inside the definition of DJData
. We’ve also modified the method a bit: instead of taking a data argument, we’ve called the argument self
and left off the type annotation. This function is now a method on the DJData class. Which means we can call it like we’ve been calling methods of other things like lists or dictionaries:
the_dj = DJData(0, [])
the_dj.request("Tim", "Whirling-In-Rags, 8PM")
"Cool, Tim"
We call methods by writing the name of an object (the_dj
, in this case), then a dot, then the method arguments, excluding self
. Since we’re not passing self
in, how does Python know which object to call the method on?
We’ll keep learning about classes, objects and methods. I want to re-emphasize, though, that you’ve seen this before. We’ve called methods on lists, sets, etc. For instance, l.append(2)
. What we’re seeing now is how to add methods to our custom objects!
Note on @dataclass
Remember that @dataclass
tells Python to help us by automatically defining some useful methods. For example, @dataclass
will automatically tell Python how to display one of these objects as a string (the str
method, which we’ll cover next week).
The __init__
method
Up until now we’ve been using the dataclasses library to make our custom datatypes work more like Pyret’s. But from this point on, we will generally not use it unless we need to, so that we can see how Python’s objects work.
One of the features that @dataclass
gives us is easy initialization: e.g., because HTMLTree
is a dataclass, we can easily create HTMLTree
objects by writing HTMLTree(...)
where the ...
gives a tag, a child list, and an optional text field. The HTMLField
function is called a constructor or an initializer.
Non-dataclass objects need you to define their constructor yourself! To do this, we’ll define __init__()
methods to initialize data on objects:
# Notice we took away the @dataclass annotation
class DJData:
def __init__(self):
self.queue = []
self.num_callers = 0
def request(self, caller: str, song: str) -> str:
self.queue.append(song)
self.num_callers += 1
if self.num_callers % 1000 == 0:
return "Congrats, " + caller + "! You get a prize!"
else:
return "Cool, " + caller
Python calls the __init__
method in order to initialize an object’s fields when it is created. We can construct instances of the DJData class like this:
> the_dj = DJData()
> the_dj.request("Tim", "Paper Bag")
"Cool, Tim"
Note that this gives us a bit more control over how fields are initialized. We could do anything we’d like inside __init()__
: print things, run algorithms automatically, send email messages, etc.
…but, we’ve also lost the help that @dataclass
was giving us. If we print out the_dj
now, we won’t get a nicely formatted string anymore. It’s now going to be our responsibility to fill in such methods.