More trees, introduction to objects

Because of the long weekend, we’re giving you an extra day on Homework 3. The next assignment (project 2) will go out as usual, but as it is a 2-week assignment we think the 1-day overlap will be more helpful for you than not.

Today’s notes contain another recursion example that wasn’t discussed in class but might be useful to you. Please read it! In class, we ran through the all_text function from last time (see live/lecture13.py), and introduced objects in Python.

Livecode

Comments on the drill

Recall this example from Monday’s class:

In this visualization of the tree, every HTMLTree object is a node, and the edges between nodes represent the contents of the list in each node’s children field. A leaf of the tree is a node without any children (here, the two text nodes).

It’s both common and reasonable to ask: “Why did you name the dataclass HTMLTree, when it only contains the data for one node of the tree?” It’s true that other names would have worked. We chose to include the word “Tree” in the name to reinforce the idea of delegating work for each child tree to a new computation for each child object.

What else is tree-shaped?

Tree-shaped data appears in more than just HTML documents. In fact, you’ve been working with tree-shaped data all semester without thinking much about it.

What else, besides HTML, is tree-shaped?

Think, then click!

Elm trees, oak trees, …

Ok, no, seriously: here are four kinds of tree-shaped data you may be familiar with:

  • taxonomies, org-charts, biological parentage, etc.;
  • algebraic expressions;
  • Python (and Pyret, and Java, and…) code; and
  • the English language.

Expressions you might enter into your calculator are all tree-shaped. Consider something like (1+2)*(17/4). The operators are intermediate nodes in the tree, and the numbers are the leaf nodes at the bottom. The same goes for more complicated algebraic expressions.
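To make the calculator example concrete, here is a minimal sketch of (1+2)*(17/4) as a tree. The names Num, BinOp, and evaluate are made up for illustration; they aren't part of the course code.

```python
from dataclasses import dataclass
from typing import Union


# Hypothetical node types for this sketch (not from the course code).
@dataclass
class Num:
    value: float  # a leaf: just a number


@dataclass
class BinOp:
    op: str        # one of '+', '-', '*', '/'
    left: "Expr"   # left subtree
    right: "Expr"  # right subtree


Expr = Union[Num, BinOp]


def evaluate(expr: Expr) -> float:
    """Evaluate an expression tree by recurring on its subtrees."""
    if isinstance(expr, Num):
        return expr.value  # base case: a leaf
    left = evaluate(expr.left)
    right = evaluate(expr.right)
    if expr.op == '+':
        return left + right
    if expr.op == '-':
        return left - right
    if expr.op == '*':
        return left * right
    if expr.op == '/':
        return left / right
    raise ValueError("unknown operator: " + expr.op)


# (1+2)*(17/4): operators are intermediate nodes, numbers are leaves
tree = BinOp('*', BinOp('+', Num(1), Num(2)), BinOp('/', Num(17), Num(4)))
print(evaluate(tree))  # 12.75
```

Notice that evaluate has the same shape as our HTML functions: handle the leaf, and delegate each subtree to a recursive call.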

Every time you run a Python program, the python3 executable first turns your program text into something called an abstract syntax tree, which it then tries to optimize and run. You can look at any of the programs you’ve written, and see if you can break it up into a tree: functions and classes at the top level, etc. In fact, Python’s indentation reinforces the idea of the syntax tree.
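If you're curious, Python's standard ast module lets you look at this tree yourself. A small sketch (the variable names here are just for illustration):

```python
import ast

# Parse one line of Python into an abstract syntax tree.
source = "total = (1 + 2) * (17 / 4)"
module = ast.parse(source)

assign = module.body[0]                      # the first (only) statement
print(type(assign).__name__)                 # Assign
print(type(assign.value).__name__)           # BinOp
print(type(assign.value.op).__name__)        # Mult
print(type(assign.value.left.op).__name__)   # Add
print(type(assign.value.right.op).__name__)  # Div

# ast.dump shows the whole tree as text; the exact format varies
# a little between Python versions.
print(ast.dump(module))
```

The multiplication sits at the top of the expression, with the addition and the division as its two subtrees, just like the calculator example.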

If you’ve ever diagrammed a sentence in English class, you’ve done manual computation over tree-shaped data.

Example 3

Let’s build one more function on trees.

Challenge

Write a function replace_text that:

This function is a little bit more complicated than those we wrote on Wednesday, so let’s definitely write some example tests first. And while we do so, it’s worth asking: what might somebody writing this function get wrong?

Think, then click!

Here are some starter tests I came up with:

def test_replace_text():
  # Wrapping long lines outside parens: 
  #   backslash (watch out for any blank space after the backslash)   
  
  # don't replace in tags
  assert replace_text(parse('<html></html>'), 'html', 'foo') == \
         parse('<html></html>')   
  # replace at nested depths
  assert replace_text(parse('<html><p>hello</p><p><strong>world</strong></p></html>'), 'hello', 'greetings') == \
         parse('<html><p>greetings</p><p><strong>world</strong></p></html>')
  assert replace_text(parse('<html><p>greetings</p><p><strong>world</strong></p></html>'), 'world', 'fellow students') == \
         parse('<html><p>greetings</p><p><strong>fellow students</strong></p></html>')
  # replace partial words
  #   (you don't NEED the backslash, but I find having it on one line is less readable)
  assert replace_text(parse('<html>chocolate cake</html>'), 'chocolate', 'dog-safe') == parse('<html>dog-safe cake</html>')   

We still can start in the same way as before: first we’ll solve the “base case” (where no recursion is needed) and then the recursive case. So: if we only had one node in the tree, what would the function look like?

Probably something like this:

def replace_text(tree: HTMLTree, find: str, replace: str):
    if tree.tag == 'text':
        if tree.text == find:
            tree.text = replace        

But that’s not entirely right. One of the tests above fails. (Why?)

Think, then click!

That function would only work if the entire text field were the search string. It fails on our “chocolate cake” to “dog-safe cake” example. Instead, let’s use Python’s built-in string method replace:

def replace_text(tree: HTMLTree, find: str, replace: str):
    if tree.tag == 'text':
        tree.text = tree.text.replace(find, replace)

Where should we go from here? Just like before, we need to handle the children somehow. And since we don’t know how deep the tree will go, we’ll use recursion.

def replace_text(tree: HTMLTree, find: str, replace: str):
    for child in tree.children:
        replace_text(child, find, replace)
    if tree.tag == "text":
        tree.text = tree.text.replace(find, replace)

What does this function return?

Think, then click!

Just None. We’ve built this function to modify the existing tree, rather than manufacturing any new objects. We can do this because HTMLTree isn’t frozen, but as a result we won’t be able to put HTMLTree objects in sets or use them as dictionary keys.
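Here is a small runnable sketch of those three points. It assumes HTMLTree has a tag, a list of children, and an optional text field; the real class from lecture may differ in details, and the tree is built by hand instead of with parse.

```python
from dataclasses import dataclass
from typing import Optional


# Assumed shape of HTMLTree for this sketch; the lecture's class may differ.
@dataclass
class HTMLTree:
    tag: str
    children: list
    text: Optional[str] = None


def replace_text(tree: HTMLTree, find: str, replace: str):
    for child in tree.children:
        replace_text(child, find, replace)
    if tree.tag == "text":
        tree.text = tree.text.replace(find, replace)


leaf = HTMLTree("text", [], "chocolate cake")
page = HTMLTree("html", [HTMLTree("p", [leaf])])

result = replace_text(page, "chocolate", "dog-safe")
print(result)     # None: the function never returns a value
print(leaf.text)  # dog-safe cake: the original object was changed

# A non-frozen dataclass (with the default settings) is unhashable,
# so it can't go in a set or be a dictionary key.
try:
    hash(page)
except TypeError as err:
    print("can't hash an HTMLTree:", err)
```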

I wonder…

How would we change this function so that, instead of changing the input tree, it returned an entirely new tree structure with the needed changes?
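Try it yourself first! If you get stuck, here is one possible sketch, not the only answer. It assumes the same HTMLTree shape (tag, children, optional text), and the name replaced_copy is made up for illustration.

```python
from dataclasses import dataclass
from typing import Optional


# Assumed shape of HTMLTree for this sketch; the lecture's class may differ.
@dataclass
class HTMLTree:
    tag: str
    children: list
    text: Optional[str] = None


def replaced_copy(tree: HTMLTree, find: str, replace: str) -> HTMLTree:
    """Build a brand-new tree with the replacement made; the input is untouched."""
    # Recur on every child, collecting the new child trees in a new list.
    new_children = [replaced_copy(child, find, replace) for child in tree.children]
    new_text = tree.text
    if tree.tag == "text":
        new_text = tree.text.replace(find, replace)
    return HTMLTree(tree.tag, new_children, new_text)


original = HTMLTree("html", [HTMLTree("text", [], "hello world")])
updated = replaced_copy(original, "hello", "greetings")

print(original.children[0].text)  # hello world
print(updated.children[0].text)   # greetings world
```

The recursion is the same as before; the difference is that each call returns a new node, and the parent assembles those into a new children list.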

Introduction to Objects

Imagine you’re a DJ at a radio station. You only play songs that your listeners call in and request. In addition, every thousandth listener who calls in gets a prize! You want to keep track of the queue of songs you’ve been asked to play, as well as enough information to give out prizes.

To do this, we need a list of songs and a counter that says how many callers we’ve had so far. (Why can’t we get by with just the list?)

We could implement this with a custom data type like this:

@dataclass
class DJData:
  num_callers: int
  queue: list

We can implement a function to update our data and to figure out what we’re going to say to a listener:

def request(data: DJData, caller: str, song: str) -> str:
  data.queue.append(song)
  data.num_callers += 1
  if data.num_callers % 1000 == 0:
    return "Congrats, " + caller + "! You get a prize!"
  else:
    return "Cool, " + caller

So here we’ve got a datatype and a function that reads and modifies that datatype’s contents. We can see how it works:

the_dj = DJData(0, [])
request(the_dj, "Tim", "French Fries w/Pepper")
"Cool, Tim"

We could have written this slightly differently:

@dataclass
class DJData:
  num_callers: int
  queue: list

  # This is now *part* of the DJData class; a "method" of the class
  # The convention is that the object being operated on is the first argument (self)
  def request(self, caller: str, song: str) -> str:
    self.queue.append(song)
    self.num_callers += 1
    if self.num_callers % 1000 == 0:
      return "Congrats, " + caller + "! You get a prize!"
    else:
      return "Cool, " + caller

To do this, we put the request function inside the definition of DJData. We’ve also modified the method a bit: instead of taking a data argument, we’ve called the argument self and left off the type annotation. This function is now a method on the DJData class, which means we can call it the way we’ve been calling methods on other things like lists or dictionaries:

the_dj = DJData(0, [])
the_dj.request("Tim", "Whirling-In-Rags, 8PM")
"Cool, Tim"

We call methods by writing the name of an object (the_dj, in this case), then a dot, then the method name, then the method arguments, excluding self. Since we’re not passing self in, how does Python know which object to call the method on?
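One way to see the answer: the object to the left of the dot is what Python passes in as self. The sketch below makes the same kind of call two ways; the song titles and the second caller are made up.

```python
from dataclasses import dataclass


@dataclass
class DJData:
    num_callers: int
    queue: list

    def request(self, caller: str, song: str) -> str:
        self.queue.append(song)
        self.num_callers += 1
        if self.num_callers % 1000 == 0:
            return "Congrats, " + caller + "! You get a prize!"
        else:
            return "Cool, " + caller


the_dj = DJData(0, [])

# The usual way: Python fills in self with the_dj for us.
first = the_dj.request("Tim", "Song A")

# The same kind of call, spelled out: look the function up on the class
# and pass the object in explicitly as the first argument.
second = DJData.request(the_dj, "Ana", "Song B")

print(first)               # Cool, Tim
print(second)              # Cool, Ana
print(the_dj.num_callers)  # 2
print(the_dj.queue)        # ['Song A', 'Song B']
```

You'd almost never write the second form in practice, but it shows there's no magic: self is an ordinary argument.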

We’ll keep learning about classes, objects and methods. I want to re-emphasize, though, that you’ve seen this before. We’ve called methods on lists, sets, etc. For instance, l.append(2). What we’re seeing now is how to add methods to our custom objects!

Note on @dataclass

Remember that @dataclass tells Python to help us by automatically defining some useful methods. For example, @dataclass will automatically tell Python how to display one of these objects as a string (by generating a __repr__ method; we’ll cover these string-conversion methods next week).
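A quick sketch of that help in action. The display comes from a generated __repr__ method, which print falls back on when a class has no __str__ of its own; @dataclass also generates an equality check that compares fields.

```python
from dataclasses import dataclass


@dataclass
class DJData:
    num_callers: int
    queue: list


the_dj = DJData(0, [])

# @dataclass generated a readable display for us...
print(the_dj)  # DJData(num_callers=0, queue=[])

# ...and an == that compares field by field.
print(DJData(0, []) == DJData(0, []))           # True
print(DJData(0, []) == DJData(1, ["Song A"]))   # False
```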

The __init__ method

Up until now we’ve been using the dataclasses library to make our custom datatypes work more like Pyret’s. But from this point on, we will generally not use it unless we need to, so that we can see how Python’s objects work.

One of the features that @dataclass gives us is easy initialization: e.g., because HTMLTree is a dataclass, we can easily create HTMLTree objects by writing HTMLTree(...) where the ... gives a tag, a child list, and an optional text field. The HTMLTree function is called a constructor or an initializer.

Non-dataclass objects need you to define their constructor yourself! To do this, we’ll define __init__() methods to initialize data on objects:

# Notice we took away the @dataclass annotation
class DJData:
    def __init__(self):
        self.queue = []
        self.num_callers = 0

    def request(self, caller: str, song: str) -> str:
        self.queue.append(song)
        self.num_callers += 1
        if self.num_callers % 1000 == 0:
            return "Congrats, " + caller + "! You get a prize!"
        else:
            return "Cool, " + caller

Python calls the __init__ method in order to initialize an object’s fields when it is created. We can construct instances of the DJData class like this:

> the_dj = DJData()
> the_dj.request("Tim", "Paper Bag")
"Cool, Tim"

Note that this gives us a bit more control over how fields are initialized. We could do anything we’d like inside __init__(): print things, run algorithms automatically, send email messages, etc.
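As a sketch of that extra control, here is a variation where __init__ takes an argument and checks it before setting up the fields. The prize_every parameter is hypothetical; it isn't part of the class we wrote in lecture.

```python
class DJData:
    # prize_every is a made-up parameter for this sketch: how often a caller wins.
    def __init__(self, prize_every: int = 1000):
        # __init__ can run any code we like, such as validating its input.
        if prize_every <= 0:
            raise ValueError("prize_every must be positive")
        self.queue = []
        self.num_callers = 0
        self.prize_every = prize_every

    def request(self, caller: str, song: str) -> str:
        self.queue.append(song)
        self.num_callers += 1
        if self.num_callers % self.prize_every == 0:
            return "Congrats, " + caller + "! You get a prize!"
        else:
            return "Cool, " + caller


# A generous DJ: every second caller wins.
the_dj = DJData(prize_every=2)
print(the_dj.request("Tim", "Paper Bag"))  # Cool, Tim
print(the_dj.request("Ana", "Song B"))     # Congrats, Ana! You get a prize!
```

Calling DJData() with no arguments still works, since prize_every defaults to 1000.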

…but, we’ve also lost the help that @dataclass was giving us. If we print out the_dj now, we won’t get a nicely formatted string anymore. It’s now going to be our responsibility to fill in such methods.
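Here is a small preview of what that looks like. The class names PlainDJ and DescribedDJ, and the wording of the message, are made up for this sketch.

```python
class PlainDJ:
    def __init__(self):
        self.queue = []
        self.num_callers = 0


class DescribedDJ:
    def __init__(self):
        self.queue = []
        self.num_callers = 0

    # Python calls __str__ when it needs a string for this object,
    # for example inside print(...).
    def __str__(self) -> str:
        return ("DJ with " + str(self.num_callers) + " callers and "
                + str(len(self.queue)) + " queued songs")


plain = PlainDJ()
described = DescribedDJ()

# Without @dataclass or our own method, we get Python's default display:
# the class name and a memory address, something like
# <__main__.PlainDJ object at 0x...>
print(plain)

# With our own __str__, we control the output.
print(described)  # DJ with 0 callers and 0 queued songs
```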