Expression Trees

Motivating Expression Trees

We talked briefly before about how tree-like data pops up all over the place in computing and in life. Today we’ll look at one more type of tree. Here’s an example:

What do you notice about this tree?

What might this tree represent? Arithmetic! The arithmetic expression ((9*4) + 5) to be precise. The children of an operation node correspond to that operation’s inputs. We’ll read these inputs from left to right (although + and * are commutative, so for the moment I’ll disregard ordering).

Both of these representations, the tree and the expression, are equally valid. For computation, though, the tree (we’ll call it an expression tree) has advantages.

Here are two more perfectly good trees, representing 4 and (5 + ((4 * 2) - 7 )) respectively:

Notice how the parentheses in the expressions echo the “levels” of the corresponding expression tree.

What would an invalid tree look like? Maybe something like:

What about

which represents a division by zero: (10 / 0)? Here something is wrong, but is it the same kind of problem? We can explore the difference in Python.

If I write 10 / 0 in Python, the program runs and I get a ZeroDivisionError. But if I write 1 2 3 in Python, I get a SyntaxError: Python rejects the program before running any of it, because the text doesn’t correspond to a valid expression.
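One way to see the two kinds of failure side by side is to separate "can Python build a tree from this text?" from "does running it succeed?". The sketch below does that with the built-in compile and eval functions; the helper name classify is hypothetical.

```python
def classify(source):
    """Report whether source fails before running, while running, or not at all."""
    try:
        # Step 1: turn the text into something runnable. Invalid text fails here.
        code = compile(source, '<example>', 'eval')
    except SyntaxError:
        return 'SyntaxError'
    try:
        # Step 2: run it. A valid expression can still fail here.
        eval(code)
    except ZeroDivisionError:
        return 'ZeroDivisionError'
    return 'ok'

print(classify('10 / 0'))     # ZeroDivisionError
print(classify('1 2 3'))      # SyntaxError
print(classify('(9*4) + 5'))  # ok
```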

Implementing Expression Trees

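There are a lot of things we might want to do with an expression tree. Three of the most common are turning a string into a tree (parsing), turning a tree back into a string (__repr__), and running the program the tree represents.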

String to tree: parsing

Parsing is out of scope for 112, and is usually far harder than writing __repr__. (This is why we gave you a parser for HTML trees.)

Why is parsing hard? Lots of reasons, but consider producing a tree for the expression string 1 + 2 * 3. Without knowing the rules of precedence for arithmetic, this expression is ambiguous: there are multiple potential trees it could correspond to ((1+2)*3 and 1+(2*3)), and those trees even produce different values!
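As a quick check of the ambiguity, we can compute both readings and also ask Python's own parser which tree it picks. This is only an illustration using the standard ast module, not the parser we'd write ourselves.

```python
import ast

# The two candidate readings give different values.
add_first = (1 + 2) * 3    # 9
mult_first = 1 + (2 * 3)   # 7

# Python's parser applies precedence rules: the root of its tree is the
# addition, and the multiplication sits below it as the right child.
tree = ast.parse('1 + 2 * 3', mode='eval').body
root_is_add = isinstance(tree, ast.BinOp) and isinstance(tree.op, ast.Add)
right_is_mult = (isinstance(tree.right, ast.BinOp)
                 and isinstance(tree.right.op, ast.Mult))

print(add_first, mult_first, root_is_add, right_is_mult)
```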

If you’re curious about how parsing works, check out CSCI 1260. For now, we’ll always start with an expression tree directly.

Tree to string: __repr__ (Part 1)

Before we can convert trees into strings, we need to implement trees themselves.

What do these arithmetic expressions look like as Python classes? There are multiple ways to represent them. Let’s be guided by a convenient design principle: invalid trees should, ideally, be impossible to create. For instance, we’d like to make it impossible to create a tree with 12 as an internal node, or + as a leaf. This is analogous to Python’s syntax error when given 1 2 3: there’s no expression tree to represent this because it’s not valid Python.

We’ll have different classes for different kinds of node. Here, we have two very different notions: operations, which can only be internal nodes of the tree, and values, which can only be leaf nodes.

class Value:
    def __init__(self, value):
        self.value = value
    
class Operation:
    def __init__(self, op: str, left, right):
        self.op = op
        self.left = left
        self.right = right

We could improve on this (e.g., by raising an error if an Operation gets created with an invalid operator) but let’s move forward for now.
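For the curious, one possible shape for that improvement is sketched below. The names CheckedOperation and VALID_OPS are hypothetical, and the set of four operators is an assumption based on the arithmetic used in these notes.

```python
VALID_OPS = {'+', '-', '*', '/'}  # assumed operator set for this sketch

class Value:
    def __init__(self, value):
        self.value = value

class CheckedOperation:
    def __init__(self, op: str, left, right):
        # Refuse to build a tree whose internal node isn't a known operator.
        if op not in VALID_OPS:
            raise ValueError(f'unknown operator: {op!r}')
        self.op = op
        self.left = left
        self.right = right

ok = CheckedOperation('+', Value(1), Value(2))
try:
    CheckedOperation('12', Value(1), Value(2))
    rejected = False
except ValueError:
    rejected = True
print(ok.op, rejected)
```

This checks only the operator; it doesn't yet stop someone from passing a non-tree as left or right.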

Tree to string: __repr__ (Part 2)

Now let’s write __repr__ methods for each class. Actually, let’s do a bit more. In the past, we’ve seen that there are two different “convert to string” methods in Python, __repr__ (for the use of programmers) and __str__ (for the use of regular users). Let’s use this opportunity to demonstrate the different uses of __repr__ and __str__.

Values

A value is just a single value. If we’re representing it for our own debugging use, we should probably tag it as a Value class, and say what it contains. But if we’re representing it for a user, we should probably just give the value itself. We’ll do that like this:

    def __repr__(self):
        return f'Value({self.value})'
    def __str__(self):
        return str(self.value)

Operations

Operations are a bit more complex. Not only do we need to account for their left and right children, but we need to make sure that the recursive structure of the whole tree gets explored.

Fortunately, this isn’t tough to do; we’ll just use format strings and make sure to call either str or repr appropriately:

    def __repr__(self):
        return f'Operation("{self.op}", left={repr(self.left)}, right={repr(self.right)})'
    def __str__(self):
        return f'({str(self.left)} {self.op} {str(self.right)})'

We’re calling repr and str explicitly to make sure the proper string representation gets created for each child object. If we didn’t do this, the format string would use str for every child by default, which is the wrong choice inside __repr__; spelling out both calls keeps the two methods unambiguous.
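Here is a tiny experiment showing what a format string does when we don't say which conversion we want. The class Demo is a throwaway made up for this sketch.

```python
class Demo:
    def __repr__(self):
        return 'Demo()'
    def __str__(self):
        return 'demo'

d = Demo()
plain = f'{d}'            # no conversion given: falls back to str(d)
explicit = f'{repr(d)}'   # asking for repr directly
shorthand = f'{d!r}'      # the !r conversion is shorthand for repr(d)

print(plain, explicit, shorthand)
```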

Testing

Let’s try some expressions and make sure these methods work like we expect:

Operation('+', Value(1), Value(2))
Operation('/', Operation('/', Value(2), Value(2)), Value(5))

As we do so, it’s worth asking: what value would we expect to be produced if we ran these expressions through a calculator program? (Probably we’d get 3 and 0.2, respectively.)
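Putting the pieces so far into one runnable sketch, we can check both string forms for the two test expressions. The class bodies follow the definitions above; the variable names expr1 and expr2 are just labels for this example.

```python
class Value:
    def __init__(self, value):
        self.value = value
    def __repr__(self):
        return f'Value({self.value})'
    def __str__(self):
        return str(self.value)

class Operation:
    def __init__(self, op: str, left, right):
        self.op = op
        self.left = left
        self.right = right
    def __repr__(self):
        return f'Operation("{self.op}", left={repr(self.left)}, right={repr(self.right)})'
    def __str__(self):
        return f'({str(self.left)} {self.op} {str(self.right)})'

expr1 = Operation('+', Value(1), Value(2))
expr2 = Operation('/', Operation('/', Value(2), Value(2)), Value(5))

print(repr(expr1))  # Operation("+", left=Value(1), right=Value(2))
print(str(expr1))   # (1 + 2)
print(str(expr2))   # ((2 / 2) / 5)
```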

Running the program

One type of computation we might want to do with expression trees is run them. For instance, if our expressions are arithmetic, we might write a method that runs the arithmetic and returns the result, just like a calculator program. We call such a function an evaluator or interpreter.

The interpreter for Value turns out to be pretty straightforward: we just return the value that’s already there. But what about Operation? The key lies in realizing that we need to turn an operation node containing the string + into an actual addition operation (and likewise for the other operators):

    def interp(self):
        if self.op == '+':
            return self.left.interp() + self.right.interp()
        if self.op == '*':
            return self.left.interp() * self.right.interp()
        if self.op == '-':
            return self.left.interp() - self.right.interp()
        if self.op == '/':
            return self.left.interp() / self.right.interp()
        raise ValueError(self.op) # built-in error type
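As an aside, the chain of if statements isn't the only way to connect operator strings to real operations. The sketch below is an alternative, not the version used in these notes: it looks the operator up in a dictionary built from Python's standard operator module. The name OPERATORS is hypothetical, and Value.interp is written out as described in the paragraph above.

```python
import operator

# Map each operator string to the function that performs it.
OPERATORS = {
    '+': operator.add,
    '-': operator.sub,
    '*': operator.mul,
    '/': operator.truediv,
}

class Value:
    def __init__(self, value):
        self.value = value
    def interp(self):
        # A leaf evaluates to the value it already holds.
        return self.value

class Operation:
    def __init__(self, op: str, left, right):
        self.op = op
        self.left = left
        self.right = right
    def interp(self):
        if self.op not in OPERATORS:
            raise ValueError(self.op)
        # Evaluate both subtrees recursively, then combine the results.
        return OPERATORS[self.op](self.left.interp(), self.right.interp())

result = Operation('+', Operation('*', Value(9), Value(4)), Value(5)).interp()
print(result)  # 41
```

The behavior matches the if-chain for the four operators; the trade-off is that adding an operator becomes a one-line change to the dictionary.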

If you’ve ever heard someone say that the Python prompt “interprets” Python or “evaluates” Python, now you know why.

Isn’t there a lot more to Python than just arithmetic?

Yes! You’ll learn to write more sophisticated interpreters if you take Programming Languages (CSCI 1730). Here’s an exercise: if you needed to add variables to our language, how would you do it? There are two challenges: representing a variable as a node in the tree, and deciding what value a variable has when we interpret it.

To solve the first challenge, we’ll add a new class:

class Variable:
    def __init__(self, name):
        self.name = name
    def __str__(self): 
        return f'{self.name}'
    def __repr__(self):
        return f'Variable("{self.name}")'
    def interp(self):
        raise NotImplementedError('???')  # what should a variable evaluate to?

What’s the result of interpreting a variable? We don’t know unless we’re told its value! So let’s make that explicit by adding a new parameter to interp. It will be a dictionary that maps variable names to values:

    def interp(self, variable_values: dict):
        return variable_values[self.name]

Now what? We need to expand the interp method for the other classes, and make sure the dictionary is passed whenever we call any of the interp methods. But then everything should work nicely.
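Here is one way the expanded methods might look once every interp accepts the dictionary. It's a sketch under the assumptions above: the parameter name variable_values comes from the Variable code, and the string methods are left out to keep it short.

```python
class Value:
    def __init__(self, value):
        self.value = value
    def interp(self, variable_values: dict):
        # A constant ignores the dictionary but accepts it for uniformity.
        return self.value

class Variable:
    def __init__(self, name):
        self.name = name
    def interp(self, variable_values: dict):
        return variable_values[self.name]

class Operation:
    def __init__(self, op: str, left, right):
        self.op = op
        self.left = left
        self.right = right
    def interp(self, variable_values: dict):
        # Pass the same dictionary down to both subtrees.
        left = self.left.interp(variable_values)
        right = self.right.interp(variable_values)
        if self.op == '+':
            return left + right
        if self.op == '*':
            return left * right
        if self.op == '-':
            return left - right
        if self.op == '/':
            return left / right
        raise ValueError(self.op)

expr = Operation('+', Operation('*', Variable('x'), Value(4)), Value(5))
print(expr.interp({'x': 9}))  # 41
print(expr.interp({'x': 0}))  # 5
```

Interpreting a tree that mentions a variable missing from the dictionary raises a KeyError here; whether that's the right error to show a user is a design choice.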

Notice what’s happened here. We’ve just made the language more general: the caller now gets to use variables, and pass in specific values. This is one way we can increase the expressive power of a language.

Testing

Suppose I write:

expr1 = Operation('+', Value(1), Value(2))
expr2 = Operation('/', Operation('/', Value(2), Value(2)), Value(5))

assert expr1.interp() == 3
assert expr2.interp() == 0.2

Exercise: What tests am I missing?
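If you'd like something to compare your answer against, here are a few candidate tests (certainly not a complete list). They use the version of interp without variables, and the helper raises is a hypothetical convenience for checking that a call fails with a given error.

```python
class Value:
    def __init__(self, value):
        self.value = value
    def interp(self):
        return self.value

class Operation:
    def __init__(self, op: str, left, right):
        self.op = op
        self.left = left
        self.right = right
    def interp(self):
        if self.op == '+':
            return self.left.interp() + self.right.interp()
        if self.op == '*':
            return self.left.interp() * self.right.interp()
        if self.op == '-':
            return self.left.interp() - self.right.interp()
        if self.op == '/':
            return self.left.interp() / self.right.interp()
        raise ValueError(self.op)

def raises(error_type, thunk):
    """Return True if calling thunk() raises error_type."""
    try:
        thunk()
    except error_type:
        return True
    return False

# A tree that is just a leaf.
assert Value(4).interp() == 4
# Subtraction is not commutative, so child order matters.
assert Operation('-', Value(7), Value(2)).interp() == 5
assert Operation('-', Value(2), Value(7)).interp() == -5
# A deeper tree: (5 + ((4 * 2) - 7))
deep = Operation('+', Value(5),
                 Operation('-', Operation('*', Value(4), Value(2)), Value(7)))
assert deep.interp() == 6
# Valid trees that fail when run, and trees with an unknown operator.
assert raises(ZeroDivisionError, Operation('/', Value(10), Value(0)).interp)
assert raises(ValueError, Operation('%', Value(1), Value(2)).interp)
```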