Linked Lists
Another Use for Objects
So far, we’ve mostly used objects to represent real-world entities (animals etc.). We can also use objects to implement data structures.
Let’s say we want a data structure that has the following operations:
- append(value): stores a value;
- nth(n): retrieves the n-th value stored; and
- remove(n): removes the n-th value stored (and moves any later elements up an index).
These operations should sound familiar: they are the core operations on lists. We saw a similar type in Pyret, too! The essential operations are the same. But what about the underlying implementation?
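For comparison, Python's built-in list already provides all three operations. Here's a minimal sketch of how they line up, assuming indices start at 0 as they do in Python. The variable names are just for illustration.

```python
# Python's built-in list supports the three operations described above.
values = []

# append(value): store a value at the end
values.append("a")
values.append("b")
values.append("c")

# nth(n): retrieve the n-th value stored (indices start at 0)
second = values[1]

# remove(n): remove the n-th value; later elements move up an index
removed = values.pop(1)

print(second)   # b
print(removed)  # b
print(values)   # ['a', 'c']
```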
Python Lists
Python’s lists are implemented using contiguous chunks of memory. We took advantage of this to create hash tables earlier in the semester.
Pyret Lists
Pyret’s lists are different: they are constructed as a linked structure, where each link in the list contains a value and a reference to the next link. For reference, they’re defined more or less like this, as a datatype with two variants:
data List:
| empty
| link(fst, rst :: List)
end
Here are two pictures illustrating the difference.
Notice that the Pyret list is a bit more complicated. It’s constructed from links that might be scattered around in memory, and those links are connected via references to each link’s successor (or to empty, which means the list has ended).
I’ve used two colors to help contrast two different ways you’ll see these lists drawn. In reality, the arrows from link to link (shown in the black-colored list) are references to memory addresses (made explicit in the orange-colored list).
This style of list is called a linked list, because it’s structured like the links of a chain.
So what’s the difference?
Both of these low-level data structures can provide all the features of a “list”: adding, removing, accessing the $n^{th}$ element, and so on. But how long does each operation take? For example:
- Adding a new element to the end of the list:
  - Python lists can add new elements to the end of the list in constant ($O(1)$) time, provided that there is unused room in the available block of memory.
  - Pyret lists (at least as shown above, where the list reference just points to the first link) need to find the end of the list before they can add a new element, meaning they need linear ($O(n)$) time to add a new element to the end.
  - We could imagine an optimization to Pyret lists, where we store a reference to both the front and back of the list, which would let us add new elements in constant time.
- Finding the element at an arbitrary index $k$ into the list:
  - Python lists can find this element in constant time, because its position can be computed directly from the start of the block.
  - Pyret lists need to count forward $k$ elements, following one reference at a time, which takes linear ($O(k)$) time.
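Both points can be sketched in a few lines of Python. The classes Link and TailedList and their methods are hypothetical, written only to illustrate the costs above: keeping a back reference makes adding to the end constant-time, while getting the element at index $k$ still means following $k$ references from the front.

```python
class Link:
    """One link in the chain: a value plus a reference to the next link."""
    def __init__(self, value, next_link=None):
        self.value = value
        self.next_link = next_link


class TailedList:
    """Hypothetical linked list that tracks both its first and last links."""
    def __init__(self):
        self.front = None  # first link, or None when empty
        self.back = None   # last link, or None when empty

    def add_to_end(self, value):
        # Constant time: we already hold a reference to the last link,
        # so there is no need to walk the chain to find it.
        new_link = Link(value)
        if self.front is None:
            self.front = new_link
        else:
            self.back.next_link = new_link
        self.back = new_link

    def get(self, k):
        # Linear time: follow k references, one at a time, from the front.
        if k < 0:
            raise IndexError("index out of range")
        current = self.front
        for _ in range(k):
            if current is None:
                raise IndexError("index out of range")
            current = current.next_link
        if current is None:
            raise IndexError("index out of range")
        return current.value


lst = TailedList()
for v in [10, 20, 30]:
    lst.add_to_end(v)
print(lst.get(2))  # 30
```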
What are linked lists good for, then? That is, what are their advantages over the contiguous array implementation?
Exercise: Think about various operations on lists. Which might linked lists (Pyret style) have an advantage on?
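Think about what happens if Python needs to insert a new element into the middle of a list: because the elements live in one contiguous block, every element after the insertion point has to shift over by one slot, which takes linear time. A Pyret-style linked list avoids this. If the program is already “in the neighborhood” (that is, it already has a reference to the link just before where the new element should go), it doesn’t need to follow the chain of links at all. It can insert the new element in constant time by creating a new link that points to the old successor and redirecting the predecessor’s reference to the new link.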
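To make the “in the neighborhood” idea concrete, here’s a minimal sketch. The names Link, insert_after, and values_of are hypothetical, introduced just for this illustration. Note that Pyret’s own lists are immutable, so this shows the linked-list idea in general rather than how Pyret itself behaves.

```python
class Link:
    """One link: a value and a reference to the next link (or None)."""
    def __init__(self, value, next_link=None):
        self.value = value
        self.next_link = next_link


def insert_after(node, value):
    """Insert value immediately after node in constant time.

    Only two references are set, no matter how long the chain is:
    the new link's next_link and node's next_link.
    """
    node.next_link = Link(value, node.next_link)


def values_of(first):
    """Collect the chain's values into a Python list (for display only)."""
    out = []
    current = first
    while current is not None:
        out.append(current.value)
        current = current.next_link
    return out


# Build 1 -> 2 -> 4 by hand, keeping a reference to the link holding 2.
four = Link(4)
two = Link(2, four)
chain = Link(1, two)

insert_after(two, 3)       # we are "in the neighborhood" of 2
print(values_of(chain))    # [1, 2, 3, 4]

# A Python list, by contrast, shifts every later element over by one.
py_list = [1, 2, 4]
py_list.insert(2, 3)
print(py_list)             # [1, 2, 3, 4]
```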
Building Linked Lists in Python
Could we build something similar to Pyret’s lists in Python? Yes! Let’s write down some examples first. In Pyret we might write:
empty
link(1, empty)
link(1, link(2, empty))
Let’s make some classes that let us turn this into something we can write in Python. The picture above is a hint: we should probably have a kind of object to represent those links.
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None
Now we can write ListNode(1) to represent the list storing the value 1. But that’s not yet enough: we need to handle the first and third examples (the empty list, and a list with multiple elements).
We’ll do this by adding another class, one that represents an entire list in itself. We’ll give it one field, first, to represent the first ListNode it contains. When we first make a LinkedList, it will be empty (represented by None in the first field).
class LinkedList:
    def __init__(self):
        self.first = None
I wonder…
How would the picture of the Pyret linked list above change to account for this new LinkedList class? It’s not quite the same shape, and we’ll notice reasons why as we keep going.
What if…
Could we model linked lists more similarly to Pyret, with a ListNode object that takes both the value and the next node? Yes! We’d write something like:
class ListNode:
    def __init__(self, data, next):
        self.data = data
        self.next = next
Either option works. We’ll use the first version, however, because it will be useful later.
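With the two-argument constructor, the Pyret examples translate almost word for word, with None standing in for empty. Here’s a minimal sketch; the variable names are illustrative, and since this version has no LinkedList wrapper, the empty list is just None rather than an object of its own.

```python
class ListNode:
    # The alternative, Pyret-like constructor: value and next node together.
    def __init__(self, data, next):
        self.data = data
        self.next = next


# link(1, empty)
one_item = ListNode(1, None)

# link(1, link(2, empty))
two_items = ListNode(1, ListNode(2, None))

print(two_items.data)        # 1
print(two_items.next.data)   # 2
print(two_items.next.next)   # None
```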
Updating our examples
Let’s translate those rough Pyret examples into Python code that creates them.
- empty would be LinkedList(): just a list without any ListNodes.
- link(1, empty) would now be …

Oh bother. We aren’t done: we need some way of appending new values to an existing list. Right now, we can represent standalone nodes, but we can’t actually store them in the list without manually modifying the list’s fields (which is usually a bad idea).
Implementing append
(All code in this section is in methods inside the LinkedList class; I’ve just left that out for readability.)
Let’s start with an empty method:
def append(self, data):
    pass
We’ll proceed, as we have before, by making a skeleton of the method and adding details as we go. When append is called with a new piece of data to add, what do we have to work with? Just:
- the data value; and
- the fields of the LinkedList being appended to, which the method calls self.
Probably we’ll need to make a new ListNode object to hold the new data value, so let’s do that now:
new_node = ListNode(data)
Now we need to add it into the list’s chain of nodes. The list gives us one field, self.first, which points to the first node in the list. We could certainly change it:
self.first = new_node
Is that the right thing to do?
The Empty-List Case
Well, sometimes it is. If the list is empty, there is no first node yet, so we’re free to just make the new node the first node in the list. But if the list already contains nodes, we need to traverse the list until we find the end, and then add the new node there. So we’ll split the method into two cases, leaving the second case unfinished for now:
def append(self, data):
    if not self.first:
        self.first = ListNode(data)
    else:
        pass  # ??? need to update LATER in the list
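A quick way to see what’s still missing is to run the partial method. This sketch just assembles the classes defined so far with the unfinished append: the first call works, but the second is silently dropped, because the else branch does nothing yet.

```python
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.first = None

    def append(self, data):
        # Partial version: only the empty-list case is handled so far.
        if not self.first:
            self.first = ListNode(data)
        else:
            pass  # ??? need to update LATER in the list


lst = LinkedList()
lst.append(1)
lst.append(2)              # silently does nothing yet
print(lst.first.data)      # 1
print(lst.first.next)      # None: the 2 was never stored
```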
The Non-Empty List Case
Now we need to do 2 things:
- find the last node in the list; and
- add the new node as its successor.
Let’s solve these problems backwards. If we find the last node, making the new node its successor is straightforward:
last_node.next = ListNode(data)
(Remember that the new node will have None in its next field.)
How can we get to the end of the list? We could use a loop, but this is also a great place for recursion, just like it was in Pyret. We can write a helper method that passes the obligation to add a new value down the list until, eventually, that obligation can be met at the last node:
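# internal helper method, not called from outside the class
# we'll use the double-underscore convention to label this as "private"
# (Python name-mangles it to _LinkedList__append_to)
def __append_to(self, node: ListNode, data):
    if not node.next:
        # node is the last node: the new node becomes its successor
        node.next = ListNode(data)
    else:
        # not at the end yet: pass the job along to the next node
        self.__append_to(node.next, data)

# this is the method that a caller would invoke
def append(self, data):
    if not self.first:
        # empty list: the new node becomes the first node
        self.first = ListNode(data)
    else:
        self.__append_to(self.first, data)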
And now we’ve written append.
Or have we? We couldn’t write tests easily before, but now we can. And next time we’ll have other methods that we can use to make our tests even better.
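Here’s a minimal sketch of what such tests might look like, in plain assert style. The helper contents and the test function test_append are hypothetical, not part of the LinkedList class; contents simply reads the values back out of the chain in order so the assertions can compare against ordinary Python lists.

```python
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.first = None

    # internal helper: recurse down the chain until reaching the last node
    def __append_to(self, node: ListNode, data):
        if not node.next:
            node.next = ListNode(data)
        else:
            self.__append_to(node.next, data)

    def append(self, data):
        if not self.first:
            self.first = ListNode(data)
        else:
            self.__append_to(self.first, data)


def contents(lst: LinkedList) -> list:
    """Hypothetical test helper: read the values out of a LinkedList in order."""
    values = []
    node = lst.first
    while node is not None:
        values.append(node.data)
        node = node.next
    return values


def test_append():
    lst = LinkedList()
    assert contents(lst) == []
    lst.append(1)
    assert contents(lst) == [1]
    lst.append(2)
    lst.append(3)
    assert contents(lst) == [1, 2, 3]


test_append()
print("all append tests passed")
```

Reading values back through a helper like this keeps the tests from depending on how deep the chain goes; once the class has its own accessor methods, the tests could use those instead.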
(I suggest looking at how the append method works in the VSCode debugger; it will help you see the sequence of steps that Python is taking to add elements to a progressively longer list.)
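Earlier we noted that a loop would also work for finding the end of the list. As a point of comparison, here’s a sketch of that loop-based append; it should behave the same as the recursive version, but it isn’t the version these notes go on to use.

```python
class ListNode:
    def __init__(self, data):
        self.data = data
        self.next = None


class LinkedList:
    def __init__(self):
        self.first = None

    def append(self, data):
        # Loop-based alternative to the recursive helper.
        new_node = ListNode(data)
        if self.first is None:
            # empty list: the new node becomes the first node
            self.first = new_node
            return
        # Walk forward until we reach the node whose next is None.
        last_node = self.first
        while last_node.next is not None:
            last_node = last_node.next
        last_node.next = new_node


lst = LinkedList()
for v in ["a", "b", "c"]:
    lst.append(v)
print(lst.first.next.next.data)  # c
```

One practical difference: in CPython the recursive version makes one call per node, so a list longer than the default recursion limit (1000) will raise RecursionError when appending, while the loop has no such limit.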