The next problem we will tackle is how to store binary trees in a file. As always, there are a vast number of different situations that might arise, but luckily in the case of binary trees, these can divided into two general cases- binary search trees, and unordered binary trees.
Imagine that we have an inorder binary tree consisting of integers- each node contains an integer value; the left subtree (if any) contains only keys less than the key of the node, and the right subtree (if any) contains keys that are greater. Imagine that we want to store this binary tree in a file in such a way that we can reconstruct it exactly as it was. In particular, our goal is to store the keys in such an order that when if we start with an empty tree, and we read them in again in the order they appear in the file and insert them into the tree, the resulting tree has exactly the same form as the original.
This problem may seem somewhat contrived, given the binary search trees that we have seen- the key properties of binary search trees (at least as far as we have discussed them so far) is that the keys are inorder, and (in order to minimize search time) the tree is reasonably balanced. Both of these properties can be attained by constructing the tree in a proper manner, regardless of the order that the keys are inserted into the tree. However, there are other kinds of binary search trees where there are other relationships between the nodes, and the exact shape of the tree actually has a great deal importance.
Your first thought might be to write a routine that walks through the tree using one of the traversals that we've seen in class (inorder or depth-first, preorder, postorder, or level-order), writing out the value of each node as you encounter it- and you'd be right, but only if you picked the correct traversal!
If we use an inorder traversal, we'll put the values onto the disk in a sorted order- and as we've seen, if we build a binary tree from sorted keys (assuming that no balancing operations are performed), then we will always get a tree that looks like a linked list. Unless by some incredible coincidence all the trees we want to store happen to look like this, the resulting trees will not be identical to the originals.
If we use a postorder traversal, it won't work either (although at least the trees won't always come out looking like linked lists). If we try a few different examples, we can see see that it just doesn't do what we need, but after some fooling around you might start to catch on to why neither inorder or postorder does the right thing- which leads us to a preorder or level-order traversal.
Recall that when we build a binary search tree using the standard insertion routines, we build from the top down- nodes are always added as leaves. Therefore, in order to reconstruct the original tree, the following property must hold- before a key is inserted into the tree, the key which was the key of its parent node in the original tree must already be in the tree. Preorder or level-order traversal will do what we need, since each always visits parents before their children, and so parents will always appear before their children in the file.
Since preorder is trivial to code and efficient (while level-order slightly more complex and may require more space), it makes sense to use preorder. It's interesting to note that there are a number of other traversals that share this property. However, since preorder is the easiest to code and understand, there is no reason not to use it.
For an example of an implementation, see figure 6.4.
Figure 6.4:
Routines to read and write binary search trees with integer keys.
This code is very simple- partly because it doesn't do much error checking or attempt to recover from any errors, but mostly because the algorithm is very short and simple. Note that the treeInsert function can be any ordinary tree insertion routine, but it must not be one that attempts to balance or otherwise modify the existing shape of the tree in any way other than creating a new leaf!
The code in figure 6.4 only reads and writes integer keys. It could be extended to work with binary search trees that use some other kind of key, and have values composed of arbitrary records of strings or other types, using the same sort of methods that we used in the previous section that described how to deal with strings.
Now that we've conquered binary search trees, let's tackle something a little more complicated- ``unordered'' binary trees, such as expression trees or animals trees (like those that we used in the animals program).
The crucial difference between this type of tree and a binary search tree is that there isn't any general way to compare any two arbitrary keys in order to determine which one should come before the other. We know at a glance whether one integer is larger or smaller than another in a binary search tree with integer keys, but in an animals tree there's no way to tell at a glance whether ``cat'' will appear to the left or right of ``dog'' or ``bear''- we must traverse the tree, answering questions about these beasts in order to know where they will be located.
So, how can we proceed? There are several possibilities.
One approach is to encode the explicit sequence of operations that was used to construct the tree itself.
For an animals tree, this reduces to storing an explicit sequence of ``yes'', ``no'', questions, and animal names that can be used to reconstruct the original tree. This does not mean that the original game itself needs to be saved: only the information represented by the tree. In effect, we would be recording an abbreviated game of animals, in such a way that reading it back in would simulate replaying the game. This requires storing the relationship between each node and all of its ancestors (i.e. the sequence of "yes" and "no"s that lead to each node).
Like in the previous case, we would need to do a preorder walk of the tree when writing it, in order to ensure that the ancestors of each node are installed before we try to install the node. However, depending on how the tree is actually constructed, it is not always the case that the ancestor questions of an animal node preceeded the animal in the game- in fact, the opposite is the case. Therefore, some care is needed.
For animals, this seems like a complex and inefficient task, but for other problems, this is not necessarily a bad approach. If constructing the tree is not terribly inefficient, or the contents of the tree can be represented easily in some other manner, then this approach works well. For example, in an expression tree, it is very efficient to convert an infix expression into an expression tree (or vice versa)- and the infix expression is a more compact representation of the expression (which makes it more efficient to store).
A better approach (in most cases) is to record the structure of the tree implicitly instead of explicitly recording all of the information used to construct the tree. This requires less storage and is somewhat more flexible.
For example, to write out an animals tree, we do a preorder walk, but along with the contents of each node we record whether the node is a question node or an animal node. When we read the tree back in, we read it again in preorder, and know where we are in the recursion based on whether the node is a question node (in which case we need to recursively read in the ``yes'' and ``no'' children) or an animal (the base case of the recursion).
An example of this algorithm is given in C in figure 6.5. The reading algorithm, though also quite simple, is quite a bit less obvious than the algorithm for reading a binary search tree, and you may need to try it once or twice to convince yourself that it actually works.
Figure 6.5:
Routines to read and write an Animals tree.
Some particular details to note about the code in figure 6.5:
This can be done by modifying the aniTreeWrite function to denote "question" nodes with a single left child with QUESTION_LEFT_CHAR, and nodes with a single right child with QUESTION_RIGHT_CHAR, and then modify the case statement in aniTreeRead as shown in figure 6.6.
Figure 6.6: Dealing with question nodes that
have only a left or right child.
Animal trees have a relatively simple structure, so these routines are quite simple. In the more general case, binary trees can be represented by strings using the method sketched in algorithm 6.1:
An example of a tree represented this way is shown in figure 6.7. Note that this method can easily be extended to work for M-ary trees as well.
Figure 6.7: String representation of a binary tree.
The simplest (and perhaps most flexible) approach is to add an extra integer ``position'' field to each node in the tree. This field is used only by the I/O routines. This field is filled in by an inorder traversal of the tree immediately before it is written, and then used as the key to reconstruct the tree.
The algorithms for reading and writing are given in algorithms 6.2 and 6.3.
This approach is simple and fast, but requires some storage overhead. The fact that it requires storing information in the tree itself means that this approach can only be used when the tree can be modified (and fields for this purpose were set aside when the data structure was created). If this is not the case, then this algorithm cannot be used as stated here, although its central idea can be used. (See the exercises.)