Next: Reducing Memory Requirements: Smaller Up: Tries Previous: Implementing Tries

Reducing Memory Requirements: Eliminating Nodes

Special Case: Leaf Nodes

One observation is that in our implementation, leaf nodes are particularly wasteful: leaf nodes have no children, but require as much space as nodes that have 26 children.

If the only thing that we are interested in storing in the trie is a set of keys (with no associated values), then leaf nodes contain no useful information except that their endOfKey value is non-zero. This suggests an immediate optimization: represent all of the leaf nodes with a single leaf. This idea is illustrated in figure 5.5, which once again represents the same keys as the trie in figure 5.2.

Figure 5.5: A trie with a single shared leaf.

If we do wish to associate values with each key, then unique leaf nodes are necessary- but they only need to be large enough to hold the value, and can omit the pointers to 26 children.

Patricia Tries

One variation of tries is called a Patricia Tree (or simply Patricia). The actual patricia data structure is somewhat different from what I describe here; I am focusing on a specific idea used by patricias that is particularly useful in our effort to reduce the memory consumption of tries. See Knuth, The Art of Computer Programming, vol 3, or Lewis and Denenberg, Data Structures and Their Algorithms for a full exposition of patricias.

In a patricia, sequences of nodes that all have only one child and do not represent the end of a key are collapsed into a single transition. We can apply this idea to our tries by adding the possibility that each transition represents several characters of a string, instead of just one. An example of such a trie, representing the keys ``he'', ``hello'', and ``hear'' is shown in figure 5.6. Note that if only keys are stored in this kind of trie, then the trick of combining leaf nodes can be used again here, although this is not illustrated in figure 5.6.

Figure 5.6: A patricia representing ``he'', ``hello'', and ``hear''.

At first glance, this would seem to greatly complicate the search algorithm, since during a search, at each node we might have to look at an unknown number of characters of the key we are searching for instead of single character. However, in actuality this does not complicate matters very much. Since only sequences of nodes that have one child are collapsed, no such collapsed sequence can possibly be a prefix of another. Therefore, although each transition might represent many characters, only one transition (the transition that begins with the next character in the key) can possibly be valid at each step. Therefore, the search is straightforward and efficient.

INSERT and DELETE in a trie constructed this way is slightly more complicated. When a key is deleted, it might present an opportunity to collapse a sequence of transitions, and when a key is added it may be necessary to uncollapse a transition.

Next: Reducing Memory Requirements: Smaller Up: Tries Previous: Implementing Tries

Dan Ellard
Mon Jul 21 22:30:59 EDT 1997