next up previous contents
Next: Reducing Memory Requirements: Eliminating Up: Tries Previous: Tries as a Data

Implementing Tries

The most obvious way of implementing a trie is as an n-ary tree, where n is the number of elements in the alphabet. In addition to the child pointers in each node, each node also contains a flag that indicates whether or not it is an end-of-key node, and the value associated with the key (if any). For example, if we wanted to construct a trie for the alphabet of the 26 lowercase letters, where each key has an integer associated with it, the C definition shown in figure 5.4 for the trie_t structure could be used.

 
Figure 5.4:   A 26-ary tree node to implement a trie of strings of lowercase letters.

Each letter of the alphabet maps to one of the indices of the children array, and writing the code to navigate through the resulting tree is quite simple.

However, from an engineering point of view, this simplistic approach can be disastrous for many applications, because the typical number of children of each node is far fewer than the size of the alphabet. For example, in figure 5.2, each node has, on average, 0.9 children, although this representation must always allocate enough space for each node to have 26 children.

For example, on the HP computers, sizeof(trie_t) == 112, so if we used this structure to represent the six keys in the trie illustrated in figure 5.2, this structure would require 1120 bytes of memory to represent six strings which have a combined length of only 24 bytes!

As another example, a trie consisting of the 20,000 lower-case words in my computer's small spell-checking dictionary requires 62,000 nodes to represent, for a total of nearly 7 megabytes of memory, even though the dictionary itself is only 200,000 bytes in length. Thus, a trie to represent this small dictionary requires nearly 35 times more memory than the dictionary itself.

Seven megabytes might not seem like a very large amount of memory by today's standards, but note the factor of 35 increase in memory consumption- by picking a larger problem, we can soon soak up enough memory to cause concern on any system. Depending on the nature of the keys represented by the trie, this factor may grow or shrink, but a few empirical experiments suggest that this factor is actually lower than many typical sets of keys!

If we expand our representation to allow uppercase letters and punctuation, then the memory requirements would increase even further. However, this might not be necessary, at least in the case of the dictionary; only a small percentage of the words in the dictionary (at least, the spell-checking dictionary I have) contain uppercase letters, and nearly all of these uppercase letters appear as the first letter in a word. Therefore, it would be possible to handle nearly all words that contain an uppercase letter by adding a special case to the way that the root node of our trie is treated. Punctuation is more troublesome, however, since the position of punctuation in words is not as conveniently localized.


next up previous contents
Next: Reducing Memory Requirements: Eliminating Up: Tries Previous: Tries as a Data

Dan Ellard
Mon Jul 21 22:30:59 EDT 1997