Next: Implementing Tries Up: Tries Previous: Tries as an Abstract

Tries as a Data Structure

As we have seen, when we search for a key in a binary search tree, at each node we visit, we compare the key in that node to the key that we are searching for. All that we care about is whether the key in the node is less than, equal to, or greater than the key we are searching for, however- the precise value of the node's key is not particularly interesting, unless it is an exact match.

If you think carefully about it, you will realize that this manner of search potentially throws away quite a bit of information at each comparison. The path from the root to any node in a binary search tree may represent a considerable amount of information about the contents of the node (and may, in some cases, specify it completely).

A trie is a data structure that stores the information about the contents of each node in the path from the root to the node, rather than the node itself. For example, consider the tree shown in figure 5.1. In this tree, each path between the root and a leaf represents a key, so this trie contains the keys ``had'', ``held'', ``help'', and ``hi''. Note that the nodes themselves are unlabeled. Instead, each transition between two nodes is labeled with a single character from a key.

Figure 5.1: A simple trie containing the keys ``had'', ``held'', ``help'', and ``hi''.

However, this manner of representing keys has a fatal flaw; it is impossible to represent any key that is a prefix of another. This problem is fixed by introducing a special marker on each node that tells whether or not it represents the end of a key. In this text, a node represented by a light circle is an ``internal'' node, while a node represented by a dark circle represents the end of a key. In the code that is used to express the algorithms in this section, a node that represents the end of a key has a non-zero value in its endOfKey field. The trie shown in figure 5.2 contains the same keys as the trie in figure 5.1, with the addition of ``he'' and ``hip''.

Figure 5.2: A trie containing the keys ``had'', ``he'', ``held'', ``help'', ``hi'', and ``hip''.

The algorithm for determining whether or not a particular word is present in the trie is very simple, and is given in algorithm 5.1.

[hbtp]

  1  int trieFind (trie_t *root, char *key)
  2  {
  3          int     i       = 0;
  4          trie_t  *curr   = root;
  5          int     key_len = strlen (key);
  6  
  7          if (root == NULL)
  8                  return (NOT_FOUND);
  9  
 10          for (i = 0; i < key_len; i++) {
 11                  curr = trieFindChild (curr, key[i]);
 12                  if (curr == NULL)
 13                          return (NOT_FOUND);
 14          }
 15  
 16          if (trieIsEndOfKey (curr))
 17                  return (FOUND);
 18          else
 19                  return (NOT_FOUND);
 20  }

If the trie is used to store information about the keys (in addition to the keys themselves), then algorithm 5.1 can be modified to return this information instead of simply FOUND and NOT_FOUND.

The algorithm for inserting a new key is more involved. Like the algorithm for FIND, it traverses the trie until it either reaches the end of the string, or it discovers that the trie does not contain the string. In the first case, all that is necessary is to mark the end node as being the end of a key (and depositing a value into this node, if appropriate). In the second case, the trie must be extended with new nodes that represent the remainder of the string. A recursive algorithm for INSERT is shown in algorithm 5.2.

[hbtp]

  1  trie_t *trieInsert (trie_t *root, char *key)
  2  {
  3  
  4          if (root == NULL)
  5                  root = trieCreate ();
  6  
  7          if (key[0] == '\0') {
  8                  trieEndOfKey (root, END_OF_KEY);
  9          }
 10          else {
 11                  trie_t *child = trieInsert (trieFindChild (root, *key),
 12                                  key + 1);
 13                  trieInsertChild (root, key[0], child);
 14          }
 15  
 16          return (root);
 17  }

The algorithm for deleting a key is similar to insert, but has more special cases to consider. Like insert, it traverses the tree until the end of the key string is reached. If the end node has children, then this node cannot be removed, but all that needs to be done is to remove the marker that tells that it is the end of a key. If the node does not have any children, however, then it can be removed, as can the entire sequence of single-child, non-end-of-key nodes that lead to this node.

For example, in figure 5.3, deleting ``are'' causes nodes 3 and 4 to be deleted, but not 1 and 2 (because these represent part of the other keys represented by the trie). Deleting ``at'' from the original trie cannot delete any nodes, but simply removes the endOfKey marker on node 5. Deleting ``ate'' from the original trie removes node 6, but cannot remove node 5 (even though node 5 has only a single child), because node 5 represents the end of a key.

Figure 5.3: A trie containing the keys ``are'', ``at'', and ``ate''.

Next: Implementing Tries Up: Tries Previous: Tries as an Abstract

Dan Ellard
Mon Jul 21 22:30:59 EDT 1997