The reader is assumed to be familiar with this assignment, so there's not a lot of discussion of the general structure of the program here; this document concentrates on several interesting parts of the code.

For each character in the stringbufif the character is "alphabetic", then convert the character to uppercase, find the corresponding "digit", and replace the character with this digit.

This will all take place in one loop, except that the algorithm for finding the corresponding digit will be implemented in a separate function. For example, the code might look like:

/* * Given integer i, and string buf: */ for (i = 0; i < strlen (buf); i++) { if (isalpha (buf [i])) { /* * Note-- on a few (buggy) systems, it is an * error to convert a letter to uppercase * without checking first to make sure that it * is lowercase, because the conversion * assumes that the character is a letter and * is lowercase... and if it isn't then the * results are somewhat unpredictable. This * code assumes that the version of toupper * we're using is robust. */ buf [i] = toupper (buf [i]); buf [i] = uppercase_letter_to_digit (buf [i]); } }

All that remains is to implement the function
`uppercase_letter_to_digit`. This can be done in a number of ways.
I'll discuss several that I think are appropriate, and then a few that
I think are not appropriate.

In order to do table lookup, we must first have a table. We'll construct an array that has letters (or something related to letters) as its index, and the corresponding numbers (as ASCII digits) as its values. Letters that do not correspond to any numbers on the telephone keypad will not be converted.

The first task is to figure out how to compute the index into the
array for each letter. Luckily, the ASCII character set has the
(extremely useful) property that the codes that represent uppercase
letters are arranged alphabetically and contiguously. This means, for
example, that if we know the ASCII code for 'A', then we can compute
the ASCII code for 'B' by simply adding 1 to the code for 'A', because
'B' immediately follows 'A' in alphabetical order. Similarly, the
code for 'D' will be 3 more than the code for 'A', and so on, up until
'Z', which will be 25 more than the code for 'A'. (Similar properties
hold for the lowercase letters and for digits, which is also extremely
useful- for example, we used this property of digits in
**bignum_const** from assignment 5).

This gives us a way to easily translate the letters 'A' through 'Z' to
the numbers 0 through 25-- simply subtract the code for 'A'. For
example, if **c** is a character representing some uppercase
letter, then the following C expression will set **n** to a number
from 0 to 25, representing the position in the alphabet of **c**:

n = c - 'A';

Now all that remains is to construct a table, such that the entry in position zero (for 'A') corresponds to the code for 'A' on the telephone keypad, and so forth. This table is the following (assuming that Z and Q are not mapped):

letter A B C D E F G H I J K L M N index 0 1 2 3 4 5 6 7 8 9 10 11 12 13 character 2 2 2 3 3 3 4 4 4 5 5 5 6 6 letter O P Q R S T U V W X Y Z index 14 15 16 17 18 19 20 21 22 23 24 25 character 6 7 Q 7 7 8 8 8 9 9 9 Z

Since all of the elements of this table are simply characters, the table itself can be a string. In C, we can implement the table as follows:

char *table = "2223334445556667Q77888999Z";

Now we can write our `uppercase_letter_to_digit` function:

char uppercase_letter_to_digit (char c) { char *table = "2223334445556667Q77888999Z"; return (table [c - 'A']); }

An important decision that must be made, when doing *any* sort
of error checking, is *what to do when an error is detected*.

One way to make this function robust to make it capable of handling
*any* possible input, at least in some sense. For this
function, this is easy, and only requires a small redefinition of what
the function is supposed to do, to define what it does when handed a
non-uppercase letter. For example, if **c** is not uppercase, then
perhaps we should just leave it alone:

char uppercase_letter_to_digit (char c) { char *table = "2223334445556667Q77888999Z"; if (isupper (c)) { return (table [c - 'A']); } else { return (c); } }

We might want our function to be even more crabby than that; for
example, in this program, if `upper_letter_to_digit` is
called with a non-uppercase letter, then it means that something has
gone wrong somewhere else, and we might as well find out about it now,
before any damage is done. The following function uses
`assert` to enforce this; the program will exit (with a
possibly useful warning message) if the assumptions of this function
are not met:

char uppercase_letter_to_digit (char c) { char *table = "2223334445556667Q77888999Z"; assert (isupper (c)); return (table [c - 'A']); }

Once again, we construct a table, but this time the contents of the table resembles (in some sense) the telephone keypad itself:

Digit Letters ----- ----- 0 1 2 A B C 3 D E F 4 G H I 5 J K L 6 M N O 7 P R S 8 T U V 9 W X Y

We can express this in C as an array of strings, where the index of each string is digit, and the strings contain all of the letters corresponding to that digit's key:

char *table [] = { "", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PRS", "TUV", "WXY" };

To make the table lookup a little easier, we'll add a NULL entry to the end of the table as a sentinel value.

We can check each of these table entries in turn to see if the
character we're looking for resides there by using the **strchr**
function. Once we know the index, we'll convert this index into once
of the characters '0' through '9' via the same sort of transformation
that we used previously (in this case, the opposite of what we used
earlier in the semester in `bigadd_const`). This gives the
following implementation:

char uppercase_letter_to_digit (char c) { char *table [] = { "", "", "ABC", "DEF", "GHI", "JKL", "MNO", "PRS", "TUV", "WXY", NULL }; int i; for (i = 0; table [i] != NULL; i++) { if (strchr (table [i], c)) { return ('0' + i); } } /* * If we've failed to find any match, then just return c * itself. */ return (c); }

This approach also has the disadvantage that it is hard to modify. Of course, for the 800 application, this is not likely to be much of an issue- telephone keypads aren't likely to change any time soon. However, if you wanted to reuse some of this algorithm (or at least the design behind it) for some other sort of keypad, you'd have to come up with an entirely new relation, and there may not always be such a convenient one. With the table-oriented approaches, dealing with a new keypad or character set is simply a matter of changing the table; nothing else has to change in the slightest. In fact, you could write a utility function that did this translation and took the table itself as a parameter, and so you could use the same function for any number of different mappings.

However, since it seems so popular, it's worth discussing the "equation" approach.

In general, the relation is simple: each digit, starting at '2', gets
the next three letters (starting with 'A'). A glitch occurs at '7',
however, due to skipping over 'Q', and this wrecks the simplicity of
the relationship. 'Z' is a different kind of nuisance- we have to
remember that it *doesn't* correspond to any digit.

If it wasn't for 'Q' and 'Z', we'd just have the following relationship:

(position of digit) == ((position of letter) / 3) + 2

In this relationship, "position of" represents the position of a digit in the sequence 0..9 or the position of a letter in the sequence A..Z. Again taking advantage of the way the ASCII codes are arranged, we can write this as:

digit - '0' == ((letter - 'A') / 3) + 2

So if we have the letter, we can find the digit by doing a little algebra:

digit = '0' + ((letter - 'A') / 3) + 2;

If this was there was to the problem, this would be a fine solution: short, simple, and easy to get right. However, now we must deal with 'Q' and 'Z', in order to fit the reality of the telephone keypad.

Handling 'Z' is relatively simple, because it hangs off the end; it is
an *exception* to the relationship, but doesn't *break*
the relationship:

/* * If the letter is a 'Z', then leave it alone. */ if (letter == 'Z') { return ('Z'); } else { return ('0' + ((letter - 'A') / 3) + 2); }

Dealing with 'Q' is hairier. One approach is to break the relationship into two relationships: one for letters before 'Q', and another for those after. The relation for letters after 'Q' is similar to the other, except that the letters are "off by one". There are a number of ways that this can be expressed; here is one that seems relatively clean:

/* * If the letter is a 'Z' or 'Q', then leave it alone. */ if (letter == 'Z' || letter == 'Q') { return (letter); } else if (letter < 'Q') { return ('0' + ((letter - 'A') / 3) + 2); } else { return ('0' + (((letter - 1) - 'A') / 3) + 2); }

We'd probably write this a little differently in our function, however, in order to make it a little more robust:

char uppercase_letter_to_digit (char c) { /* * If c is a 'Z' or 'Q', then leave it alone. */ if (c == 'Q' || c == 'Z') { return (c); } else if (c >= 'A' && c < 'Q') { return ('0' + ((c - 'A') / 3) + 2); } else if (c > 'Q' && c < 'Z') { return ('0' + (((c - 1) - 'A') / 3) + 2); } else { return (c); } }

One thing to note about this function is that the first condition
(which deals with 'Q' and 'Z') is actually completely redundant; any
instances of 'Q' and 'Z' would otherwise be handled identically by the
`else` statement at the end. I prefer, for stylistic
reasons, to structure the code this way, so that all letters are
handled explicitly (and the `else` is just for all other
non-letter characters). It also helps to clarify (at least to me)
exactly why some of the relational operators in the conditionals are
**<** or **>** instead of **<=** or **>=** (or
vice versa)- it's because these two letters are special.

It's also true that some of the other conditions are mildly redundant:
for example, in the second condition, it is not strictly necessary to
write `letter < 'Q'`; we could have written this as
`letter <= 'Q'` instead and gotten away with it. Writing
it this way (being careful with each condition) does allow a nice
property, however- we can reorder these conditions arbitrarily and
*still get exactly the same results*. This is a very nice
property for code to have, since it makes it easier to modify and
extend in the future, as well as easier to understand right now.

This code, of course, is a mess of magic numbers, and in this form cannot avoid style deductions (or peer ridicule). A few macros help out here quite a bit:

#define LETTERS_PER_DIGIT 3 #define DIGITS_OFFSET 2 #define DIGIT_POS_TO_ASCII(d) ('0' + (d)) #define ASCII_TO_LETTER_POS(l) ((l) - 'A')

But the result is still a little muddled, and 'Q' and 'Z' are
*still* magic all by themselves, and this is hard to avoid.

- The table lookup approach can be hardwired by using a
`switch`statement instead of an explicit table. The result is painful to look at:switch (c) { case 'A' : case 'B' : case 'C' : return ('2'); /* * et cetera, et cetera. */ case 'W' : case 'X' : case 'Y' : return ('9'); default : return (c); }

At least it has the advantage of being relatively easy to understand, and simple (if tedious) to change if the keypad changes.

- A worse variation is to do a cascaded
`if`, but instead of doing tests for equality, test for some other relationship. For example:if (c <= 'C') return (2); else if (c <= 'F') return (3); ...

This approach combines the tedium of the

`switch`statement approach with the inflexibility of the equation approach, and every line contains a magic number. This sort of approach should be avoided. - The equation approach can be expressed in a number of similar
ways... but they can be hard to get exactly right. This
problem is made much worse if the code that computes this
equation is not made into its own function, or has to deal
with lowercase as well as uppercase letters (as separate
cases). This approach has a number of ways that it
can be expressed clumsily; you'll know them when you
see them...

Some people modified their binary search functions to do quite a bit more than just a binary search through an array of strings, however- some designs delegated the work of translating the letters to digits, trimming the input string, etc to this function. One implementation also implemented the user interface inside the binary search function! For quick-and-dirty programming, this is not too terrible, as long as it gets the right answer. As a design, however, it violates an important principle: each function should do one thing, and do it well.

In this case, the one thing that the binary search function needs to do is search in a sorted array of strings for a string that has a given prefix. Such a function is of general use. A function that does binary search and character conversion and who knows what all else is entirely specific to the 800 application, and cannot be reused for other programs without a great deal of work.

Please report and errors or inconsistencies in this document to the author.

Dan Ellard - ellard@deas.harvard.edu