CS50 Algorithm Review Notes


Algorithms Basics

Make sure that your algorithms are both clear and concise. You do not need to go into minute detail, but you do need to clearly state your assumptions and the steps that your algorithm takes in order to reach its goal.

Trimming a Beard

If you were at the first lecture of the course, you saw an example of how a simple procedure can be hard to describe.

There were a few things about the example which made it particularly hard:

Things to Keep In Mind

Identifying the similarities and relationships between different problems is the key concept in the study of algorithms. For example, if someone asked you to write an algorithm for trimming a mustache, how much of the beard algorithm could you reuse?


Algorithmic Efficiency

Computers can execute simple operations many times faster than most humans. For example, the computers we use for this course (which are not particularly peppy) can perform tens of millions of multiplications per second, while most people have to pause and think for a moment to compute the product of a pair of two digit numbers. Given that computers are so fast, you might wonder why anyone would care whether or not the programs you write are efficient.

It turns out that it does matter. In fact, the study of efficient algorithms is one of the most important areas of computer science. There are several reasons why this is so:

Big-O notation

To express the efficiency of an algorithm, a notation called ``big-O notation'' is used. The O stands for the order of the function that specifies the relation between the number of operations that the algorithm requires and the ``size'' of the input, which is usually denoted as n.

Note that the ``size'' of the input may actually represent different things, depending on context. When talking about the number of steps required to do something to an array, n usually refers to the number of elements in the array, while when talking about how many steps it takes to multiply two numbers, n may be the number of digits in each of the numbers, and so on.

When dealing with big-O notation, only the order or dominant term in the function is considered important, since this term will dominate the behavior of the function as n gets larger. In addition, all constant factors are disregarded. For example, in the following table, f(n) is assumed to give the exact number of steps required to solve a problem of size n:

  1.  f(n) = 10 * n^2 + 100 * n + 10            O(f(n)) = n^2
  2.  f(n) = 0.0001 * n^2 + 10000 * n^1.9       O(f(n)) = n^2

In the first example, the n^2 (n squared) term will dominate in the long run, and the fact that this is multiplied by 10 is not important to the order.

In the second example, even though the 10000 * n^1.9 will contribute a larger portion of the total than the square term unless n is very large, nevertheless the order is n^2 again because that is the term that will dominate as n becomes more and more gigantic.

Note that the order of an algorithm does not specify how long it will take to run (except in a general way, as n becomes very large). In this example, even though the two algorithms have the same order, the first will be faster for small n, while the second will be faster once n becomes large. Similarly, consider the following table:

  1.  f(n) = n^2                                O(f(n)) = n^2
  2.  f(n) = 100000 * n                         O(f(n)) = n

The O(n) algorithm is clearly much faster when n is larger- but how much larger? In this case, the two are equal when n is 100,000- for any n smaller than 100,000, the "slower" algorithm will get the job done more quickly.

Cases like this are not uncommon (sometimes the "better" algorithm will have a big constant out front that will make it inefficient for small n), but in general, algorithms with smaller orders are more desirable.
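
To see this crossover concretely, here is a small sketch (not from the lecture notes) that simply tabulates the two cost functions from the table above for a range of values of n:

	#include <stdio.h>

	int	main (void)
	{
		double	n;

		for (n = 10; n <= 10000000; n *= 10) {
			printf ("n = %10.0f   n^2 = %16.0f   100000 * n = %14.0f\n",
					n, n * n, 100000.0 * n);
		}

		return (0);
	}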


Random numbers

Definitions

Deterministic and Nondeterministic

A function (or process) is deterministic if, given a set of inputs (or an initial state) the output (or subsequent state) is unique and completely determined. If a function (or process) is nondeterministic, then given a set of inputs (or an initial state), there is a fixed set of possible outputs (or subsequent states), any of which are correct.

These terms have more precise and complete definitions in theoretical computer science, but this should hold us for now.

Random and Pseudo-random

A random number generator is a function that produces truly random numbers. However, it is impossible to write a random number generator using a deterministic computer (like the ones we use), so we have to settle for pseudo-random number generators- functions that behave in ways that are convincingly random, but are actually completely deterministic.

Generating random integers: NSTT

Imagine that you want to generate a random integer in the range between low and high:
  1. Normalize: get an integer d from rand, and normalize it (so that it falls in the range 0.0 to 1.0).

  2. Scale: multiply the normalized random number by the size of range (high - low) of numbers desired.

  3. Truncate: convert the number back to an integer.

  4. Translate: add low to the truncated number, to translate it to the desired range.
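
In C, these four steps might look something like the following sketch (one possible implementation using the standard rand () from <stdlib.h>, not necessarily the exact code from the lecture notes). Note that it normalizes by dividing by RAND_MAX + 1.0, so that the normalized value is strictly less than 1.0, and scales by (high - low + 1) so that both endpoints of the range can actually occur:

	#include <stdlib.h>

	/* Returns a pseudo-random integer in the range low .. high, inclusive
	 * (assumes low <= high).
	 */
	int	random_int (int low, int high)
	{
		double	normalized;
		double	scaled;
		int	truncated;

		/* Normalize: a double in the range [0.0, 1.0).	*/
		normalized	= rand () / (RAND_MAX + 1.0);

		/* Scale: a double in the range [0.0, high - low + 1).	*/
		scaled		= normalized * (high - low + 1);

		/* Truncate: an integer in the range 0 .. (high - low).	*/
		truncated	= (int) scaled;

		/* Translate: an integer in the range low .. high.	*/
		return (truncated + low);
	}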

Generating random reals: NST

The same basic steps, except that no truncation is necessary, and the scaling is slightly different:

  1. Normalize: get an integer d from rand, and normalize it (so that it falls in the range 0.0 to 1.0).

  2. Scale: multiply the normalized random number by the size of range (high - low) of numbers desired.

  3. Translate: add low to the scaled number, to translate it to the desired range.
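
A corresponding sketch for reals, under the same assumptions:

	#include <stdlib.h>

	/* Returns a pseudo-random double in the range [low, high).	*/
	double	random_double (double low, double high)
	{
		double	normalized;

		/* Normalize: a double in the range [0.0, 1.0).	*/
		normalized	= rand () / (RAND_MAX + 1.0);

		/* Scale and Translate: a double in the range [low, high).	*/
		return (low + normalized * (high - low));
	}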

srand

Make sure you understand what srand (and Randomize) do, and why this is important.

You can think of rand() as somehow containing a very long list of numbers, and every time you call rand() it just gives you the next number. srand() seeds the random number generator, which is analogous to telling rand() where to start reading in the list. This list is always the same, so if you give srand() the same seed again, rand() will start in the same place again. If you don't use srand() (or Randomize(), or some other function that calls srand()), then rand() always starts at the ``beginning'' of the list, and you will always get the same sequence of random numbers.

Generally, it is only appropriate to call srand (or Randomize()) once per program- there's no need to call it again, once you've selected a seed.

One way to generate a seed that will change from run to run is to use the current time as the seed. (This method is described in detail in the lecture notes.) You might also want a way to let the user set the seed to an explicit value, so that they can get the same random numbers whenever they want, but can also change the seed (so they're not always starting at the same place).
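
For example, here is a sketch of a main () that seeds from the current time by default, but lets the user supply an explicit seed as a command line argument (the command line handling here is just an illustration, not a requirement of any assignment):

	#include <stdio.h>
	#include <stdlib.h>
	#include <time.h>

	int	main (int argc, char *argv [])
	{
		unsigned int	seed;

		if (argc > 1) {
			/* Let the user choose the seed explicitly.	*/
			seed	= (unsigned int) atoi (argv [1]);
		}
		else {
			/* Otherwise, use the current time as the seed.	*/
			seed	= (unsigned int) time (NULL);
		}

		srand (seed);		/* Seed the generator exactly once.	*/

		printf ("seed = %u, first value = %d\n", seed, rand ());

		return (0);
	}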


The Monte Carlo Technique

It may seem silly to use the Monte Carlo technique to come up with approximate solutions to problems that people know how to compute exact solutions for- and in some cases, it is. However, the Monte Carlo technique is extremely useful for problems which do not have analytical solutions (or that you don't happen to know an analytical solution for). Even better, it turns out that for a wide variety of otherwise unsolvable problems, the Monte Carlo technique can be shown to be the most efficient way possible to achieve an accurate approximation. (A proof of this is somewhat beyond the scope of this course, but feel free to ask.) The important point is that the Monte Carlo technique and its variations are very useful things to have in your bag of tricks.

So, let's think for a moment about how this technique actually works. Imagine that you had a fair coin, and you flipped it a large number of times- you'd expect that, if you flipped it enough times, it would come up heads about half the time. We expect it to come up heads about half the time because it is a fair coin (but of course we're not surprised at all if it comes up a few more heads than tails, or vice versa).

However, this also works in the other direction: if we flip a coin a large number of times, and it comes up heads about half the time, then we soon start to expect that it is a fair coin! Similarly, if we flip it a large number of times and it only comes up heads 1/4 of the time, then we will soon start to think that the coin is unfair, and in fact is three times more likely to come up tails than heads. Just as we expect a certain distribution of heads and tails for a coin whose "fairness" we know, given a certain distribution of heads and tails, we can start to form an estimate of the fairness of the coin.

Of course, we need to flip the coin a large number of times before we can start to have a lot of confidence in our estimate. For example, if you flip the coin once, it will either be heads or tails, and your estimate based on the distribution so far will be that the coin is absolutely unfair in one direction or the other! However, after many coin flips, the distribution of actual events will, in all likelihood, give us a pretty good idea what the true bias of the coin is.
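
As a small illustration (a sketch, not from the lecture notes), here is a program that simulates flips of a coin with a known bias, and then estimates that bias from the observed fraction of heads; the more flips, the closer the estimate tends to be to the true value of 0.25:

	#include <stdio.h>
	#include <stdlib.h>

	int	main (void)
	{
		double	bias	= 0.25;		/* true probability of heads	*/
		int	flips	= 1000000;
		int	heads	= 0;
		int	i;

		for (i = 0; i < flips; i++) {
			/* Flip a simulated coin with the given bias.	*/
			if (rand () / (RAND_MAX + 1.0) < bias) {
				heads++;
			}
		}

		/* The observed fraction of heads is our estimate of the bias.	*/
		printf ("estimated bias = %f\n", (double) heads / flips);

		return (0);
	}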


Stacks

Note: we didn't talk about stacks much this year in cs50. Still, this section is worth a read-through.

A stack is a data structure that has several operations associated with it:

Initialization

The purpose of initialization is to do whatever is necessary in order to get ready to use a stack. This step is necessary because when you implement a stack data structure in C, you often need to do something in order to make sure that the space that the C compiler sets aside for the stack actually starts out with whatever the appropriate values are for an empty stack.

For example, if you define an array variable inside a function, there's no telling what sort of random garbage might show up there- so often you'll use a loop to stroll through the array, filling it with zeros or some other appropriate initial value. The same sort of thing happens with stacks (or any other data structure) in C- you can tell the C compiler to set aside memory to use as a stack (much as you told it to set aside memory for an array) but you can't tell it what values to put into that memory (just as you couldn't tell it what values to put into the array). Therefore, in C (and similar languages), it is appropriate to write an initialization routine that does whatever is necessary to "clean up" the stack and initialize it as an empty stack.

Push

The push operation deposits a new element on the top of the stack.

If the data structure that you are using to implement the stack has a fixed maximum number of elements that it can contain at the same time (i.e. a limit on the number of pushes less the number of pops since the last initialization), then an error occurs if the application tries to do another push when the stack is full. This situation is often called stack overflow.

Pop

The pop operation removes the topmost element on the stack (which is the element that was most recently pushed), and returns it.

If the stack is empty (i.e. it contains no values), then it is considered an error (for the purposes of this assignment, at least) to try to pop another value off of the stack. This situation is often called stack underflow.
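
To make these operations concrete, here is a minimal sketch of an array-based stack of ints (this is just one possible implementation, not the interface from the assignment; the names and the error convention are made up for this example):

	#define STACK_MAX	100

	typedef struct {
		int	items [STACK_MAX];
		int	top;		/* number of items currently on the stack	*/
	} int_stack_t;

	void	stack_init (int_stack_t *s)
	{
		s->top	= 0;
	}

	/* Push and pop return -1 on overflow or underflow, and 0 on success.	*/
	int	stack_push (int_stack_t *s, int value)
	{
		if (s->top >= STACK_MAX) {
			return (-1);	/* stack overflow	*/
		}
		s->items [s->top++]	= value;
		return (0);
	}

	int	stack_pop (int_stack_t *s, int *value)
	{
		if (s->top <= 0) {
			return (-1);	/* stack underflow	*/
		}
		*value	= s->items [--s->top];
		return (0);
	}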

There are a few other operations associated with stacks that you might see:

Peek

The peek (n) operation returns the nth element in the stack, starting from the top.

Rotate

The rotate (n) operation removes the nth value from the stack, and then pushes it onto the top of the stack.

As an exercise, you might find it interesting to figure out algorithms to implement peek and rotate without taking advantage of any particular underlying implementation of the stack data structure- can you implement peek and rotate in terms of push and pop and a fixed number of temporary variables? The answer turns out to be yes, but only if you can use another stack as scratch space!

Of course, if you know the underlying implementation of the data structure used for the stack, then peek and rotate are usually very simple. You might find it interesting to think about how you could implement stack_peek and stack_rotate in terms of the other functions.
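
For example, here is a sketch of peek written only in terms of push and pop, using a second stack as scratch space (it reuses the int_stack_t sketch above, assumes that n is valid, and treats n == 0 as the top of the stack):

	int	stack_peek (int_stack_t *s, int n)
	{
		int_stack_t	scratch;
		int		value;
		int		peeked;
		int		i;

		stack_init (&scratch);

		/* Pop down to the element we want, saving everything we pop
		 * on the scratch stack.
		 */
		for (i = 0; i <= n; i++) {
			stack_pop (s, &value);
			stack_push (&scratch, value);
		}
		peeked	= value;

		/* Push everything back, restoring the original order.	*/
		for (i = 0; i <= n; i++) {
			stack_pop (&scratch, &value);
			stack_push (s, value);
		}

		return (peeked);
	}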


Binary Trees

Binary trees are an example of a data structure and set of related algorithms that can be used as part of many efficient algorithms.

Also of interest might be some code that implements some binary tree functions. These are different from the functions in the lecture notes; I thought it might be interesting to see another approach to the same problems, so I wrote these routines based on slightly different algorithms.

The last function, tree_display, can be very useful for debugging binary tree structures.
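
For reference, here is a sketch of one possible node structure for an ordered binary tree of ints, along with a recursive lookup (these are not the routines from the lecture notes; the names are only illustrative):

	#include <stdlib.h>

	typedef struct tree_node {
		int			value;
		struct tree_node	*left;
		struct tree_node	*right;
	} tree_node_t;

	tree_node_t	*tree_lookup (tree_node_t *node, int value)
	{

		if (node == NULL) {
			return (NULL);
		}
		else if (value == node->value) {
			return (node);
		}
		else if (value < node->value) {
			return (tree_lookup (node->left, value));
		}
		else {
			return (tree_lookup (node->right, value));
		}
	}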


Recursion

Fundamentals of Recursion

The basic idea behind recursive algorithms is to break down a problem into smaller pieces which you either already know the answer to, or can solve by applying the same algorithm to each piece, and then joining together the results. It is related to the idea of hierarchical decomposition (which we've been harping on since the beginning of the course), but with a twist- instead of breaking a large problem into smaller, different problems, recursive algorithms break down problems into smaller versions of the same problem.

Each recursive algorithm has two cases:

The base case.

The base case is a problem that you either already know the answer to, or can solve using some other method.

Note that a recursive algorithm may have more than one base case, but they are all lumped together under this heading.

The recursive case.

The recursive case is a problem that can be broken down into "smaller" pieces which can be solved individually. The solutions are then combined into the answer for the whole problem.

Note that the recursive case must move towards the base case, by reducing the ``size'' of the problem. If it doesn't, then the base case will never be reached, and the algorithm will not terminate.

Note again that a recursive algorithm may have more than one recursive case (i.e. it may recurse more than once- binary tree traversal algorithms are a good example of this).
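
As a tiny illustration of these two cases (a sketch, not an example from the lecture notes), consider a recursive factorial function:

	int	factorial (int n)
	{

		if (n <= 1) {
			/* Base case: 0! and 1! are both 1.	*/
			return (1);
		}
		else {
			/* Recursive case: n! = n * (n - 1)!.  Since n - 1 is
			 * smaller than n, we are moving towards the base case.
			 */
			return (n * factorial (n - 1));
		}
	}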

Why bother?

Recursion is an extremely useful technique. Although in a theoretical sense recursion is no more powerful than iteration (and vice versa), in practice there are a large number of algorithms that can be stated much more elegantly using recursion (and vice versa).

For example, consider the problem of doing an inorder traversal of an ordered binary tree. This can be done using purely iterative constructs, but is far from trivial to code- give it a try, if you like! (hint- a stack or two or some linked lists will come in handy.)

As another example, consider the quicksort algorithm. When it was first invented (by C. A. R. Hoare, I believe), programming languages that supported recursion were not widely used, so it was initially described and implemented using only iterative constructs. Unfortunately, it is quite complicated to correctly state the quicksort algorithm without recursion. In fact, the original implementations were so complex and hard to follow that there was some debate over whether or not they actually really worked! Once the algorithm was restated in a recursive form, however, the algorithm was quickly proven to be correct.

Examples of recursion

Let's start with something that we're already familiar with: wandering down a linked list looking for an element. We'll use the same structure as you did in your llist_t routines. Imagine you're already at the head of the list, and you want to find the first cell that contains the given value, just like llist_lookup.

Base cases:

We've got two to watch out for: if we actually have found the element, and if we're about to shoot off the end of the list and into the NULL abyss. In the first case, we want to return the cell we've found, while in the second we'll return NULL:

		/* The NULL test must come first!	*/
		if (cell == NULL) {
			return (NULL);
		} else if (cell->value == value) {
			return (cell);
		}
	
Recursive cases:

The cell we're looking at is not the one we're looking for. Try again with the next, and return whatever value is generated by this:

		return (llist_lookup (cell->next, value));
	
Putting this all together:

	llist_t	*llist_lookup (llist_t *cell, int value)
	{

		if (cell == NULL) {
			return (NULL);
		} else if (cell->value == value) {
			return (cell);
		} else {
			return (llist_lookup (cell->next, value));
		}
	}
	

Now, let's try an algorithm that puts together a few more concepts, playing with both recursion and pointer arithmetic: an algorithm for finding the sum of the elements in an array of integers. For reference, here's an iterative solution:

	int		array_sum (int length, int *array)
	{
		int		i;
		int		sum	= 0;

		for (i = 0; i < length; i++) {
			sum += array [i];
		}

		return (sum);
	}

As an aside, unrelated to recursion, note the use of pointer notation to represent the array. By using pointer notation instead of array notation, we can eliminate the requirement that we specify the length of the array as part of its type, but instead can pass this as a separate parameter. Therefore, this function will work properly with arrays of any length, not just some fixed length.

Note: this iterative algorithm is perfectly adequate; it would be unnecessary (and generally unwise) to recode it recursively- we're just doing this for the mental exercise, not because it's the best way. To restate this recursively, we need to identify the base case and the recursive case (or cases).

For this problem, we have a choice of base cases, depending on what assumptions we dare to make about the array. If we know that the array will always be at least one element long, then the base case can be that the sum of the elements of an array of length 1 is the value of the first element. However, let's be more general, and allow arrays of length zero. So, our base case will be that the sum of an array of zero elements is zero. This translates into the following C:

		if (length == 0) {
			return (0);
		}

Now, we need a recursive case. Here's one: the sum of the elements is the sum of the first element in the array and the sum of the rest of the elements. This translates into the following C:

		return (array [0] + array_sum (length - 1, array + 1));

Note the address arithmetic used here: array+1 is the address of the "rest" of the array past the first element.

Putting the pieces together:

	int	array_sum (int length, int *array)
	{

		if (length == 0) {
			return (0);
		}
		else {
			return (array [0] +
					array_sum (length - 1, array + 1));
		}
	}

This algorithm is correct, but has a small flaw- we're not programming defensively. If someone calls this function with a negative length, the base case will never occur, and we'll recurse infinitely. Of course, nobody should call this function with a negative length, but we're not doing anything to stop them from doing so. There are two things we can do, however: we could make length an unsigned integer, so that it cannot ever be negative, or we could change the comparison of length == 0 to length <= 0.

Efficiency of recursive algorithms

Computing the efficiency of a recursive algorithm can be slightly more complicated than for iterative algorithms. For some algorithms, the analysis can be quite tricky (in fact, there are some popular sorting algorithms whose precise order remained unknown for many years) and involve mathematics beyond the prerequisites of this course. Here are some general rules, however, that should be helpful in finding the order of an algorithm:


Sorting

Sorting is a fundamental problem in Computer Science. There are many algorithms that depend on having sorted data (or having the ability to sort data), and a substantial percentage of all the computing resources in the world are spent keeping records sorted. To make things even more interesting, sorting algorithms are intellectually stimulating, and there has been a lot of interesting research done (over a period of decades!) to find good ways to sort things. This has died down somewhat in recent years (now that sorting algorithms that approach most of the theoretical limits for different situations have been discovered), but there are still unanswered questions in this field. Maybe you'll help to answer them.

Terms and Jargon

Stable

A sorting algorithm is stable if elements in the initial ordering that are considered ``equal'' by whatever comparison you're sorting by end up in the same relative order in the final order.

That sounds confusing; let's try an example. Imagine that you have a list of people's names that you want to sort into alphabetical order (considering both first and last names). Unfortunately, the first and last names are stored in different parts of the database and for whatever reason it's not possible to consider both of them at the same time. So, what can you do? The answer is to sort the list twice, using a stable sort. The first sort is by first name, and the second sort is by last name. Because the second sort is stable, people who share a last name stay in the order the first sort left them in- that is, ordered by first name.
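
Here is a sketch of this two-pass idea in C, using insertion sort (which is stable) on a made-up name_t structure; the structure and the function names are only for illustration:

	#include <string.h>

	typedef struct {
		char	first [32];
		char	last [32];
	} name_t;

	/* A stable insertion sort, ordering by the given comparison function.	*/
	void	stable_sort (name_t *names, int n,
			int (*cmp) (const name_t *, const name_t *))
	{
		int	i, j;
		name_t	tmp;

		for (i = 1; i < n; i++) {
			tmp	= names [i];
			for (j = i - 1; j >= 0 && cmp (&names [j], &tmp) > 0; j--) {
				names [j + 1]	= names [j];
			}
			names [j + 1]	= tmp;
		}
	}

	int	cmp_first (const name_t *a, const name_t *b)
	{
		return (strcmp (a->first, b->first));
	}

	int	cmp_last (const name_t *a, const name_t *b)
	{
		return (strcmp (a->last, b->last));
	}

	/* Sort by first name, then (stably) by last name: people who share a
	 * last name end up ordered by first name.
	 */
	void	sort_names (name_t *names, int n)
	{
		stable_sort (names, n, cmp_first);
		stable_sort (names, n, cmp_last);
	}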

Partition

To partition a data set means to divide the data set into two sets, one of which contains all the elements that come before a given value, while the other contains all the elements that come after the given value. (There may also be a third set of elements that are equal to the given value, or these elements may appear in either set.)

As an example, in an ordered binary tree, each node partitions its descendants- all of the descendants that are less than or equal to the node are to the left, while all of the greater descendants are to the right.
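
Here is a sketch of one simple way to partition an array of ints in place around a given value (this is not necessarily the partitioning algorithm from the lecture notes, and it is not stable):

	/* Swap two ints in place.	*/
	void	swap (int *a, int *b)
	{
		int	tmp	= *a;

		*a	= *b;
		*b	= tmp;
	}

	/* Rearranges the array so that every element less than pivot comes
	 * before every element greater than or equal to pivot, and returns
	 * the index where the second group begins.
	 */
	int	partition (int *array, int length, int pivot)
	{
		int	boundary	= 0;
		int	i;

		for (i = 0; i < length; i++) {
			if (array [i] < pivot) {
				swap (&array [i], &array [boundary]);
				boundary++;
			}
		}

		return (boundary);
	}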

Sorting Algorithms

Bubble Sort

Bubble sort gets a lot of bad press. It's a simple and easy-to-code algorithm (making it unattractive to CompSci geeks, but appealing to people who want to get something up and running quickly), but it is fundamentally an O(n^2) algorithm that also requires a lot of data movement. If you toss in a few tricks, it can be optimized for special cases where the data is ``almost'' sorted, but generally it runs like a slug.

However, it does have some redeeming qualities- it is trivial to add an optimization to detect presorted data, and it is easy to make stable. Even better, it can be implemented very efficiently on a parallel computer. Still, nobody loves it.
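
For reference, here is a sketch of bubble sort on an array of ints, including the early-exit optimization that makes it fast on presorted data:

	/* Bubble sort, with the early-exit optimization: if a whole pass
	 * makes no swaps, the data is sorted and we can stop.
	 */
	void	bubble_sort (int *array, int length)
	{
		int	i, pass;
		int	swapped	= 1;
		int	tmp;

		for (pass = 0; swapped && pass < length - 1; pass++) {
			swapped	= 0;
			for (i = 0; i < length - 1 - pass; i++) {
				if (array [i] > array [i + 1]) {
					tmp		= array [i];
					array [i]	= array [i + 1];
					array [i + 1]	= tmp;
					swapped		= 1;
				}
			}
		}
	}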

Selection Sort

Selection sort is generally an improvement over bubble sort, but is still an O(n^2) algorithm. The advantage of selection sort is that every time it makes a swap, it puts an element into the ``right'' place. Therefore, the number of swaps is generally far less than it is for bubble sort. If your data is such that it's costly to swap, this is not a bad sort to use.

Selection sort is also easy to make stable.
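
A sketch of selection sort on an array of ints:

	/* Selection sort: each pass finds the smallest remaining element and
	 * swaps it into place, so there is at most one swap per pass.
	 */
	void	selection_sort (int *array, int length)
	{
		int	i, j, min;
		int	tmp;

		for (i = 0; i < length - 1; i++) {
			min	= i;
			for (j = i + 1; j < length; j++) {
				if (array [j] < array [min]) {
					min	= j;
				}
			}
			if (min != i) {
				tmp		= array [i];
				array [i]	= array [min];
				array [min]	= tmp;
			}
		}
	}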

Merge Sort

Merge sort is a classic example of what CompSci people call a ``divide and conquer'' algorithm- it divides the data into halves, recursively sorts each half, and then merges the two halves back together.
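
Here is a sketch of a recursive merge sort on an array of ints (one possible implementation, not necessarily the one from the lecture notes):

	#include <stdlib.h>

	/* A recursive merge sort, using a temporary array for the merge step
	 * (error checking on malloc is omitted for brevity).
	 */
	void	merge_sort (int *array, int length)
	{
		int	half	= length / 2;
		int	*tmp;
		int	i	= 0;
		int	j	= half;
		int	k	= 0;

		if (length < 2) {
			return;		/* base case: already sorted	*/
		}

		merge_sort (array, half);			/* sort the left half	*/
		merge_sort (array + half, length - half);	/* sort the right half	*/

		/* Merge the two sorted halves into tmp ...	*/
		tmp	= malloc (length * sizeof (int));
		while (i < half && j < length) {
			if (array [i] <= array [j]) {
				tmp [k++]	= array [i++];
			}
			else {
				tmp [k++]	= array [j++];
			}
		}
		while (i < half) {
			tmp [k++]	= array [i++];
		}
		while (j < length) {
			tmp [k++]	= array [j++];
		}

		/* ... and copy the merged result back into the array.	*/
		for (k = 0; k < length; k++) {
			array [k]	= tmp [k];
		}
		free (tmp);
	}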

Quick Sort

Quick sort is another recursive algorithm, and resembles merge sort in a general way. The difference from merge sort is that the data is partitioned before the recursion takes place, so that the merge step is unnecessary.

In the expected case, quick sort (although O(n ln n) like merge sort) is generally faster than merge sort. However, in the worst case (i.e. when fate conspires to make the algorithm always choose a bad pivot value) it can become O(n^2) just like bubble or selection sort.

Note that there are several partitioning algorithms that can be used (and an uncountable number of different algorithms for choosing the pivot element). However, the basic algorithm behind quick sort is always the same- partition the data by some pivot value, and then recursively quick sort each partition. Note that the partitioning algorithm given in the lecture notes is not stable, but there are partitioning algorithms that are.

As an aside, there are methods of choosing the pivot element that ensure that quicksort does not ever go quadratic. These algorithms are somewhat esoteric and a relatively recent discovery, however, so we'll always assume that quicksort can go quadratic in the worst case. If you're interested in knowing how this can be done, feel free to ask.
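
Here is a sketch of a recursive quick sort on an array of ints, using a simple partitioning scheme that takes the last element as the pivot (exactly the kind of naive choice that can go quadratic on already-sorted data; real implementations choose the pivot more carefully):

	/* Swap two ints in place.	*/
	void	swap_ints (int *a, int *b)
	{
		int	tmp	= *a;

		*a	= *b;
		*b	= tmp;
	}

	void	quick_sort (int *array, int length)
	{
		int	pivot;
		int	boundary	= 0;
		int	i;

		if (length < 2) {
			return;		/* base case: already sorted	*/
		}

		/* Partition: move everything less than the pivot to the front,
		 * then put the pivot itself just after that group.
		 */
		pivot	= array [length - 1];
		for (i = 0; i < length - 1; i++) {
			if (array [i] < pivot) {
				swap_ints (&array [i], &array [boundary]);
				boundary++;
			}
		}
		swap_ints (&array [boundary], &array [length - 1]);

		/* Recursively sort the two partitions.  The pivot is already in
		 * its final place, so it is excluded from both.
		 */
		quick_sort (array, boundary);
		quick_sort (array + boundary + 1, length - boundary - 1);
	}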

Other Sorting Algorithms

Here are some other sorting algorithms that you may encounter:

Insertion Sort

A simple O(n^2) sort that becomes O(n) if the data is already ``almost'' sorted. Similar in nature to selection sort, except that instead of searching the unsorted data for the element that should come next, it picks up the next element and inserts it where it belongs among the elements already sorted (i.e. just after the last element that is less than or equal to it). If the data is almost sorted (i.e. each element is no more than a few positions away from where it ought to be), then this is very fast.
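
A sketch of insertion sort on an array of ints:

	/* Insertion sort: each element is shifted left past its larger
	 * neighbors until it is where it belongs.  If the data is nearly
	 * sorted, the inner loop hardly runs at all.
	 */
	void	insertion_sort (int *array, int length)
	{
		int	i, j;
		int	current;

		for (i = 1; i < length; i++) {
			current	= array [i];
			for (j = i - 1; j >= 0 && array [j] > current; j--) {
				array [j + 1]	= array [j];
			}
			array [j + 1]	= current;
		}
	}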


Please bring any errors or inconsistencies in this document to the attention of the author.

Dan Ellard - (ellard@deas.harvard.edu)