CS50 - Section 8

Section Information


Pointers (redux)

It may seem like the last section handout beat pointers to death. There are still a few odds and ends to cover, however.

Pointer problems

One misconception that still haunts many people: when you define a pointer, this does not allocate space for it to point to! For example:

	int		*i;
This defines i to be a pointer to an integer. It does not specify what integer i points to, nor has it allocated memory somewhere for i to point to.

You must never dereference a pointer variable unless you have assigned a valid address to that pointer.


Binary Trees

Binary trees are an example of a data structure and set of related algorithms that can be used as part of many efficient algorithms.

On the last pages of this handout, I've included a printout of a few binary tree functions. These are different from the functions in the lecture notes; I thought it might be interesting to see another approach to the same problems, so I wrote these routines based on slightly different algorithms.

The last function, tree_display, can be very useful for debugging binary tree structures.


Assignment 7

This is a challenging assignment- once you are familiar with binary trees, it should be easier than assignment 6 (and lot more fun, since you get to do something much more interesting). Still, it's not something you should leave until the last minute.

Animals

Once again, some advice about how to approach this problem set:

Note: writing the animals program consists of several steps, which must be done in the order described in the assignment in order to get full credit.

Read the assignment.

In the back of the assignment, you'll find the algorithm you need. If you understand this algorithm fully before you begin, your design will be quick and painless.

Draw pictures; walk through the algorithm step by step. If you can't play the animals game on paper (or a chalk board), your efforts to explain the game to the computer will probably be in vain.

Design your program.

The good news is that a good design makes it easy to write your program. The bad news is that it usually takes time and effort to come up with a good design. The good news is that the time and effort that you spend in the design is repaid many times over in coding time (remember the pyramid from earlier in the semester?). The bad news is that it's hard to learn how to design well. The good news is that we give you an opportunity to practice on this assignment- and we'll even give you feedback along the way (before you have to turn anything in).

Good design is essential to effective programming. Once you have a good design, writing the code will be quick and painless.

In general, a better design will result in less time spent coding and debugging. This program may seem complicated at first, but once you have decomposed the program into smaller pieces, you'll see that it is actually relatively small and simple.

Note that if you complete this part of the assignment before 6:00pm Monday night and email a copy to me, I will review your design and offer suggestions (and let you know if you're headed in the right direction). If you take advantage of this opportunity, you will almost certainly do better on this assignment (and with less pain) than if you do not.

Note that I cannot read pine attachments with my mail reader, which is not pine. When you email your design to me, please do so as plaintext. Use mail if you can't find another way, i.e.

	cat design.txt | mail ellard@deas
	

Pay attention to feedback.

Assuming that you take advantage of our offer to review your design, you should seriously consider taking whatever advice we have to give...

Write your program.

If your design is complete and correct, this should be a breeze. In fact, a few years ago we used a similar assignment, with the same design methodology, and many students reported that after they finished their designs, they were able to type in their programs in an hour or two, and they worked right off the bat. No debugging, no pointer mysteries, just a working program. It could happen to you.

Note that for this assignment, you should organize your program up into separate source files (i.e. .c and .h files). The Roberts book discusses how to do this in chapter 10 (and later chapters).

Review your design.

As you write and test your program, shortcomings in the design may become apparent. Instead of debugging your program directly, it may help to think about it as debugging your design. Make sure that you keep track of any changes that you make to your design; you'll need this information for the next part of the assignment.

Analyze the success of your design. Did it work out the way you planned? What would you change if you could start over from scratch? What have you learned about program design? (pay particular attention to the questions in the assignment)


mystery.c

These innocent functions all do something plausibly useful. With a little commenting (and better choice of variable names) they'd be straightforward. There's really nothing too mysterious here!

Note that we're not asking you to comment, edit, or change these functions in any way. There's no need to even compile them and run them (the work of setting up a test program for each would be a lot of work all by itself). Instead, all we ask is that you read through them and figure out what they do and what algorithm they use to do so.


Recursion

Fundamentals of Recursion

The basic idea behind recursive algorithms is to break down a problem into smaller pieces which you either already know the answer to, or can solve by applying the same algorithm to each piece, and then joining together the results. It is related to the idea of hierarchical decomposition (which we've been harping on since the beginning of the course), but with a twist- instead of breaking a large problem into smaller, different problems, recursive algorithms break down problems into smaller versions of the same problem.

Each recursive algorithm has two cases:

The base case.

The base case is a problem that you either already know the answer to, or can solve using some other method.

The recursive case.

The recursive case is a problem that can be broken down into "smaller" pieces which can be solved individually. The solutions are then combined into the answer for the whole problem.

Note that the recursive case must move towards the base, by reducing the ``size'' of the problem. If it doesn't, then the base case will never be reached, and the algorithm will not terminate.

Note that a recursive algorithm may actually have any number of base cases or recursive cases, but they are generally lumped together conceptually.

Why bother?

Recursion is an extremely useful technique. Although in a theoretical sense recursion is no more powerful than iteration (and vice versa), in practice there are a large number of algorithms that can be stated much more elegantly using recursion (and vice versa).

For example, consider the problem of doing an inorder traversal of an ordered binary tree. This can be done using purely iterative constructs, but is far from trivial to code- give it a try, if you like! (hint- a stack or two will come in handy. I may be able to avoid using stacks through extreme cleverness, but I can't remember exactly how, so it's probably a little on the obscure side.)

As another example, consider the quicksort algorithm (a sorting algorithm we will cover later in the semester). When it was first invented (by C. A. R. Hoare), programming languages that supported recursion were not widely used, so it was initially described and implemented using only iterative constructs. Unfortunately, it is quite complicated to correctly state the quicksort algorithm without recursion. In fact, the original implementations were so complex and hard to follow that there was some debate over whether or not they actually really worked! Once the algorithm was restated in a recursive form, however, the algorithm was quickly proven to be correct.

Examples of recursion

Let's start with something that we're already familiar with: wandering down a linked list looking for an element. We'll use the same structure as you did in your llist_t routines. Imagine you're already at the head of the list, and you want to find the first element that contains the given element, just like llist_lookup.

Base cases:

We've got two to watch out for: if we actually have found the element, and if we're about to shoot off the end of the list and into the NULL abyss. In the first case, we want to return the cell we've found, while in the second we'll return NULL:

			/* The NULL test must come first!	*/
		if (cell == NULL) {
			return (NULL);
		} else if (cell->value == value) {
			return (cell);
		}
	
Recursive cases:

The cell we're looking at is not the one we're looking for. Try again with the next, and return whatever value is generated by this:

		return (llist_llookup (cell->next));
	
Putting this all together:

	llist_lookup (llist_t *cell, int value)
	{

		if (cell == NULL) {
			return (NULL);
		} else if (cell->value == value) {
			return (cell);
		} else {
			return (llist_llookup (cell->next));
		}
	}
	

Now, let's try an algorithm that puts together a few more concepts, playing with both recursion and pointer arithmetic: an algorithm for finding the sum of the elements in an array of integers. For reference, here's an iterative solution:

	int		array_sum (int length, int *array)
	{
		int		i;
		int		sum	= 0;

		for (i = 0; i < length; i++) {
			sum += array [i];
		}

		return (sum);
	}

As an aside, unrelated to recursion, note the use of pointer notation to represent the array. By using pointer notation instead of array notation, we can eliminate the requirement that we specify the length of the array as part of its type, but instead can pass this as a separate parameter. Therefore, this function will work properly with arrays of any length, not just some fixed length.

To restate this recursively, we need to identify the base case and the recursive case (or cases).

For this problem, we have a choice of base cases, depending on what sorts of assumptions we dare to make about the array. If we know that the array will always be at least one element long, then the base case can be that the sum of the elements of an array of length 1 is the value of the first element. However, let's be more general, and allow the user to enter arrays with length zero. So, our base case will be that the sum of an array of zero elements is zero. This translates into the following C:

		if (length == 0) {
			return (0);
		}

To be more defensive, however, we might find it useful to consider the case when the user (or the calling function) goes berserk and enters a negative value for length. To guard against that, we'll modify this to say that the sum of an array of zero or fewer elements is zero:

		if (length <= 0) {
			return (0);
		}

Note: we could also prevent the possibility of a negative length by saying that length is an unsigned integer, and therefore can never be negative. (We haven't talked about unsigned numbers much yet, but you can read about them in K+R or Roberts.)

Now, we need a recursive case. Here's one: the sum of the elements is the sum of the first element in the array and the sum of the rest of the elements. This translates into the following C:

		return (array [0] + array_sum (length - 1, array + 1));

Note the address arithmetic used here: array+1 is the address of the "rest" of the array past the first element.

Putting the pieces together:

	int	array_sum (int length, int *array)
	{

		if (length <= 0) {
			return (0);
		}
		else {
			return (array [0] + array_sum (length - 1, array + 1));
		}
	}
Attached to this handout are several examples of recursive algorithms.

Efficiency of recursive algorithms

Computing the efficiency of a recursive algorithm can be slightly more complicated than for iterative algorithms. For some algorithms, the analysis can be quite tricky (in fact, there are some popular sorting algorithms whose precise order is still unknown) and involve mathematics beyond the prerequisites of this course. Here are some general rules however, that should be helpful in finding the order of an algorithm:


The Final project

The final project has two parts- first, you propose what you want to do for your final project, and then you get to implement it. The proposal is due on November 27 (no assignment that week- hooray!). Note that you cannot use late days for the final project or its proposal- get this done on time, or risk major point deductions. Also keep in mind that you should send me a draft of your proposal at least a few days ahead of the due date, and/or meet with me to discuss your final project- if you don't, you risk my rejecting your final project proposal, which I will do if I think it is not specific enough or does not specify a good project. You should plan to talk to me about your project at least once or twice before you submit a proposal.

A list of TF and their topics of interest can be found in: http://www.deas.harvard.edu/cs/academics/courses/cs50/final-project/proj.html. If your interests don't match up with mine, then feel free to contact a TF whose interests do seem like a better match (although of course you'll need to talk to me about your proposal eventually!). The most important thing is to pick a project that you will enjoy; you'll be spending a lot of time working on it, so you might as well be having a good time doing so!

Your project proposal should look something like a combination of a problem set and a design document (although not necessarily as detailed as the design document for animals). It must describe what the goals of your project are, and how you will achieve them. The design should be detailed enough so that it is clear to me that you understand how to accomplish your project goals. Keep in mind that your project will be graded based on how well you meet your own goals, so it is in your best interest to clearly state your goals and how you will demonstrate that you have met them!

In the past, the most common problems students have had with their final projects were that either they couldn't think of anything interesting (if this is the case for you, talk to me; I've got a hundred ideas that you might find interesting), or they thought of something far too complex or time-consuming to implement (it should be about twice as complex or time-consuming as an ordinary homework, like assignment 6, but not more).

In past years, students who thought through their proposals and let me review them once or twice before the deadline generally found the final project to be the most fun and rewarding part of the course. Students who waited until the last minute, however, deeply regretted it later; several were killed by rhinos.


Please bring any errors or inconsistencies in this document (other than the bit about the rhinos) to the attention of the author.

Dan Ellard - (ellard@deas.harvard.edu)