CS50 C Review Notes

Contents


What Is A Program?

A program is the expression or implementation of an algorithm.

There are a number of ways that algorithms can be expressed or implemented. One of those ways is by using the C programming language.

C is a particularly useful way to express algorithms, since computers "understand" algorithms expressed in C. This is because C is a computer programming language- a language that can be used to express algorithms in a way that computers can understand.

Although in cs50 we concentrate on the details of C, most of the principles we have seen (such as the elements of design and hierarchical decomposition) apply to writing programs in nearly any computer language.


Conditional Execution

In the first few programs we saw, execution started at the beginning of the main function and proceeded to the end. That's just fine for programs like hello, but for more complicated programs we may want the execution to proceed in a different way when particular conditions are met. This is called conditional execution.

C provides several straightforward ways to do conditional execution. The most fundamental of these is the if statement, and we'll see others later in the semester.

The if statement

The if statement has two forms, depending on whether or not there is an else statement:

	if ( condition )
		if-stmnt


	if ( condition )
		if-stmnt
	else
		else-stmnt
If the condition expression is true (which in C means that it evaluates to anything non-zero), then the if-stmnt is executed. Otherwise, if there is an else-stmnt, then execute it.

Logical operators

In order to help write condition expressions, a number of logical operators are available in C. A logical operator is an operator which evaluates to either true or false (i.e. non-zero or zero).

An Example

OK, so now we know how to do conditional execution, and how to write conditions. Let's do something! Here's a simple program that uses these ideas:

#include	

#include	
#include	

int		main ()
{
	int		input	= GetInteger ();

	if (input > 10) {
		printf ("%d is > than 10.\n", input);
	}
	else {
		printf ("%d is <= to 10.\n", input);
	}
}

Short circuit evaluation of Boolean operators

Boolean expressions in C are evaluated in a way that takes a little getting used to- they can be short circuited. For example, consider the following expressions:

		a && b && c
		d || e || f
	
In the first expression, if any one of the subexpressions is FALSE, then there's no way that the whole Boolean expression can be true. Therefore, C evaluates each subexpression in turn, from left to right, stopping immediately as soon as it finds the first FALSE. If no FALSE subexpressions is found after all the subexpressions have been evaluated, the value of the expression is TRUE.

Similarly, in the second expression, if any one of the subexpressions is TRUE, then there's no way that the whole Boolean expression can be false. Therefore, C evaluates each subexpression in turn, from left to right, stopping immediately as soon as it finds the first TRUE. If no TRUE subexpressions is found after all the subexpressions have been evaluated, the value of the expression is FALSE.

These rules can cause strange behavior if the subexpressions of the Boolean operators have side effects (such as printing things on the screen, or assigning a value to a variable). Therefore, you should always avoid such situations.

The ?: operator

The ?: operator (pronounced "question mark colon") can be very handy. It is like an abbreviated form of the if-else statement, with the limitation that it can only contain expressions, and the benefit that in (like all other expressions) evaluates to a value and can be used as part of a larger expression.

For example, here is a statement that computes the larger of two integers (x and y) and places it in z:

	z = x > y ? x : y;

switch and case

The switch statement allows your program to execute a choice of statements, based on the switch value.

  1. The switch value needs to evaluate to an integer (including things like chars, which are a special kind of integer).
  2. Each of the cases must be a constant integer.


Iteration

Iteration (or looping) is a similar idea to conditional execution, except that instead of determining whether or not a statement should be executed, iteration provides a way to indicate that a statement should be executed a number of times- and control the number of times that it is executed.

The for Loop

The for statement has the following form:

	for ( init ; cond ; incr )
		body-stmnt
Each of the init, cond, and incr expressions are optional. Omitting them all (i.e. for (;;)) is one of the standard ways to get an "infinite" loop (which can be handy, although you still need a way to get out, such as break).

The while loop

The while statement has the following form:

	while ( cond )
		while-stmnt
The while statement is exactly equivalent to a for statement with no initialization or increment expressions, but while is still included in C because it helps to write programs that are easier to read.

The do loop

The do statement has the following form:

	do
		do-stmnt
	while ( cond ) ;
Note that the do statement requires a semicolon (;) after the closing parenthesis of the condition, unlike the other loop statements.

The break Statement

Another statement that you should be aware of is the break statement. One use of the break statement is to jump out of one level of for while or do loops.

The break statement gets you out of the current loop or switch statement you're in. It only pops you out one level, so if you're in a nested loop and want to get out of all the levels, then something more drastic is necessary.

Ordinarily, you should try to write programs in such a way that the loop condition is the only thing that will cause the program to stop looping, but sometimes this is inconvenient or makes the loop condition large and unwieldy.

For example, you might want to write a loop which reads integers and then prints them out, but stops right away (before printing the number) when the user types in a negative integer. This can be written in any number of ways, but using a break makes it very simple:

	for (;;) {
		int		number;

		number = GetInteger ();

		if (number < 0) {
			break;
		}

		printf ("%d\n", number);
	}

The continue statement

The continue statement sends you back to the top of the current loop, without doing the rest of the statements in the body of the loop. The next statement executed will be the increment expression (for for loops) or the cond expression (for while and do loops).

Note that unlike break, continue does not apply to switch statements.

The continue statement is sometimes useful when you have a complicated condition that determines what you want to do inside a loop. Many people never use continue, but you should at least be aware of it.



Functions

We already know how to write one function, the >main> function, which must appear in all C programs. Other functions are declared in the same manner, although they may look slightly more complicated, since they may take arguments or return values.

A function definition has the following form:

	type function-name ( parameter-list )
	{
		function-body
	} 
type
The type of value that the function returns, such as int or double.

function-name
The name that you are giving this function. The name should be different from the names of any other functions or variables in your program, and must obey the rules for the names of a function or variable.

parameter-list
A list of the parameters or arguments to this function, and what type they are.

function-body
The piece of code that is executed when the function is called.

Note that the function body must be enclosed in curly braces. Unlike other statements (i.e. if statements) where the curly braces may be omitted if the body consists of a single statement, the curly braces are mandatory here.

For example, imagine that we want to write a function which takes two integers, prints them out on the screen, and then returns their sum:

	int		add_ints (int a, int b)
	{
		int		sum;

		sum = a + b;

		printf ("The two numbers are %d and %d\n",
				a, b);

		return (sum);
	}
Then we can write a main function that calls this function:

	void		main (void)
	{

		add_ints (12, 13);
	}
For another example, let's modify the code that Professor Seltzer used in lecture (page 30 of the course notes). From a functional point of view (what the program does), this program has two main parts: first, it asks the user for a number, and then it prints out a triangle of asterix characters of the corresponding size. Therefore, we can break this program down into three functions: main, a function to prompt the user for an integer and reads it in, and a function to print out the triangle. The code is on the last page of this handout (in triangle.c).

The void type

Something that seems to have confused a large number of students is the void type. A function is of type void if it doesn't return anything. A function has a parameter list of void if it doesn't take any arguments. For example, the following function doesn't take any arguments, just prints hello on the screen, and doesn't return anything:

	void		hello (void)
	{
		printf ("hello");
	}
A function like GetInteger, however, does return something (an integer), but it doesn't take any arguments, so its definition is:

	int		GetInteger (void)
	{
		/* etc. */
	}

Arrays

Arrays are a new kind of variable. Arrays are variables that can be be used to hold a bunch of values (all of the same type) at the same time.

  • C arrays always start at index 0. If an array has n elements, the last element in the array is at index n-1.

  • C will not prevent your program from accessing elements past the end (or before the beginning) of your array. This can cause catastrophic errors (or worse, small baffling errors) if you're not careful. It's up to you to make sure that your programs don't try to access outside the bounds of an array.

  • The length of an array must be specified as a constant, not a variable or any expression that involves a variable or function call.

  • Multidimensional arrays are created by adding another size specifier. For example, to create a 3-by-3 array to represent a tic-tac-toe board:

    	int		board [3][3];
    	

    Access to multidimensioned arrays is done pretty much in the way you'd expect, i.e.

    	board [1][2] = 4;
    	

  • Assignment of entire arrays doesn't work as you might expect; you can't copy the contents of an array foo into another array bar with the expression bar = foo. You need to make a loop and copy over each element, one at a time.

  • C arrays are stored in "row/column" order. That means that in a two-dimensional array, the first index is usually considered the row, while the second is the column (i.e. foo[row][col]). This is confusing to some people who are used to Cartesian coordinates, which are usually expressed in "column/row" order.

  • When a function is called with an array as an argument, a reference to the array is passed, not the entire array. This means that if the function being called (the callee) changes the contents of the array in any way, then the contents of the array are really changed. Note that this is quite different from the way that other variables are handled!

    Consider the following code:

    	void		diddle_array (int array [4])
    	{
    		array [0] = 22;
    	}
    
    	void		diddle_int (int x)
    	{
    		x = 22;
    	}
    
    	void		main (int argc, char **argv)
    	{
    		int		a	= 10;
    		int		b [4]	= { 0, 1, 2, 3 };
    
    		diddle_int (a);
    		diddle_array (b);
    		printf ("%d %d\n", a, b [0]);
    	}
    	
    What will be printed by this program is "10 22". The call to diddle_int doesn't change a at all, but the call to diddle_array does change the contents of b.

Note that the chapter in Kernighan and Ritchie entitled Pointers and Arrays is very confusing to most people (including people who know arrays and pointers forwards and backwards). Avoid this chapter until you are comfortable with pointers.


Operator miscellany

Multiple assignments

The assignment operator, =, is a binary operator in the sense that it has two operands which is operates on, and it produces a value which can be used as an expression itself. Unlike the other binary operators we've seen, assignment works right-to-left instead of left-to-right. For example:

	a = b = c = d;
The value of d is assigned to c, and then this value is assigned to b, and so on.

Note that not all expressions are allowed on the left side of an assignment! For example, 55 is a perfectly good expression, but 55 = a is not; C will not allow you to assign a new value to 55. The kind of expressions that are allowed on the left side of an assignment are called lvalues (where the l stands for left). So far, the only kind of lvalue that we know about are variables, but soon we'll meet another in lecture (and we'll see a few more as the semester progresses).

Type casting

C provides an operator to force a conversion from one type to another, called a type cast (or simply a cast). We saw some of these in the early lectures. The form of a cast is simply the name of the type that you want to convert to, surrounded by parenthesis. For example,
	a = b + (double) c;
	d = (int) (e + f);
In the first example, c is cast to type double before it is added to b. Note that this forces C to do this addition using real arithmetic, rather than integer arithmetic.

In the second example, the sum (e + f) is converted to type int before it is assigned to d.

Note that C doesn't have the ability to cast strings to integers or doubles (although there are library functions to do this). The following will compile (for reasons that should become clear after we study pointers) but not do what you want at all:

	{
		int	foo;
		string	bar;

		foo = (int) "123";
		bar = (string) 456;
	}

To convert back and forth from strings and other types, other methods are necessary.


Macros

The #define directive defines a macro. The compiler the substitutes the definition of the macro wherever it sees the macro appear in your program. For example,

	#define	TIMES_TO_LOOP	4

	for (i = 0; i < TIMES_TO_LOOP; i++) {
		printf ("%d\n", i);
	}
In this example, 4 will be substituted in wherever TIMES_TO_LOOP appeared in the original code.

Macros can also behave a little bit like functions, taking arguments and substituting them as well:

    #define PRINT_TWICE(str)	printf ("%s %s", str, str)
    #define MAX(x, y)		((x > y) ? x : y)
    #define MIN(x, y)		((x < y) ? x : y)

Other General Concepts


Creating new types

typedef

typedef is used to create a new type, which can then be used by your program as the type of variables.

For example, in life.h the following line appears:

	typedef	char	board_t [MAX_ROWS][MAX_COLS];
This creates a new type, named board_t, which is a two-dimensional array (MAX_ROWS by MAX_COLS in size) of chars.

Once you've created a new type with typedef, you can use this type anywhere you would use an ordinary type. For example, you could then have in your main function a variable foo of type board_t:

void	main ()
{
	board_t		foo;

	...

enum

One of the things that you can typedef are enumerated types. An enumerated type is closely related to #define'd integer constants, but are a more sophisticated construct.

An enum is yet another type, but with a twist: when you define an enum variable, you also define all the values that you want that variable to be able to hold. Each value is defined symbolically; you tell the compiler what name you want to use for each value, and the compiler chooses actual numbers that it will use to represent those values.

For example, the following enum definition creates an enum named directions that can take on the values UP, DOWN, LEFT, or RIGHT:

	enum direction_t { UP, DOWN, LEFT, RIGHT };
Once this definition is made we can define variables of this new type:

	enum direction_t which_way;

Of course, it's a little clumsy to put an enum in front of each of these, so 99% of all the enums you'll see in contemporary code are actually part of typedef's. For example:

	typedef enum {
		UP, DOWN, LEFT, RIGHT
	} direction_t;

Once this typedef is done, direction_t can be used in the same way as any other type.

Note that if you don't want the compiler to choose the values on your behalf, you can define them yourself. For example, imagine that you wanted to do the same typedef, but wanted to have specific values for UP, DOWN, LEFT, and RIGHT. (This might happen if the direction_t type will be used to interface with some piece of unenlightened code that uses some mysterious constants to represent directions.)

		/* truly bizarre values...	*/
	typedef enum {
		UP	= 1,
		DOWN	= 10,
		LEFT	= 100,
		RIGHT	= 1000
	} direction_t;

This is similar in nature to #define'ing these values, but setting them up as part of an enum gives the compiler another chance to detect any misuse (for example, trying to evaluate UP + UP makes perfect sense to the compiler if UP is merely a number, but if UP is an enum'd value, most compilers will complain that this doesn't make much sense.


Memory and Pointers

Pointers and address are synonymous.

Although people sometimes say pointer (usually when talking about C), and other times say address (usually when talking about assembly), both refer to the same idea.

The C Syntax for Pointers

A few new C operators:

&

``take the address of'' or ``get a pointer to'' an object.

*

``dereference an address'' or ``follow the pointer'' (i.e. fetch the value located at the given address).

The * character can also appear as part of the type specification in a variable declaration, to indicate that the variable is a pointer to the given type. Be careful not to confuse these two uses! For example:

{
	int	foo;	/* foo is an integer variable.			*/
	int	*bar;	/* bar is a pointer to an integer.		*/
	int	**qux;	/* qux is a pointer to a pointer to an integer.	*/
	int	***baz;	/* baz is a ptr to a ptr to a ptr to an int.	*/

	foo	= 1492;	/* set foo to contain 1492.			*/
	bar	= &foo;	/* set bar to point to the location of foo.	*/
	qux	= &bar;	/* set qux to point to the location of bar.	*/
	baz	= &qux;	/* set baz to point to the location of qux.	*/

			/* After these statements:			*/
			/*  foo == 1492.				*/
			/* *bar == 1492.				*/
			/* *qux = bar, **qux == 1492.			*/
			/* *baz = qux, **baz = bar, ***baz == 1492.	*/
}

Call by Value vs. Call by Reference

In general, C functions use a call by value for passing their arguments: the value of expressions used as arguments to functions are passed to the functions.

For example:

	void		foo (int x)
	{
		x = 5;
		printf ("x inside foo = %d\n", x);
		return ;
	}

	void		main ()
	{
		int		x	= 10;

		printf ("x inside main = %d\n", x);
		foo (x);
		printf ("x inside main = %d\n", x);
		return ;
	}
In this example, when main calls foo, the value of x, which is 10, is passed to foo, and winds up in foo's own copy of x. If foo changes the value of its x, the value of x inside main is unchanged.

When this program is run, it will produce the output:

	x inside main = 10
	x inside foo = 5
	x inside main = 10
However, pointers allow you to sidestep call by value. Instead of passing a copy of the value to the function, you can pass a reference (or pointer) to the variable to the function. The function can then dereference the pointer, and change the variable as it pleases. This way of passing parameters is called call by reference. For example:

		/* foo takes a single argument, which is a	*/
		/* pointer (or reference) to an integer.	*/
	void		foo (int *x)
	{
		*x = 5;
		printf ("x inside foo = %d\n", *x);
		return ;
	}

	void		main ()
	{
		int		x	= 10;

		printf ("x inside main = %d\n", x);
			/* pass foo a pointer to x.		*/
		foo (&x);
		printf ("x inside main = %d\n", x);
		return ;
	}
This program will produce the following output:

	x inside main = 10
	x inside foo = 5
	x inside main = 5

malloc and friends

malloc

Ask UNIX for a new chunk of memory. The chunk initially contains undefined (random) values.

free

Give a chunk of memory back to UNIX.

Note- only free memory that was allocated with malloc (or one of the related functions). Do not free other memory!

sizeof

A C operator that returns the number of bytes in an variable or type.

Note- sizeof does not work properly with strings. If you want to find the number of bytes used by a string, you must use StringLength (from Roberts) or strlen (from the standard C library) or a function of your own devising.

NULL

The canonical ``invalid'' or ``empty'' pointer is called NULL. A value of NULL is used to indicate that the pointer doesn't point anywhere valid.

If you call a function that is supposed to return a pointer, and the pointer it gives you is NULL, it generally means that something has gone wrong.

void pointers

A void pointer is a pointer to anything. Unlike void, (which is C's way to say ``nothing'') a void pointer can point at anything. Before it can be dereferenced, however, it must be cast to be a pointer of the appropriate type.

Pointer problems

One misconception that still haunts many people: when you define a pointer, this does not allocate space for it to point to! For example:

	int		*i;
This defines i to be a pointer to an integer. It does not specify what integer i points to, nor has it allocated memory somewhere for i to point to.

You must never dereference a pointer variable unless you have assigned a valid address to that pointer.

Pointer arithmetic: arrays

In C, there are strong relationships between pointers and arrays. For any one-dimensional array A and integer expression x, the following relationships are always true:

&A[0] == A

As far as C is concerned, A is just the address of the first element of the array A itself.

&A[x] == A + x

This is one of the great truths of the C programming language, and a defining characteristic of the language.

This equality defines the nature of pointer arithmetic, which defines how the + (aka addition) operator works with pointers.

When an integer x is added to the a pointer A, the result is a pointer that points to the memory location of the x'th thing of whatever type A points to.

For example, if A is a pointer to an int, then A + 5 points to the memory address that is "five ints after" A. On the HP machines, where sizeof(int) is 4, this is the same as adding 20 to the value of A.

Note that the first relation (&A[0] == A) can be seen as a special case of this condition.

A[x] == *(A + x)

Looking closely at this relationship and bringing your powerful knowledge of pointers to bear, you should notice that this relationship follows immediately from the previous relationship-- we're just deref'ing both sides of the equality, so that instead of dealing with pointers we're dealing with the things being pointed at.

We can use these relationships in either direction-- we can treat arrays like pointers (useful in some situations), and we can treat pointers like arrays (very useful in many situations, including some you will face in assignment 6).

For example, consider the following code:

double		*array_of_doubles;

array_of_doubles = (double *) malloc (20 * sizeof (double));

In this code snippet, we've declared a variable named array_of_doubles to be a pointer to a double, and then allocated (via malloc) enough space to hold 20 consecutive doubles, and assigned the resulting pointer to array_of_doubles.

Now, thanks to the properties of C stated earlier and the fact that array_of_doubles points to enough space to hold 20 doubles, we can treat array_of_doubles as if it had been defined to be an array of 20 doubles all along. Even better, we can decide at runtime whether the array should be 20, 200, or 2,000,000 million doubles in length (while there is no way to dynamically chose the size of arrays). Best of all, when we're finished with this memory, we can give it back to the system (via free).

Pointers allow us to write code that uses exactly as much memory as we need, when we need it (and then give it back when we don't need it any longer) while still allowing us to write code using the notational simplicity of arrays. Splendid!

Also note that we can have arrays of pointers:

double		**double_pointers;

double_pointers = (double **) malloc (16 * sizeof (double *));

This is as if we had created an array of 16 pointers to doubles.

Since we know that a string (from the Roberts libraries) is nothing more than a char *, we can create an array of 10 strings in a similar manner:

char		**strings;

strings = (char **) malloc (10 * sizeof (char *));

A few remarks about style

Now, when I write "completely equivalent", I mean that they look exactly the same to the compiler. They certainly don't look the same to you, and 99.9% of the population finds the array notation much more readable and stylistically superior. This gets to be even more true with arrays of more than one dimension. Imagine if in your life program, instead of writing

	next [row][col] = BEAST;
you had to write

	*(*(next + row) + col) = BEAST;
The result would not be gentle upon the eye, and becomes even worse for arrays of higher numbers of dimensions.

Let's pry this expression apart, however, just to make sure that we understand what's happening. Working from the inside out, according to C's rules for the order of evaluation:

next

next is the start of the array. The type of next is, in pointer notation, an array of pointers, each of which points to a row of the array.

next + row

Add row to next, to get a pointer to the start of the given row. This pointer is not the type of pointer that we want, however; it's effectively a 2-d array (since next is a 2-d array). What we really want is to do is deal with just a row of this array.

*(next + row)

Dereference this pointer, to get the actual pointer to the start of the row. If the type of next is a two-dimensioned array of chars, this pointer points to a one-dimension array of chars. In essence, this deref is more like a cast than anything else.

*(next + row) + col

Add row to this pointer, to get a pointer to the exact location of the element we want to reference.

*(*(next + row) + col)

Dereference this final pointer.

If this seems confusing, you're not alone. This is painful.

Pointer arithmetic: incrementing

Just as it is possible to compute the address of items in arrays by using address arithmetic instead of the usual array notation, it's also possible to fiddle with pointers in more general ways. For example, imagine that you have a string and want to return a pointer to the first 'a' in the string (for the sake of argument, we'll assume that there will always be an 'a'). Using array notation, and taking advantage of the fact that a string is really a char *, we could write this as:

	int		i;

	for (i = 0; string [i] != 'a'; i++)
		;

		/* the answer is &string [i].	*/

Using address arithmetic, however, we can discard the index i and replace it with a pointer:

	char		*p;

	for (p = string; *p != 'a'; p++)
		;

		/* the answer is p.		*/
In this case the pointer notation is more compact than the array notation.

For lots of other examples of pointer arithmetic and its appropriate uses, see K+R (especially) or the Roberts text.


Structures

Structures provide a way to lump several variables (with different names) into a single box (with its own name as well). There are many good examples of structures in the lecture notes, Roberts, and K+R.

Defining structures

To define a struct, use the struct type. In form, it is just like the enum type that we saw earlier. For example:

	struct	my_struct	{
		int		foo;
		double		bar;
		string		qux;
	};
This defines a new type of structure, named my_struct, which contains three fields: an integer named foo, a double named bar, and a string named qux.

Like an enum, this new type of structure can only be referenced in conjunction with the struct. For example, if we wanted to declare a variable named x of this type, we would need to use the following code:

	struct	my_struct	x;
Since this it is ugly to carry around the struct everywhere (it's much nicer to treat this new structure type in the same manner as we treat every other type), structure types are almost always typedef'd in contemporary C code (just like enum's). (However, use of struct is unavoidable with recursive types, as noted in a later section of this handout.) For example:

	typedef	struct	{
		int		foo;
		double		bar;
		string		qux;
	} my_struct;
Now that this new type has been created via typedef, we can use it just like any other type. For example, to create a variable named y of this type:

	my_struct	y;
However, there is a lot of older C code still floating around (such as the standard C UNIX libraries) so you will see structs wandering around on their own from time to time, not wrapped up in the shelter of a typedef.

Accessing structure fields

The basic way operator for accessing structure elements is the "dot" operator, written ".". For example, given the previous definition of y, we can access the value of the foo field of y by the expression y.foo. For example:

	int		integer;

	y.foo = 12;		/* assign to y's foo.	*/
	integer = y.foo;	/* get value of y's foo.*/

A common way to refer to structures is via pointers. To simplify the process of accessing the fields of structures via pointers, C includes a special "deref" operator, written "->".

Imagine that we had a variable named z which is a pointer to a my_struct:

		/* initialize z to point to y.	*/
	my_struct	*z	= &y;

		/* to access the foo field of	*/
		/* the struct z points to:	*/
	z->foo = 12;
Of course, you could dereference z by using * and then use . to get at foo, but because of the precedence rules of C, it would be necessary to parenthesize the expression. For example:

	(*z).foo = 12;
Note: the expression *z.foo generally has a completely (and important) meaning (although in for our given definition, it is an error). If z is a structure (instead of a pointer to a structure), and the foo field is a pointer, then *z.foo dereferences this pointer.

The (*). notation quickly becomes tedious for complex operations, so the -> notation is the one that you'll see nearly anywhere pointers to structures are used.

Recursive Structures

One thing that confuses people is the defining recursive or self-referential structures (which are structures that contain a pointer to a structure of their own type). For example, for the singly linked list structure used in the lecture notes, we have the following structure:

	typedef	struct	_sllist_t {
		struct	_sllist_t	*next;
		void			*data;
	} sllist_t;
The next field in this structure is a pointer to another sllist_t, which represents the next element in the linked list.

Note that in the type specification of the next field, what we really would like to write is just sllist_t *next. This is impossible, however, because sllist_t hasn't been defined yet- it isn't actually defined until the very last line of the typedef has been processed by the compiler. Therefore, the way to express this is to use struct _sllist_t *next to tell the compiler that the field is a pointer to an struct _sllist_t (which the compiler knows about, since it is in the midst of being defined).

Note that a structure cannot contain a field of its own type (although it can contain pointers to structures of its same type). A truly recursive structure that could contain itself would have a potentially infinite definition, something our computers cannot quite grasp.

Other struct stuff

Structs really can be treated like any other C type. You can pass them to functions, and functions can return them. (This is actually somewhat rare, but you can do it if you like.) You can declare arrays of them, of any dimension you like, and so on.


Command line arguments

Command line arguments are important to ``real'' programs. There is some coverage of this in the lecture notes and the textbook.

Note the use of the getopt function (which is shown in the lecture notes). getopt makes handling command line arguments relatively easy, and may be useful for your final projects.


Bit manipulation in C

C is one of the relatively few higher-level programming languages that enable the programmer to explicitly manipulate bits (in fact, some people argue that C is not a high-level language at all, simply because it provides this facility).

It turns out that the ability to fiddle with individual bits can be a very handy feature, especially when designing data structures that use a minimal amount of memory. It also makes dealing with UNIX system calls much easier, since many of them have arguments that actually are created by mashing together patterns of bits (see the man pages for open(2) or chmod(2)

|
Bitwise or.

&
Bitwise and.

^
Bitwise xor (exclusive or).

~
Bitwise not (bit negation).

<<
Shift left- slide the bits ``up'' by a specified number of positions.

>>
Shift right- slide the bits ``down'' by a specified number of positions.

According to K+R, what gets shifted into the top bits during a right shift is compiler-dependent and may be either 0 or the previous top bit (but not random bits!) if being shifted is of a ``signed'' type (i.e. int short etc). If the thing is unsigned then the bits shifted in are always 0.

For example, if operating on 8-bit chars:

  • (1 | 2) == 3
  • (1 | 3) == 3
  • (1 & 2) == 0
  • (1 & 3) == 1
  • (2 & 3) == 2
  • (0 ^ 3) == 3
  • (1 ^ 3) == 2
  • (2 ^ 3) == 1
  • (3 ^ 3) == 0
  • ~0 == -1 (signed) or 255 (unsigned).
  • ~23 == -24 (signed) or 232 (unsigned).
All of the operators (except the unary operator ~) can also be used in conjunction with the assignment operator, i.e.

  • a |= b is the same as a = a | b.
  • a &= b is the same as a = a & b.
  • a ^= b is the same as a = a ^ b.
  • a <<= b is the same as a = a << b.
  • a >>= b is the same as a = a >> b.

Examples

  • To set the ith bit of an integer x, we need to do two things:

    1. Create a bit mask with just the ith bit set. We can do this by sliding a 1 over i places, using the << operator:

      			1 << i
      		

    2. Set x to the or of this bit mask and x itself. We can do this with the | operator:

      			x = (1 << i) | x;
      		

      Since the | operator also has a |= form, this can be expressed more succinctly as:

      			x |= (1 << i);
      		

  • Testing whether or not the ith bit of an integer x is set is very similar:

    1. Create a bit mask with just the ith bit set. We can do this by sliding a 1 over i places, using the << operator:

      			1 << i
      		

    2. Take the bitwise & of x and this bit mask, which will return the bit mask if the ith bit of x is set, or 0 otherwise:

      			x & (1 << i)
      		

      Since in C anything non-zero is considered ``true'', this expression can be used as a boolean expression (i.e. as the argument to an if).

  • Clearing the ith bit of an integer x is slightly more complicated:

    1. Once again, create a bit mask with just the ith bit set.

    2. Take the bitwise negation of this bit mask, with the ~ operator.

      			~(1 << i)
      		

    3. Set x to the and of this new bit mask and x itself:

      			x = ~(1 << i) & x;
      		

      or

      			x &= ~(1 << i);
      		


C style notes

  • Style.

    We already talked about this in lecture and section (very briefly). Here's a little more on this subject:

    Not everyone agrees what the best style is, but everyone agrees that good styles share several properties:

    Use of whitespace
    Whitespace (blank lines, spaces, tabs, etc) can be used to visually group related parts of a program. For example, all the statements in a block are usually indented to the same level.

    Consistency
    In this case, consistency means that two pieces of code that do nearly the same thing should look very similar. For example, if you format all your if-else statements one way (unless there is a good reason not to).

    Commenting
    Comments should be visually distinct from the code, so the two don't get tangled up.

    Since the compiler doesn't care where you put extra blank lines, spaces, or comments, you have a considerable amount of freedom about how your program looks. If you use this freedom wisely, you will have programs that are much more readable, understandable, and maintainable (and gradable) than if you don't.

    To start with, you are probably best off trying to emulate one of the styles you've seen; from the course book, Professor Seltzer's slides, the assignment code itself, or K+R's.

  • Functional Decomposition.

    You should break your programs down into subfunctions wherever useful. Some rules of thumb about when to break a function into subfunctions, in rough order of importance:

    1. If it will make your program easier to read or modify.

    2. If it will make it easier to reuse bits of this program in other programs.

    3. If it will your program simpler and more elegant.

    C Style Redux

    • Restrict the scope of variable declarations- avoid unnecessary global variables.

    • Group similar or related functions together.

    • Give your functions and variables reasonable names. For example, I generally give my functions names that start with the type of data they handle, followed by what they do to that data. For example, a function that prints the contents of an set might be called set_print.

    • Header files:

      • One for the global variables and functions. This header file is included in all source files that access these variables and functions.

      • One for each source file or submodule (group of source files). For small programs, these are often combined with the global header file.

      • Variable declarations in header files should be declared extern.

      • Header files are an excellent place to provide some documentation for global variables and functions: the reader can get a quick idea of what a module does by reading the header.

      • Never put executable code (i.e. function or variable definitions) in header files.

    • Source files:

      • Module documentation.

      • Include header files.

      • Local (private) macros, types, and variables. Local variables should be declared static.

      • Function definitions.

      • Don't forget documentation!


    Notes for Pascal Programmers

    This section is mostly aimed towards people who already know how to program in Pascal. If you don't know Pascal, this section is irrelevant.

    If you've previously programmed in Pascal, here are a few differences between functions in C and Pascal:

    • To call a function in C, even if it has no arguments, you must include the ()'s. A statement like:

      	foo = GetInteger;
      	
      is legal in C (provided that foo is declared properly), but doesn't do what you might expect.

    • In C, functions can't be defined inside other functions.

    • C functions of return type void are like Pascal procedures, while functions of a non-void return type are like Pascal functions.

    • Use return () to return a value from a function. As soon as you execute a return statement, the function immediately returns; no more statements in the function are executed.


    Please bring any errors or inconsistencies in this document to the attention of the author.

    Dan Ellard - (ellard@das.harvard.edu)