CS50 - Section 6

Section Information


Memory and Pointers

Pointer and address are synonymous. A pointer is an address.

Although people sometimes say pointer (usually when talking about C), and other times say address (usually when talking about assembly or architectural issues), both refer to the same idea.

The C Syntax for Pointers

A few new C operators:

&

``take the address of'' or ``get a pointer to'' an object.

*

``dereference an address'' or ``follow the pointer'' (i.e. fetch the value located at the given address).

The * character can also appear as part of the type specification in a variable declaration, to indicate that the variable is a pointer to the given type. Be careful not to confuse these two uses! For example:

{
	int	foo;	/* foo is an integer variable.			*/
	int	*bar;	/* bar is a pointer to an integer.		*/
	int	**qux;	/* qux is a pointer to a pointer to an integer.	*/
	int	***baz;	/* baz is a ptr to a ptr to a ptr to an int.	*/

	foo	= 1492;	/* set foo to contain 1492.			*/
	bar	= &foo;	/* set bar to point to the location of foo.	*/
	qux	= &bar;	/* set qux to point to the location of bar.	*/
	baz	= &qux;	/* set baz to point to the location of qux.	*/

			/* After these statements:			*/
			/*  foo == 1492.				*/
			/* *bar == 1492.				*/
			/* *qux = bar, **qux == 1492.			*/
			/* *baz = qux, **baz = bar, ***baz == 1492.	*/
}

Call by Value vs. Call by Reference

In general, C functions use a call by value for passing their arguments: the value of expressions used as arguments to functions are passed to the functions.

For example:

	void		foo (int x)
	{
		x = 5;
		printf ("x inside foo = %d\n", x);
		return ;
	}

	void		main ()
	{
		int		x	= 10;

		printf ("x inside main = %d\n", x);
		foo (x);
		printf ("x inside main = %d\n", x);
		return ;
	}
In this example, when main calls foo, the value of x, which is 10, is passed to foo, and winds up in foo's own copy of x. If foo changes the value of its x, the value of x inside main is unchanged.

When this program is run, it will produce the output:

	x inside main = 10
	x inside foo = 5
	x inside main = 10
However, pointers allow you to sidestep call by value. Instead of passing a copy of the value to the function, you can pass a reference (or pointer) to the variable to the function. The function can then dereference the pointer, and change the variable as it pleases. This way of passing parameters is called call by reference. For example:

		/* foo takes a single argument, which is a	*/
		/* pointer (or reference) to an integer.	*/
	void		foo (int *x)
	{
		*x = 5;
		printf ("x inside foo = %d\n", *x);
		return ;
	}

	void		main ()
	{
		int		x	= 10;

		printf ("x inside main = %d\n", x);
			/* pass foo a pointer to x.		*/
		foo (&x);
		printf ("x inside main = %d\n", x);
		return ;
	}
This program will produce the following output:

	x inside main = 10
	x inside foo = 5
	x inside main = 5

malloc and friends

malloc

Ask UNIX for a new chunk of memory. The chunk initially contains undefined (random) values.

free

Give a chunk of memory back to UNIX.

Note- only free memory that was allocated with malloc (or one of the related functions). Do not free other memory!

sizeof

A C operator that returns the number of bytes in an variable or type.

Note- sizeof does not work properly with strings. If you want to find the number of bytes used by a string, you must do it different way (which we'll probably talk about next week).

NULL

The canonical ``invalid'' or ``empty'' pointer is called NULL. A value of NULL is used to indicate that the pointer doesn't point anywhere valid.

If you call a function that is supposed to return a pointer, and the pointer it gives you is NULL, it generally means that something has gone wrong.

void pointers

A void pointer is a pointer to anything. Unlike void, (which is C's way to say ``nothing'') a void pointer is a pointer that can point at anything. Before it can be dereferenced, however, it must be cast to be a pointer of the appropriate type.


Structures

Structures provide a way to lump several variables (with different names) into a single box (with its own name as well). (There are lots of good examples in the lecture notes.)

Recursive Structures

One thing that confuses people is the defining recursive structures (which are structures that contain a pointer to another structure of their same type). For example, for singly linked lists, we have the following structure:

	typedef	struct	_sllist_t {
		struct	_sllist_t	*next;
		int			data;
	} sllist_t;
On the second line of the typedef, what we really want to write is just sllist_t *next, but unfortunately we can't, because sllist_t isn't actually defined until the very last line of the typedef has been read. Therefore, the best we can do is say struct _sllist_t *next to tell the compiler that the field is a pointer to an _sllist_t structure (which it knows it is in the midst of defining).

Note that a structure cannot contain itself (although it can contain pointers to structures of its same type). A structure that contained itself would have an infinite definition, something our computers cannot quite grasp.


Assignment 5

postnet

For this part of the assignment, you will write a complete program that reads in ZIP codes in POSTNET format (or sequences of characters masquerading as ZIP codes) and determines whether or not they represent valid codes, and prints the codes if they are.

The way that we'll represent POSTNET codes is via a sequence of barcodes. Because we don't have barcode readers on our computers (at least for the most part), we'll use the letter '-' (a dash) to represent a "half bar" or "short bar", and the letter 'x' (a lowercase x) to represent a "full bar" or "tall bar".

ZIP codes are represented, in POSTNET format, as a sequence of exactly 32 bars. See the assignment handout for the specific rules for what sequences of codes are valid and what they represent.

Note that not every sequence of characters represents a valid area code encoding. A big part of your task here is to write your program in such a way that will recognize all invalid sequences of bars and reject them as such. The actual conversion from POSTNET to decimal is relatively easy, given a valid POSTNET code- the main focus of this assignment is on figuring out how to check whether or not each code is valid. No sequence of faulty characters should fool your program (or worse, cause it to crash). Your code should be bulletproof!

In order to check for each of a range of possible errors, you may find it useful to subdivide your program into small, modular functions, each responsible for checking one particular aspect of the input. Make sure that each function performs its own well-defined but small task, and then combine those in appropriate ways to accomplish the full task.

Notice that the POSTNET code is not exactly the binary code (two's complement) that we have seen before (for example, in the reading). Its different encoding scheme is designed to make it somewhat more robust.

You can use sol_postnet as an example of how your program should work.

bigadd

In this exercise, you should acquire a feel for the way in which information is stored in memory, and how you can use existing data representations that the computer knows how to deal with (i.e. integers) to build new data representations (i.e. really large integers). You are asked to implement some algorithms that use arrays to represent large integers. We have provided all of the necessary data structures, and some useful functions that use them, but you get to implementing a few functions that use them to. Note that you should familiarize yourself with the code that we give you before you start to design (or write) your own code. The most important routine you will write is a function to add two of these large numbers. Since each number is represented by an array, with each element in the array representing a digit, a natural algorithm for addition is the one that you learned in grade school: add the first two digits, carry, add the second two digits, carry, and so on. If you think about this for a moment you should see how to write this very succinctly in C.

Note that if the sum of two numbers you add is too large to fit into a bignum_t, your bignum_add function must note this in its return value. In general, pay attention to what each function is supposed to return, in addition to what it's supposed to compute! This is an important part of this assignment.

You can use sol_bigadd (or the UNIX command bc) to check your program.

Also note that this program is spread out over two files: bigadd.c (which contains main, and which you should not modify at all), bigadd.c (which contains a bunch of routines that deal with bignum_ts), and finally bignum.h (which contains the definitions and declarations used by bignum.c). Congratulations are in order; you're writing your first library! The way that the code has been divided makes it easy to reuse the bignum code in other programs (should you feel the urge). We'll be doing more and more of this from now on; with few exceptions, every program you'll write from this point forward will have its source spread over several files.


Computer Architecture

By this point in the course, you have probably started to form some suspicions about what's really going on inside the computer, and how the high-level statements you're allowed to use in C are actually executed. However, you may be frustrated at how briefly we paused to discuss such details. Don't worry; we'll be back.

The assignment this week focuses mostly on data representation- using C types to represent various other types of data, including very large numbers and POSTNET codes. In this sense, this is very closely related to the topic of computer architecture, where many of the most important techniques revolve around representation.

Computer components

There are three main computer components, from an architectural point of view:

CPU

The CPU (or Central Processing Unit) is the "brain" of the computer. It contains the mechanisms that actually perform the computations that the computer does on your behalf.

Memory

The memory (also called RAM, which stands for Random Access Memory) is the scratch paper that the CPU uses to store data and partial results from computations that are in progress.

Note that the CPU often contains some small amount of scratch memory as well. This memory is usually called registers. In most architectures, the registers are treated quite differently than other memory, but a few architectures treat them in similar ways.

I/O

I/O (which stands for Input/Output) is the mechanism that the CPU uses to input data, either directly to the CPU or into memory, from file, keyboard, network, or other external devices, to output data to files, the screen, network, or other external devices.

It may seem odd for contemporary users to refer to a file as an external device, since for computers like PCs and Macs, files are stored on disks that generally reside inside the computer. Similarly, the screen and keyboard are also tightly integrated with most contemporary computers, so that it's difficult to imagine a computer without a keyboard and screen (and mouse and disk...). What is really meant by input and output is with respect to the CPU and RAM and other devices in the system- not whether the data actually enters or exits the box that computer resides in.

The CPU

In general, a CPU contains the following pieces:

Register File

The register file is a small set of memory locations that can each be used to store a single item (usually a character, integer, or a floating point number).

Control

The control logic decides what operation in the program to perform next.

Each program, by the time the CPU gets it, is broken down into a sequence of operations. By default, the control logic is quite simple- execute the operations in the sequence that appear in the program. However, the control logic can also execute the operations in a sequence that is computed as the part of the program itself- for example, with the C if statement.

Datapath

The datapath is where the action actually takes place. Data it taken from a register or registers, operated upon, and the results are placed back into a register (or registers).

One component of the datapath is the ALU (the Arithmetic Logic Unit), which is the brain of the CPU (just as the CPU is the brain of the computer). The ALU is responsible for computing the mathematical functions that actually implement the arithmetic and logic operations that the CPU executes. It may seem quite surprising that all of the operations that the CPU can compute reduce to a small set of logical and arithmetic functions.

Memory Organization


Please bring any errors or inconsistencies in this document to the attention of the author.

Dan Ellard <ellard@deas.harvard.edu>