CS50 - Section 5

Section Information

Random numbers


Deterministic and Nondeterministic

A function (or process) is deterministic if, given a set of inputs (or an initial state) the output (or subsequent state) is unique and completely determined. If a function (or process) is nondeterministic, then given a set of inputs (or an initial state), there is a fixed set of possible outputs (or subsequent states), any of which are correct. For example, when you flip a coin, the output is either a head or tail; either answer is correct.

These terms have more precise and complete definitions in theoretical computer science, but this should hold us for now.

Random and Pseudo-random

A random number generator is a function that produces truly random numbers. A true random number generator is nondeterministic.

Unfortunately, it is impossible to write a nondeterministic random number generator using a deterministic computer (like the computers we usually use), so we have to settle for pseudo-random number generators- functions that behave in ways that are convincingly random (at least in the short term), but are actually completely deterministic.

Seeding the Random Number Generator - srand

Make sure you understand what srand (and Randomize) do, and why this is important.

You can think of rand() as somehow containing a very long list of numbers, and every time you call rand() it just gives you the next number. srand() seeds the random number generator, which is analogous to telling rand() where to start reading in the list. This list is always the same, so if you give srand() the same seed again, rand() will start in the same place again. If you don't use srand() (or Randomize(), or some other function that calls srand()), then rand() always starts at the ``beginning'' of the list, and you will always get the same sequence of random numbers.

Generally, it is only appropriate to call srand (or Randomize()) once per program- there's no need to call it again, once you've selected a seed.

One way to generate a seed that will change from run to run is to use the current time as the seed. (This method is described in the lecture notes.) You might also want a way to let the user set the seed to an explicit value, so that they can get the same random numbers whenever they want, or pick a seed some other way (so they're not always starting at the same place, unless they choose to).

Generating Random Integers: NSTT

The rand() function returns integers in the range from 0 to RAND_MAX (where RAND_MAX is the largest number rand() can ever produce). This is frequently not what you want, however. Imagine that instead you want to generate a random integer in the range between low and high, inclusive. The following algorithm (called "NSTT", for obvious reasons) can be used:

  1. Normalize: get an integer d from rand, and normalize it (so that it falls in the range 0.0 to 1.0).

  2. Scale: multiply the normalized random number by the size of range (high - low + 1) of numbers desired.

  3. Truncate: convert the number back to integer.

  4. Translate: add low to the truncated number, to translate it to the desired range.

This might seem like a lot of work: another approach that people use that seems easier than the first three steps is simply to use rand() to get an integer, and then take this modulo the range. This doesn't work, however, because rand() behaves very badly in small moduli-- in fact, many implementations of rand() always cycle between odd and even numbers, so using (rand () % 2) to simulate flipping a coin will give very predictable (and entirely useless) results!

Generating Random Reals: NST

The same basic steps, except that no truncation is necessary, and the scaling is slightly different:

  1. Normalize: get an integer d from rand, and normalize it (so that it falls in the range 0.0 to 1.0).

  2. Scale: multiply the normalized random number by the size of range (high - low) of numbers desired.

  3. Translate: add low to the truncated number, to translate it to the desired range.

The Monte Carlo Technique

If know any probability or statistics, then the problems posed by this assignment may seem pretty easy- in fact, you may be able to work their solution with pencil and paper in a few minutes (especially for keno!).

It may seem silly then to use the Monte Carlo technique to come up with approximate solutions to problems that people know how to compute exact solutions for- and in some cases, it is. However, the Monte Carlo technique is extremely useful for problems which do not have analytical solutions (or that you don't happened to know an analytical solution for). Even better, it turns out that for a wide variety of otherwise unsolvable problems, the Monte Carlo technique can be shown to be the most efficient way possible to achieve an accurate approximation. (A proof of this is somewhat beyond the scope of this course, but feel free to ask.) The important point is that the Monte Carlo technique and its variations are very useful things to have in your bag of tricks.

So, let's think for a moment about how this technique actually works. Imagine that you had a fair coin, and you flipped it a large number of times. Knowing that it is a fair coin, you expect that it will come up heads about half of the time.

However, this expectation also works in the other direction: if we flip a coin a large number of times, and it comes up heads about half the time, then we soon start to expect that it is a fair coin! Similarly, if we flip it a large number of times and it only comes up heads 1/4 of the time, then we will soon start to think that the coin is unfair, because the likelihood of seeing only 1/4 heads is much higher if the coin is unfair than if the coin is fair. Just as we expect a certain distribution of heads and tails for a coin whose "fairness" we know, given a certain distribution of heads and tails, we can start to form an estimate of the fairness of the coin.

Of course, we need to flip the coin a large number before we can start to have a lot of confidence in our estimate. For example, if you flip the coin once, it will either be heads or tails, and your estimate based on the distribution so far will be that the coin is absolutely unfair in one direction or the other! However, after many coin flips, the distribution of actual events will, in all likelihood, give us a pretty good idea what the true bias on the coin is.

Assignment 4

For this assignment, you need to write two short programs, think about some related problems, and figure out some mystery code. Don't worry; these programs should both be pretty short and simple once you understand what's going on, and they're probably somewhat easier than hi-q.

The sine program

This should be reasonably straightforward once you've read the lecture notes from this week and chapter 8 of the Roberts book. The lecture notes, especially the monte.c program, should give you a very good idea of what you need to do. If you understand monte.c, you will understand how to write sine.c.

The keno program

Think about this program before you start to write it. There are some approaches that lead to dead ends (or at least clumsiness), but if you plan ahead, everything should be relatively smooth sailing.

Type Conversions and Casing

In C, we have several fixed data types. Often it happens that we need to convert between two or more different data types. In order to do this, we can use type conversions or type casting. Some conversions must be done by hand, via casting. Others type conversions are done automatically by the compiler.

For example, an arithmetic expression can range over several different data types, such as ints and floats. In order to make sense out of such expressions, the compiler performs implicit type conversions - it converts integers to floats before performing the computation. Explicit conversions can also be performed by casting. The casting operation in C is simply the name of the type which you wish the value to be converted to, enclosed in parenthesis. For example, (int) or (double), which we have seen already in the lecture notes.

One thing to watch out for, however: the compiler, always willing to do your bidding (provided that you ask it properly, of course) will allow you do type conversions that don't make much sense. For example:

	int	i = (int) "123";
	string	s = (string) 123;

While these may make sense to you or me (the first one might mean "set i equal to the number that the string "123" represents", and the second one might mean "set s to the string that represents the number 123"), the compiler has very different ideas about what these operations do. In neither case will it do what you want, although both of these statements are perfectly valid C. (We'll discover later what it does do, and it will make sense.)

typedef and enum

typedef is used to create a new type, which can then be used by your program as the type of variables. Once you've created a new type with typedef, you can use this type anywhere you would use an ordinary type. Now that we've seen nearly all of C's built-in type, we'll spend much of rest of the semester defining new types and their behaviors, so we might as well dive head-first into the ideas of defining and using our own types.

One of the things that you can typedef are enumerated types. An enumerated type is closely related to #define'd integer constants, but are a more sophisticated and useful construct.

An enum itself is an ordinary C type, but with a twist: when you define an enum variable, you also define all the values that you want that variable to be able to hold, and give them names. Each value is defined symbolically; you tell the compiler what name you want to use for each value, and let the compiler choose what actual numbers to use to represent each name.

For example, the following enum definition creates an enum variable named directions that can take on the values UP, DOWN, LEFT, or RIGHT:

	enum { UP, DOWN, LEFT, RIGHT } directions;
Unfortunately, if we're going to declare more than one variable of this type, C will bark at us because it gets confused about which "UP" is which. Therefore, we need a better way to do this. The better way is to define a new type (using typedef) for this particular enum. For example:

	typedef enum {
	} direction_t;

Once this typedef is done, direction_t can be used in the same way as any other type. For example:

	direction_t	directions;

Note that if you don't want the compiler to choose the values on your behalf, you can define them yourself. For example, imagine that you wanted to do the same typedef, but wanted to have specific values for UP, DOWN, LEFT, and RIGHT. (This might happen if the direction_t type will be used to interface with some piece of unenlightened code that uses some mysterious constants or #define'd values to represent directions.)

		/* truly bizarre values...	*/
	typedef enum {
		UP	= 1,
		DOWN	= 10,
		LEFT	= 100,
		RIGHT	= 1000
	} direction_t;

This is similar in nature to #define'ing these values, but setting them up as part of an enum gives the compiler another chance to detect any misuse. For example, trying to evaluate UP + UP makes perfect sense to the compiler if UP is merely a number, but if UP is an enum'd value, some compilers will complain that this doesn't make much sense. (Unfortunately, the gcc compiler isn't one of them- but at least gdb knows about them.)

Please bring any errors or inconsistencies in this document to the attention of the author.

Dan Ellard <ellard@deas.harvard.edu>