This document was originally written for CS50 by Professor Margo Seltzer and members of the CS50 course staff. -DJE

A Brief UNIX Tutorial

The computers that we will be using for S-Q run HPUX, a variant of the UNIX operating system. An operating system is the program that manages the resources of a computer, mediates among multiple users, and insulates you (the user) from the details of the hardware and from the other users.

If you have used a PC, then you might be familiar with DOS, or its more recent relatives, Windows 3.1, '95 and NT. If you have used a Macintosh, then you might be familiar with the Mac Operating System. UNIX systems are different from DOS, Windows and Mac OS in that they are designed to handle many simultaneous users (called multi-user systems). Additionally, UNIX systems can perform many tasks at once (called multi-tasking).

Finally, UNIX was designed primarily as a tool for programmers. Rather than offering a complete solution to a problem (e.g., a single application that lets you edit text, format it, and print) UNIX provides you with a large assortment of tools that can be mixed and matched to accomplish a wide variety of tasks. You might find this approach confusing and frustrating at first, but bear with us; once you know your way around the system, you will find it a very productive environment.

The book "Harley Hahn's Student Guide to UNIX (Second Edition)" (Hahn) is a good and in-depth introduction to UNIX. If you are not familiar with UNIX, I suggest that you buy a copy of this book and skim through it as need be (and even if you are experienced, this book can be a valuable reference).

The book "Learning the UNIX Operating System" (Todino & Strang) provides a relatively short and simplistic introduction to UNIX. This book discusses a variant of UNIX called System V, which is similar to HPUX.

To find out more detailed information about any command on the system, use the on-line manual pages described in a later section.

Logging In

The HP machines that we'll use for this course are named course1, course2, course3, and course4. These machines are not physically accessible to users (i.e., they are not located in a public place). Instead, you must telnet to these machines from another machine. (There are a few HP workstations located in the basement of the Science Center, and these machines are equivalent to the course machines; if you log in to one of them, there is no need to telnet to one of the course servers.) From any FAS system (including the many Macs and PCs in the basement of the Science Center), you can telnet directly to one of these computers by using the names course1 through course4. Outside of the FAS system (for example, if you telnet in to Harvard from offsite), you will need to specify the full name of the system by adding ".harvard.edu" to the end (e.g., course2.harvard.edu).

In order to use the Harvard UNIX systems at all, you will need an account. You should have been given an account when you registered for Summer School, and this account will suffice for all of your coursework. If you did not receive an account, please contact the User Services people in the basement of the Science Center ASAP about getting one (or contact me, if you have difficulty obtaining an account).

When you log in to one of the HP workstations from the console, a window is displayed where you enter your login name and password. After you log in, the window manager starts automatically. You can click on the small terminal icon to create new windows. Clicking on the background brings up a menu that lets you log out.

Getting Help on the System

UNIX includes a system of on-line manual pages (man pages) that document nearly all of the commands, so it's not necessary to remember how to use all the commands or to lug around a thick manual in order to use UNIX effectively. To find out information about a particular command, try typing:

	man topic
where man is the command you are executing and topic is the topic about which you are seeking information.

You can also find out information about things even if you have forgotten their exact names. To get a list of all the commands or manual pages related to a certain topic, type:

	man -k topic
Sometimes you will see man pages referred to as name(1). The number following the name of the man page is the section in which you will find the command. Normally, you will not need to worry about the sections, but in cases where the same topic appears in multiple sections, you may need to specify the section. In this case, you can type:

	man n topic
where n is the section of the manual you need.

In general, you can use the man pages to answer questions and help you learn about the system. They should be your first source of information when you have questions.

The Shell

When you type commands on a UNIX system, you are not typing directly to UNIX. Instead, you are typing commands to a program called a shell. A shell is a program that reads each line of text that you type, and tries to execute it as a command. The shell is responsible for finding the programs that you want to run and managing all the different programs that you are using. There are quite a number of shells available for UNIX systems. The one that is most commonly used on the course systems (and the one that we strongly recommend that you use for this course) is called tcsh.

For now, you can ignore the shell's more advanced features and just use it to type commands. As you become more comfortable with the system and want to do more complicated things, you can learn more about the shell, especially how to write programs in it. To find out about some of the power of the shell, look at the man pages.

Editing Text

When you are writing software, the tool you will probably spend more time actively using than any other is a text editor. Therefore, we strongly encourage you to invest a little extra time at the beginning of the semester to become familiar with one of the text editors available on the UNIX machines.

There are a number of good programming editors available for UNIX, but by far the most popular editors are vi and emacs (or some of their many descendants). Each of these editors has advantages over the other, and there are endless and pointless debates about which is better. For the purposes of this course, however, we recommend that you learn vi, unless you are already familiar with emacs (and don't want to take the opportunity to learn a new editor).

vi may seem a bit confusing at first, especially if you are used to graphically-oriented text processors such as Word Perfect or Microsoft Word, but given a little time, most users find that it is easy to learn how to do some very powerful things with vi.

Emacs is arguably more powerful than vi, and is vastly more extensible, but many people find it harder to learn, and therefore never learn enough about it to make use of the advanced features it offers. People susceptible to RSI often find emacs commands hard to type, since many of the commands require awkward use of the control or meta keys.

The default editor on the Harvard systems is pico. This editor was designed for email and to be extremely easy to learn. It is sufficient for editing prose, but doesn't contain a lot of the features that are useful to programmers. I recommend that you invest the time to learn vi or emacs-- they contain features that will make your editing chores much faster and easier.

If you want to change the default editor that you use, add the following commands to the end of your ~/.cshrc file:

	setenv EDITOR prog
	setenv VISUAL prog
(where prog is the name of the editor you prefer).

Many powerful editors, such as vi and emacs, are not completely documented by their manual pages. Instead, they have their own documentation (including on-line help). Emacs and vi tutorials are available from User Services, and there is more information available online (via web pages, etc). The Harley Hahn book (mentioned earlier) includes tutorials for all of vi, emacs, and pico.

Keeping Track of Versions of your Files

Throughout this course, you will be creating files and modifying them. Occasionally, you will make changes to a file and then regret having done so. In order to make it simple to get back to previous versions of your files, we strongly encourage you to use a source code control system. In general, source code control systems allow you to maintain a history of the changes you make to a file, so that you can "undo" sets of changes or compare different versions.

The system that we will be using for this class is called RCS (Revision Control System). RCS will keep track of all old versions of a file while you are creating a new version. If you like the changes you have made, and you wish to make them permanent, you can use the checkin command to do so. Once you have done that, RCS believes that the most recent copy of the file is the "real" one. If, instead, you decide that you do not like the changes you made, you can retrieve any previous version of the file from RCS using the checkout command. Also, if you are trying to figure out what has changed since an earlier version (because your program used to work and now it doesn't), you can use the rcsdiff command to show you what has changed.

You can find out all the details of RCS by looking at the man pages for rcs, ci, co, rcsdiff, rlog, and others. In the meantime, we have created some simplified versions of these commands that are described here.

checkin filenames
checkin tells RCS that you have a new version of the specified files. RCS will preserve them for future reference. When you execute checkin, RCS starts up an editor so that you can add comments about why you are checking in each file and any other information you want to record about this revision of the files. You will find it most helpful if you write very precise and detailed comments when you check things in, so that later when you need to find a specific version of a file, it is easy to identify.

The editor that checkin uses by default is vi. If you prefer to use a different editor, then make sure that your environment variable EDITOR is set to the name of the editor that you prefer. For example, if you prefer emacs, add the line:

		setenv EDITOR emacs 
	
to your ~/.cshrc file (if you haven't done so already).

checkout filenames
checkout retrieves the last copy of the files that you checked in. checkout can also be used to retrieve previous versions of files, by specifying the name or revision number that you want to check out.

rlog filename
rlog displays a list of all the revisions of a set of files. This will print out all the comments that you added when you ran checkin.

rcsdiff filenames
rcsdiff displays the differences between the version of each file that you have checked out and the last version that you checked in. If you want to look at different versions than the most recently checked in one, consult the man page for instructions.

Turning in your work

We will ask you to submit both paper and on-line copies of your homework. The paper copies make it easier for your Teaching Fellow to grade, and the on-line copies let us run your programs. The program that lets you submit your homework electronically is called submit. The submit program sends a copy of your homework to the course staff so that they can grade it. You can submit as often as you like, but only the last two submissions are kept.

To run submit:

  1. Make sure you have checked in all the files we specified, including any text files that you created. Remove any extraneous files that you don't want to hand in (usually a make clean will accomplish this).

  2. Go to the directory directly above the directory where your assignment files are. For example, if all your S-Q materials are in ~/Q/ and you are submitting assignment 1, which you have stored in ~/Q/asst1, then you should cd to ~/Q.

  3. Type the command submit. You will be asked for the name of the course and the assignment name. The name of the course is cscisq, and the name of the assignment is asstX, where X is the number of the assignment. Finally, you will be asked for the list of files to submit. The name of the file to submit is simply the name of the directory that your files are in. Specifying the name of a directory will cause all of the files in that directory to be submitted.

  4. submit will print out a bunch of messages on the screen. Most of them are pointless, but keep an eye out for specific errors or instructions. For example, when submitting asst1, you may see messages like:

    	asst1: File exists
    	tar: asst1/RCS/ couldn't create directory 
    	asst1: File exists
    	tar: asst1/RCS/LOG - cannot create
    	asst1: File exists
    	tar: asst1/RCS/array.c,v - cannot create
    	asst1: File exists
    	x asst1/array.c, 16 bytes, 1 tape blocks
    	
    These messages can be safely ignored.

UNIX Tools

As mentioned at the beginning of this document, one of the things that makes UNIX so useful is its wide assortment of tools. You can also make the tools interact to form more powerful tools. In this section, we will present some of the most commonly used tools and also some common ways that you can "hook tools together". However, the best way for you to learn about the power of UNIX is to experiment with it. Read interesting man pages. Try to do things. Try to make the computer make your life easier!

For each command we give the command name, a mnemonic (in parentheses) indicating why the command has such a bizarre name, and a description of the command. As always, you can get more information from the man page.

ls
(list) displays the names of the files in a directory.

grep
(global regular expression print) searches files for a pattern (called a regular expression).

cd
(change directory) changes your current working directory.

cat
(concatenate) displays the contents of a file.

more
(show more of a file) displays the contents of a file one page at a time. Each time a page is displayed, you can execute a command that searches for text, continues paging, goes to the next file, etc. You will find this an extremely useful command. A newer version of more, named less (a terrible pun on more), is also available.
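To make these descriptions concrete, here is a short session that uses ls, cat, grep, and cd. It is only a sketch: the scratch directory and the file names fruits.txt and vegetables.txt are made up for illustration, and we assume that the mktemp command is available on your system.

```shell
# Work in a scratch directory so that none of your own files are touched.
dir=$(mktemp -d)
cd "$dir"

# Create two small files to experiment with (hypothetical names).
printf 'apple\nbanana\ncherry\n' > fruits.txt
printf 'carrot\npea\n' > vegetables.txt

ls                   # list the files in the current directory
cat fruits.txt       # display the entire contents of fruits.txt
grep an fruits.txt   # print only the lines that contain "an"
cd /                 # change to the root directory
ls "$dir"            # ls also accepts the name of a directory to list
```

Try replacing cat with more to see how paging works on a longer file.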

You can save the output of any command by using a mechanism in the shell called Input/Output Redirection. The symbols used for redirection are "<" and ">". To save the output of a command in a file instead of having it appear on your screen, use output redirection (">"). For example, to save the output of the grep command, you could type:

	grep Thurs ~lib50/cs50.times > thursday_sections 
This command would look for the string "Thurs" in the file "~lib50/cs50.times". It would then put the output of the command (what the command would normally print to the screen) in the file "thursday_sections".
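If you would like to try output redirection without the course file, the sketch below builds a small stand-in schedule first. The file names times and sections are hypothetical. It also shows ">>", which appends to a file rather than replacing its contents.

```shell
# Build a small schedule file to search (a stand-in for the course file).
dir=$(mktemp -d)
printf 'Mon 10am\nThurs 2pm\nThurs 4pm\nFri 1pm\n' > "$dir/times"

# ">" creates the output file, or replaces its contents if it already exists.
grep Thurs "$dir/times" > "$dir/sections"

# ">>" appends to the end of the file instead of replacing it.
grep Fri "$dir/times" >> "$dir/sections"

# Nothing was printed by the grep commands; their output is in the file.
cat "$dir/sections"
```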

On the other hand, you might have a program which expects the "user" to input a long list of words. You might get tired of typing the words in each time you run the program, so you could use input redirection ("<") to help you:

	myprog < /usr/dict/words
This command line would execute the program myprog and give it as input the data in the file "/usr/dict/words".
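Since myprog is only a placeholder, the sketch below uses sort and wc as stand-ins: both read from the keyboard when they are given no file names, so they behave like a program that expects typed input. The file names words and sorted are hypothetical.

```shell
# Create a short list of words to use as input.
dir=$(mktemp -d)
printf 'pear\napple\nfig\n' > "$dir/words"

# sort reads its input from the file instead of from the keyboard.
sort < "$dir/words"

# wc -l counts the lines it reads from its input.
wc -l < "$dir/words"

# Input and output redirection can be combined on one command line.
sort < "$dir/words" > "$dir/sorted"
cat "$dir/sorted"
```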

In addition to using the shell to redirect input and output to files, you can use a shell mechanism called a pipe to string commands together. For example, suppose that you wanted to look at all the five-letter words in the dictionary. You could type:

	grep '^.....$' /usr/dict/words
Unfortunately, there are many five-letter words and the list would quickly scroll off your screen. What you might really like is to be able to take this list and use the more program to display it one screenful at a time. You could do this by creating a temporary file:

	grep '^.....$' /usr/dict/words > tempfile
	more tempfile
However, you do not necessarily want to create the temporary file. Worse yet, suppose you were constructing a file so large that it could not fit on your disk. The shell lets you string together the grep and more commands using pipes:

	grep '^.....$' /usr/dict/words | more
The "|" symbol creates a pipeline between the grep command and the more command. The output from grep (your list of five-letter words) is used as the input to more. You can string together as many commands as you like:

	grep '^.....$' /usr/dict/words | sort | more
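The dictionary file may live somewhere else (or be missing) on other systems, so here is a sketch of the same pipelines run against a tiny word list of our own. The file name words is hypothetical, and wc and head take the place of more so that the example does not wait for you to press a key.

```shell
# A tiny word list to stand in for the dictionary.
dir=$(mktemp -d)
printf 'zebra\ncat\napple\nhorse\nelephant\nmango\n' > "$dir/words"

# Select the five-letter words and sort them alphabetically.
grep '^.....$' "$dir/words" | sort

# Count the five-letter words instead of listing them.
grep '^.....$' "$dir/words" | wc -l

# A three-stage pipeline: select, sort, then keep only the first two lines.
grep '^.....$' "$dir/words" | sort | head -n 2
```

Each command in a pipeline simply reads what the previous one wrote, so no temporary file is ever created.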

Make

Large programs (or even the small programs you will write for this class) are usually made up of many different files. As your programs become larger, it will become more difficult to keep track of all the different pieces. Fortunately, UNIX provides a tool that does this for you. This tool, make, allows you to define rules for how to perform a job and then uses those rules to do the job for you. For example, if you have a large software program, it will usually consist of many source files (sometimes called ".c" files). Each of those source files might use some number of include files (sometimes called "header" or ".h" files). In order to create a program that the computer can run, you must compile each ".c" file to create ".o" files and then link the ".o" files together to create the final program. Furthermore, each time you modify one of those ".h" files, you will have to recompile all the ".c" files that use it. You can imagine that this gets tedious pretty quickly.

So you don't have to do each of these steps manually, the make utility can keep track of all your ".h" and ".c" files and generate the commands that must be executed to compile and link them. It uses a set of rules to figure out how to do this. The language in which to write these rules is cryptic and difficult to understand, even for experienced programmers. Fortunately, make already knows all of the rules that you need for this class. We will usually give you a makefile (set of rules for make to use) so all you will need to do is type:

	make
Sometimes we will ask you to change the names of the files listed in the makefile. In any case, at this point in your programming career, you need not concern yourself with the details of make. However, you should have an understanding of what it is doing for you.

Debugging

The first version of your program rarely does exactly what you intended it to do, and sometimes it can do something completely unanticipated! When this happens, your program is said to contain a bug, and the process of removing these errors is called debugging. Luckily, UNIX provides tools to help debug programs.

GDB

gdb is a debugger. If you run your program from inside gdb, gdb will monitor the program as it runs and allow you to watch and control the execution. You can stop the program at any point and look at the value of variables, execute one instruction at a time, have it stop when certain conditions occur, etc. You will find that in early assignments you can find most of your bugs by running the program and looking at the code. Therefore you might be inclined to avoid learning to use the debugger. This is a huge mistake! As your programs become more complex, you will be unable to find bugs merely by looking at the code. It is far easier to learn to use gdb now when you can understand what your programs are doing, rather than waiting until your programs are very complicated.

Core Dumps

If your program has a particularly nasty bug in it, UNIX may detect this while the program is running and automatically stop your program before it can harm the rest of the system (unlike the Mac or PCs, which usually just crash). When this happens, UNIX creates a file named core in your current directory. The core file contains as much information about your program as UNIX can determine. Unfortunately, this information is not kept in a human-readable form; you will need to use a debugger like gdb in order to decipher its contents.

The core file can be used by gdb in order to help perform an autopsy on the program. You can use gdb as you normally would, only you cannot execute the program from the core; you can just look at the values of variables to determine where it crashed, why it crashed, and ideally, how to make it stop crashing.

Communicating Via Email

There are many tools designed to let users communicate via the computer. The tool that we will be using most heavily is email. The mail program lets you send and receive messages with users in this class, at Harvard, or nearly anywhere in the world. You will often find email (electronic mail) the easiest way to contact your teaching fellow or professor. Please learn how to use the mail system effectively; you will find it extremely useful.

Please check your email frequently (especially when you are working on your homework) in order to see whether we've sent out any changes to the assignment or added any hints.

There are a variety of different mail reading programs available on the Science Center machines. The most common is pine. pine is easy to learn, and has a number of useful features. User Services (in the basement of the Science Center) has documentation and tutorials for pine.

The most basic mail program is called mailx, and is described in Todino and Strang's "Learning the UNIX Operating System". Refer to the man pages and the book for instructions on using mailx.

There are two mailing lists set up for this course:

s-q@eecs.harvard.edu
The s-q mailing list sends mail to the course staff and all of the students. It can be useful for setting up study groups or any other sort of announcement to the class as a whole. (Remember that you must work individually on the assignments, so posting questions or discussions of the assignments is inappropriate. Such mail should be sent to libsq, as described below.)

Note that for this mailing list, the entire email address needs to be specified; just using s-q is not sufficient.

libsq@fas.harvard.edu
The libsq mailing list sends mail to the S-Q course staff. It is useful for asking questions about the assignments or just generally getting in touch with the professor or a TF.


Dan Ellard