A Brief UNIX Tutorial

This document is based on a document originally written for CS50 by Professor Margo Seltzer and members of the CS50 course staff. -DJE


What is UNIX?

The computers that we will be using for S-Q are DEC alphas running OSF/1, a variant of the UNIX operating system. An operating system is the program that manages the resources of a computer, mediates among multiple users, and insulates you (the user) from the details of the hardware and from the other users.

If you have used a PC, then you might be familiar with DOS, or its more recent relatives, Windows 3.1, '95 and NT. If you have used a Macintosh, then you might be familiar with the Mac Operating System. UNIX systems are different from DOS, Windows and Mac OS in that they are designed to handle many simultaneous users (called multi-user systems). Additionally, UNIX systems can perform many tasks at once (called multi-tasking).

Finally, UNIX was designed primarily as a tool for programmers. Rather than offering a complete solution to a problem (e.g., a single application that lets you edit text, format it, and print) UNIX provides you with a large assortment of tools that can be mixed and matched to accomplish a wide variety of tasks. You might find this approach confusing and frustrating at first, but bear with us; once you know your way around the system, you will find it a very productive environment.

The book ``Harley Hahn's Student Guide to UNIX (Second Edition)'' (Hahn) is a good and in-depth introduction to UNIX. If you are not familiar with UNIX, I suggest that you buy a copy of this book and skim through it as need be (and even if you are experienced, this book is a valuable reference).

The book ``Learning the UNIX Operating System'' (Todino & Strang) provides a relatively short and simplistic introduction to UNIX. This book discusses a variant of UNIX called System V, which is similar to HPUX.

To find out more detailed information about any command on the system, use the on-line manual pages described in a later section.


Logging In

The UNIX machines that we'll use for this course are named ice (for Instructional Computing Environment). There are actually several identical machines named ice1 through ice4; when you log in to ice, you will be redirected to one of these. The machines should behave completely identically, so there is no need to worry about which you are logged in to.

The ice machines are not physically accessible to users (i.e. they are not located in a public place). A number of alpha workstations are available in the basement of the Science Center, however. These machines are named ws1 through ws38 and are configured to be functionally identical to the ice machines. If you log into one of these workstations, there is no need to telnet to one of the ice machines.

You should be able to telnet to the ice machines from anywhere on the internet-- simply telnet to ice.harvard.edu. (You can also connect via ssh, if your computer supports it, and I recommend this because of the added security and privacy it provides.)

Note that the fas computers are intended for general use (mostly email) and should not be used for doing your programming assignments. The fas computers are not configured identically to the ice computers.

In order to use the Harvard UNIX systems at all, you will need an account. You should have been given an account when you registered for Summer School, and this account will suffice for all of your coursework. If you did not receive an account, please contact the User Services people in the basement of the Science Center ASAP about getting one (or contact me, if you have difficulty obtaining an account).

When you log in to one of the alpha workstations directly from the console, a window is displayed in which you enter your login name and password. When you log in, the window manager will start up automatically. You can click on the small terminal icon to create new windows. Clicking on the background will give you a menu that will allow you to log out.


Getting Help on the System

UNIX includes a system of on-line manual pages (man pages) that document nearly all of the commands, so it's not necessary to remember how to use all the commands or to lug around a thick manual in order to use UNIX effectively. To find out information about a particular command, try typing:

	man topic
where man is the command you are executing and topic is the topic about which you are seeking information.

You can also find out information about things even if you have forgotten their exact names. To get a list of all the commands or manual pages related to a certain topic, type:

	man -k topic
Sometimes you will see man pages referred to as name(1). The number following the name of the man page is the section of the manual in which you will find that page. Normally, you will not need to worry about the sections, but in cases where the same topic appears in multiple sections, you may need to specify the section. For example, printf(1) documents the printf command, while printf(3) documents the C library function of the same name. In this case, you can type:

	man n topic
where n is the section of the manual you need.

In general, you can use the man pages to answer questions and help you learn about the system. They should be your first source of information when you have questions.


The Shell

When you type commands on a UNIX system, you are not typing directly to UNIX. Instead, you are typing commands to a program called a shell. A shell is a program that reads each line of text that you type, and tries to execute it as a command. The shell is responsible for finding the programs that you want to run and managing all the different programs that you are using. There are quite a number of shells available for UNIX systems. The one that is most commonly used on the course systems (and the one that we strongly recommend that you use for this course) is called tcsh.

For now, you can ignore shell features and just use it to type commands. As you become more comfortable with the system and want to do more complicated things, you can learn more about the shell, especially how to write programs in it. To find out about some of the power of the shell, look at the man pages.


Editing Text

When you are writing software, the tool you will probably spend more time actively using than any other is a text editor. Therefore, we strongly encourage you to invest a little extra time at the beginning of the semester to become familiar with one of the text editors available on the UNIX machines.

There are a number of good programming editors available for UNIX, but by far the most popular editors are vi and emacs (or some of their many descendants). Each of these editors has advantages over the other, and there are endless and pointless debates about which is better. For the purposes of this course, however, we recommend that you learn vi, unless you are already familiar with emacs (and don't want to take the opportunity to learn a new editor).

vi may seem a bit confusing at first, especially if you are used to graphically-oriented text processors such as Word Perfect or Microsoft Word, but given a little time, most users find that it is easy to learn how to do some very powerful things with vi.

Emacs is arguably more powerful than vi, and is vastly more extensible, but many people find it harder to learn, and therefore never learn enough about it to make use of the advanced features it offers. People susceptible to RSI often find emacs commands hard to type, since many of the commands require awkward use of the control or meta keys.

The default editor on the Harvard systems is pico. This editor was designed for email and to be extremely easy to learn. It is sufficient for editing prose, but doesn't contain a lot of the features that are useful to programmers. I recommend that you invest the time to learn vi or emacs-- they contain features that will make your editing chores much faster and easier.

If you want to change the default editor that you use, add the following commands to the end of your ~/.cshrc file:

	setenv EDITOR prog
	setenv VISUAL prog
(where prog is the name of the editor you prefer).

Many powerful editors, such as vi and emacs, are not completely documented by their manual pages. Instead, they have their own documentation (including on-line help). Emacs and vi tutorials are available from User Services, and there is more information available online (via web pages, etc). The Harley Hahn book (mentioned earlier) includes tutorials for all of vi, emacs, and pico.


UNIX Tools

As mentioned at the beginning of this document, one of the things that makes UNIX so useful is its wide assortment of tools. You can also make the tools interact to form more powerful tools. In this section, we will present some of the most commonly used tools and also some common ways that you can "hook tools together". However, the best way for you to learn about the power of UNIX is to experiment with it. Read interesting man pages. Try to do things. Try to make the computer make your life easier!

For each command we give the command name, a mnemonic (in parentheses) indicating why the command has such a bizarre name, and a description of the command. As always, you can get more information from the man page.

ls
(list) displays the names of the files in a directory.

grep
(global regular expression print) searches files for a pattern (called a regular expression).

cd
(change directory) changes your current working directory.

cat
(concatenate) displays the contents of a file.

more
(show more of a file). Displays the contents of a file one page at a time. Each time a page is displayed, you can execute a command that searches for text, continues paging, goes to the next file, etc. You will find this an extremely useful command. A newer version of more, named less (a terrible pun on more) is also available.
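
As a small sketch of how these commands fit together, the session below builds a scratch directory and tries each one in turn. The directory name unix_demo and the file fruit.txt are made up for illustration, and mkdir and printf are standard commands that are not described above.

```shell
# Make a scratch directory (unix_demo is a made-up name) and move into it.
mkdir -p unix_demo
cd unix_demo

# Create a small three-line file to experiment with.
printf 'apple\nbanana\ncherry\n' > fruit.txt

ls                   # list the files in the current directory
cat fruit.txt        # display the whole file
grep an fruit.txt    # print only the lines that contain "an"

# Return to the directory you started in.
cd ..
```

Of the three lines in fruit.txt, only "banana" contains the pattern, so that is the only line grep prints. more is left out of the sketch only because it is interactive; try typing "more fruit.txt" inside unix_demo to see it at work.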

You can save the output of any command by using a mechanism in the shell called Input/Output Redirection. The symbols used for redirection are "<" and ">". To save the output of a command in a file instead of having it appear on your screen, use output redirection (">"). For example, to save the output of the grep command, you could type:

	grep Thurs ~lib50/cs50.times > thursday_sections 
This command would look for the string "Thurs" in the file "~lib50/cs50.times". It would then put the output of the command (what the command would normally print to the screen) in the file "thursday_sections."
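
If you do not have access to ~lib50/cs50.times, you can try the same idea on a file of your own. The sketch below uses made-up schedule data in a hypothetical file named sample.times; only the grep line mirrors the example above.

```shell
# Build a small sample schedule (made-up data standing in for cs50.times).
printf 'Mon 2pm Section A\nThurs 10am Section B\nThurs 3pm Section C\n' > sample.times

# Save grep's output in a file instead of printing it on the screen.
grep Thurs sample.times > thursday_sections

# The grep command itself printed nothing; look at the saved result.
cat thursday_sections
```

Note that ">" normally replaces any existing file of the same name, so choose the output file name with care.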

On the other hand, you might have a program which expects the "user" to input a long list of words. You might get tired of typing the words in each time you run the program, so you could use input redirection ("<") to help you:

	myprog < /usr/dict/words
This command line would execute the program myprog and give it as input the data in the file "/usr/dict/words".
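
Since myprog is only a placeholder, here is a sketch of input redirection using two standard commands that read from their input when given no file name. The file words.txt and its contents are made up for illustration.

```shell
# Create a short word list (made-up data).
printf 'pear\nfig\nplum\n' > words.txt

# sort reads the file through input redirection and prints the lines in order.
sort < words.txt

# wc -l counts the lines it receives on its input.
wc -l < words.txt
```

In both cases the program never opens words.txt itself; the shell opens the file and hands its contents to the program as if you had typed them.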

In addition to using the shell to redirect input and output to files, you can use a shell mechanism called a pipe to string commands together. For example, suppose that you wanted to look at all the five-letter words in the dictionary. You could type:

	grep '^.....$' /usr/dict/words
Unfortunately, there are many five-letter words and the list would quickly scroll off your screen. What you would really like is to take this list and use the more program to display it one screenful at a time. You could do this by creating a temporary file:

	grep '^.....$' /usr/dict/words > tempfile
	more tempfile
However, you do not necessarily want to create the temporary file. Worse yet, suppose you were constructing a file so large that it could not fit on your disk. Instead, the shell lets you string together the grep and more commands using a pipe:

	grep '^.....$' /usr/dict/words | more
The "|" symbol creates a pipeline between the grep command and the more command. The output from grep (your list of five-letter words) is used as the input to more. You can string together as many commands as you like:

	grep '^.....$' /usr/dict/words | sort | more
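
Not every system has /usr/dict/words, so the sketch below tries the same pipeline on a tiny made-up word list (the file name mywords is hypothetical). It also adds wc -l as a final stage to count the matches, which is an addition to the examples above rather than part of them.

```shell
# A tiny stand-in for the dictionary file (made-up list).
printf 'zebra\ncat\napple\nhouses\nmango\n' > mywords

# Keep only the five-letter words and print them in sorted order.
grep '^.....$' mywords | sort

# The same pipeline with one more stage: count the matching words.
grep '^.....$' mywords | sort | wc -l
```

Each command in the pipeline does one small job, and no temporary file is ever created.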

Debugging

The first version of your program rarely does exactly what you intended it to do, and sometimes it can do something completely unanticipated! When this happens, your program is said to contain a bug, and the process of removing these errors is called debugging. Luckily, UNIX provides tools to help debug programs.

GDB

gdb is a debugger. If you run your program from inside gdb, gdb will monitor the program as it runs and allow you to watch and control the execution. You can stop the program at any point and look at the value of variables, execute one instruction at a time, have it stop when certain conditions occur, etc. You will find that in early assignments you can find most of your bugs by running the program and looking at the code. Therefore you might be inclined to avoid learning to use the debugger. This is a huge mistake! As your programs become more complex, you will be unable to find bugs merely by looking at the code. It is far easier to learn to use gdb now when you can understand what your programs are doing, rather than waiting until your programs are very complicated.
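
A first gdb session might look something like the sketch below. The program sum.c is a made-up example, and the script assumes that a C compiler named cc and gdb are both installed (it skips the debugging step otherwise). It is written for a Bourne-style shell, so save it in a file and run it with sh rather than typing it at a tcsh prompt.

```shell
# Write a small C program to debug (a made-up example, not course code).
cat > sum.c <<'EOF'
#include <stdio.h>

int main(void)
{
    int total = 0;
    int i;

    for (i = 1; i <= 3; i++)
        total += i;
    printf("total = %d\n", total);
    return 0;
}
EOF

# Compile with -g so that gdb can see variable names and line numbers,
# then have gdb run a fixed list of commands and exit (-batch).
if command -v cc >/dev/null 2>&1 && command -v gdb >/dev/null 2>&1; then
    cc -g -o sum sum.c &&
    gdb -batch \
        -ex 'break main' \
        -ex 'run' \
        -ex 'next' \
        -ex 'print total' \
        -ex 'continue' \
        ./sum || echo "gdb could not run the program on this system"
else
    echo "cc or gdb is not installed; skipping the debugging step"
fi
```

When you use gdb interactively, you would instead type "gdb sum" and enter the same commands (break main, run, next, print total, continue) one at a time at the (gdb) prompt, inspecting the program as you go.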

Core Dumps

If your program has a particularly nasty bug in it, UNIX may detect this while the program is running and automatically stop your program before it can harm the rest of the system (unlike the Mac or PCs, which usually just crash). When this happens, UNIX creates a file named core in your current directory. The core file contains as much information about your program as UNIX can determine. Unfortunately, this information is not kept in a human-readable form; you will need to use a debugger like gdb in order to decipher its contents.

The core file can be used by gdb in order to help perform an autopsy on the program. You can use gdb as you normally would, only you cannot execute the program from the core; you can just look at the values of variables to determine where it crashed, why it crashed, and ideally, how to make it stop crashing.


S-Q Course Staff