ECES 338
Bill's Recitation Notes
Recitation #1, 01/17/01

NOTICE: I am not a UNIX god. I know what I'm doing well enough for the things I want to do and the things you'll need to do in this class, but it's very likely that some of you know much more about this side of UNIX than I do. If you believe that anything should be added or corrected here for the benefit of your classmates, feel free to tell me so that I can pass it along.

I. Obtaining an Account

Log into the machine cerne.cwru.edu with the name "newuser". It will ask you some questions; answer them, wait however long it suggests waiting, then verify that your account has been activated. Note that telnetting to cerne is the easier way to log into it (I assume it's still sitting in the UNIX lab, but I have not verified this).

II. Logging On

Your username will be assigned (it should be the same as your CWRUnet ID), and you will have picked a password when obtaining your account. To log onto a machine, use that username and password. You can log on by sitting locally at the machine or remotely by using a telnet/ssh program. I've posted an incomplete list of machines (it was the hosts file from the machine chandra from last year; I checked the hosts file this year and it was useless for us, which I'm assuming is due to IP routing and the way the admin chose to deal with it). You can also check http://www.eecs.cwru.edu/jcc/accounts.shtml , which has a table of machines part way down the page. I am not sure how up to date this page is kept.

III. Directory Structure & Navigation

UNIX uses a hierarchical file structure, just like every other OS that the majority of you have seen (I can't think of any that don't right now). What this means is that it contains files and directories, and that directories can be placed inside directories, creating a hierarchy.
The only big difference between the UNIX file system and others you might be familiar with (like DOS's) is its extensive use of links, including hard links. Directory links can allow directories to go in circles, so where you would always reach the end of a DOS directory branch, it is possible for a UNIX directory branch to go on indefinitely. Symbolic links are similar to Windows shortcuts or Macintosh aliases, but much more robust. They are files which point to another file, but unlike Windows shortcuts at least (I'm not sure about Mac aliases), they can be used completely transparently everywhere, no matter what they are being used for (I know that in some places Windows shortcuts just don't cut it). What I've heard about Win2000 suggests that they're bringing soft links, at least, back into use, but you have to use NTFS for them. Hard links are a way to take the same actual physical file and enter it into the directory structure at another place. The command for creating links is "ln" (it makes a hard link by default and a symbolic link with the "-s" option). One final note about links is that you cannot explicitly make hard links to a directory.

To view the contents of the current directory, use the command "ls". Two very useful options are "-a", which shows hidden files, and "-l", which shows file permissions and other information. Files are hidden in UNIX by prefixing the name with a period.

To change directories, use the command "cd". UNIX "cd" works, at least on a superficial level, the same way that DOS "cd" does, except that a forward slash ("/") instead of a backslash ("\") separates directory names. Note that the directories "." and ".." (current and parent directories respectively) exist in UNIX as well, and that "." is much more useful here: it is not uncommon for a person in a UNIX environment to find that the current directory is not part of their search path and that they can't run the program which they just navigated to.
When this happens and you don't feel like screwing with the path at the moment, you can solve the problem by prefixing your program name with "./". Some UNIX users are in the habit of putting "./" at the beginning every time they run a program in the current directory, even if "." is in their path.

File permissions will be a new thing for some people. UNIX, unlike Win9x and MacOS releases prior to OS X (which runs on top of a BSD-flavored UNIX), was created to be a multi-user system. When you've got a multi-user system, some users may not want other users to access their files. To address this, UNIX has file permissions. If you look at a long directory listing ("ls -l") of the root directory, you will see a line for the "bin" directory which contains a date and time, but also a section similar to the following: "drwxr-xr-x" and "root root". This is the file permission information. The leading letter, a 'd' in this case, gives the type of file. The most common type that you will see is '-', a "normal" file, followed by 'd', a directory; next would be 'l', a symbolic link; and the last one which you might need to deal with in this class is 'p', a pipe. The next nine characters of that string are the permissions: 'r' for read, 'w' for write, and 'x' for execute (or traverse, for directories). The nine characters form three groups of three: the left-most group is the permissions for the owner, the middle group is the permissions for members of the owning group, and the right-most group is the permissions for everyone else. A '-' in that string means that permission is denied. So what the permission string above means is that only root would be able to do something like create or delete files in the "bin" directory, but that anyone may read it and traverse into it. A private text file might have the permissions "-rw-------"; an executable script or binary for general use might have the permissions "-rwxr-xr-x".
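To make the layout of that ten-character string concrete, here is a minimal shell sketch that pulls one apart. The function name explain_perms is hypothetical (it is not a standard UNIX command); it uses only printf, cut, and a case statement, and it assumes a Bourne-style shell such as sh or BASH.

```shell
# explain_perms STRING
# Hypothetical helper: splits a permission string such as "drwxr-xr-x"
# into the file type and the owner/group/other permission groups.
explain_perms() {
    perms=$1
    # Character 1 is the file type.
    case $(printf '%s' "$perms" | cut -c1) in
        -) echo "type: normal file" ;;
        d) echo "type: directory" ;;
        l) echo "type: symbolic link" ;;
        p) echo "type: pipe" ;;
        *) echo "type: something else" ;;
    esac
    # Characters 2-10 are three groups of three permissions.
    echo "owner: $(printf '%s' "$perms" | cut -c2-4)"
    echo "group: $(printf '%s' "$perms" | cut -c5-7)"
    echo "other: $(printf '%s' "$perms" | cut -c8-10)"
}

explain_perms "drwxr-xr-x"
explain_perms "-rw-------"
```

For the first call this reports a directory with rwx for the owner and r-x for both the group and everyone else; for the second, a normal file that only its owner can read and write.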
It is possible for a couple of other letters to show up in that string, but you shouldn't have to worry about them for this class. The way to change file permissions is with the command "chmod". The command to change owners is "chown" and the command to change groups is "chgrp", though these two might be exactly the same binary in some implementations of UNIX. Permission arguments to chmod can be given in two ways. The easier way to read is a string such as "a+x", which adds the 'x' permission for owner, group, and others, regardless of what they were, and leaves all other permissions alone. The other common way is a three-digit octal string stating the permissions ("600" = "-rw-------" and "755" = "-rwxr-xr-x"). For a full list of supported options, check the online help (see below for more on online help).

The other part, the two "root"s, is the owner and owning group. "root" is the administrator user, also called the super-user, in UNIX. The super-user has access to everything, regardless of file permissions, and is the owner of vital system files and lots of other things that you'd like to play with. There is also a group named "root", and this group is commonly used as the group for those files I just mentioned.

IV. Files

In UNIX, everything's a file. EVERYTHING. (Note that it does not matter if it is actually a file or not; UNIX will treat it like one.)

V. Utilities

The major type of utility program that you'll use in this class is a text editor. There are four common basic text editors: ed, vi, pico, and emacs. None of you ever want to have to use ed. ed is a line editor, and I don't believe that anyone would describe it as user-friendly unless they'd been editing their documents in a straight binary format. vi is better: it is visual, and allows you to actually see as much of your document as will fit on the screen.
Many people hate vi, but there are also vi junkies out there (I know someone who found a version to use as their text editor in Windows). I think that vi has some nice features that make it useful, and it has the advantage that while I've seen systems which don't have pico or emacs, I've yet to see a system that doesn't have vi (note that I rarely care to check for ed). vi does have a strong disadvantage for many people because it has separate command and editing modes, which can create confusion, and it does not have any particularly helpful reminders of the commands, though it does have online help. I recommend that everyone be able to use vi, then switch to pico or emacs if you'd like. (If you're doing your work on the UNIX lab machines, then I recommend AGAINST using the version of vi there. As of Spring 00 it was pretty bad, but newer versions of vi, or vim, which come with most distributions of Linux, can be very nice if you take the time to customize them.) pico has the benefit of being even more visual: it doesn't have the separate command mode, and it shows a list of possible commands at the bottom of the screen. Emacs is considered by many people to be the best of all these editors.

A note about these text editors: if you haven't guessed, I use vi and pico (in fact this document is being typed in vi). I have no desire to ever touch ed, and I haven't decided to teach myself emacs yet. There is a separate document with some basic commands for vi to get you started; pico is self-explanatory; and emacs I don't know yet. The most important thing for each of these programs, though, is to know how to quit. In ed, the command "w" will save your file and "q" will quit. vi uses the ed commands prefixed by a colon, so you can save and quit with ":wq", save the file under a different name with ":w Bob" (where Bob is the new name), or quit without saving by using the command ":q".
Note that if you haven't saved your document and try to quit, or try to save a document over a read-only file, you have to put a "!" at the end of your command, so usually when someone wants to quit vi without saving, they use ":q!". In pico you can save with Ctrl-o and exit with Ctrl-x. In emacs, as far as I have ascertained, you can access the menu with the F10 key, and in the files menu you can find options both to save and to exit. I'll look and see if one of the other recitation leaders wrote an emacs guide, and if so I will tell you who and link to it from the web page.

If you wonder why I'd use vi, it's because it's very convenient for jumping to the specific lines in my code named in compiler errors, for automatic indenting, and for ease of doing certain large operations such as erasing 100 lines. One nice thing about pico that I noticed on my own machine is that it automatically cleans up files output by script rather nicely, and there's also a decent version of it on the UNIX lab machines.

The only other UNIX utilities that I'm going to mention (other than programming-specific ones) are more and grep. Both of these utilities may be used on a file, such as "more hookers.txt" to output a text file, which is apparently about hookers, to the screen, stopping after each screenful so that you can read it all. Many people use a utility called less instead, which allows you to arrow up and down through the output; I don't think it's a standard utility, but it comes with most distributions of Linux as far as I know. You use grep to search for a given string or pattern. These utilities are used more often with piped output, such as "netstat | grep ftp | more" to check how many connections there are to your ftp server and who is connected, pausing if the information is more than will fit on one screen so that you can read it (more information on pipes below, and you'll also learn more later in the semester).

VI. Programming

The UNIX development environment which the majority of you will probably be using is the old-fashioned, command line one. This means that you'll use some text editor to type your code, then use a command line compiler to compile it and a command line debugger to debug it if you need to. Text editors were covered above.

The compiler that almost all of you will be using is gcc. You can compile HW1.c with the command "gcc HW1.c"; this will compile the file "HW1.c" and name the executable "a.out". There are a LOT of options you can use with gcc, but there are a few fairly useful ones that I'll point out. The option "-g" includes debugging information, useful if you're going to debug the program with a separate debugger as opposed to outputting information to the screen or a file. The option "-c" creates an object file instead of an executable, useful if you wish to separate your code into multiple files, which shouldn't be necessary for most assignments in this class but might come in handy. The default output with this option is a file with the same name but a ".o" extension. The option "-o", followed by a file name, changes the name of the output file. The option "-l" tells the linker to include libraries which it doesn't normally search; this option will be necessary for certain assignments, such as sockets.

An example using all of these: suppose you're writing a socket program, you put it in the file sockhw.c, you use some code that you already wrote in util.c, and no socket calls are made in util.c. One option for compiling this would be to do it all at once every time, assuming that you don't need to use a debugger, with the command "gcc util.c sockhw.c -lsocket -o sockhw.out" to compile it, link it, and place the executable in the file sockhw.out.
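The "-g", "-c", and "-o" options can be tried on a throwaway program. This is only a sketch: the file names util.c, main.c, and hw.out and the function twice are made up for illustration, it assumes gcc is installed and a Bourne-style shell, and it leaves out "-l" because the right library name depends on the machine.

```shell
# Work in a scratch directory so no real files are overwritten.
work=$(mktemp -d)
cd "$work" || exit 1

# Two tiny source files, written with shell "here documents".
cat > util.c <<'EOF'
int twice(int n) { return 2 * n; }
EOF

cat > main.c <<'EOF'
#include <stdio.h>
int twice(int n);
int main(void) { printf("%d\n", twice(21)); return 0; }
EOF

gcc -g -c util.c                 # -c: stop after compiling, producing util.o
gcc -g main.c util.o -o hw.out   # -o: name the executable hw.out, not a.out
result=$(./hw.out)               # "./" because "." may not be in your path
echo "$result"
```

Because util.o already exists after the first gcc command, only main.c has to be recompiled when main.c is the file you keep changing.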
Now let's say that sockhw.c has a lot of bugs and you're getting irritable with compile time because everyone else in the free world with an account is using the same machine and you misplaced the list of other machines. You might compile it in two stages, doing "gcc -g -c util.c" first; then you'd have a util.o which wouldn't need to be recompiled each time and would be ready for debugging too. Assuming that you want debugging support in your final executable too, you could then use "gcc -g util.o sockhw.c -lsocket -o sockhw.out" to produce it. Note that if you just use "gcc sockhw.c", it will compile the file for you, so you could use that to find your compile errors, but it will then give you linker errors after the compile completes successfully.

The other tool which you might or might not use is the debugger. The debugger most of you will be using is gdb (or maybe xxgdb, but it's basically the same). gdb is a command line debugger. It's a bit of a pain, but there are situations where it can be very useful. You can debug a program from the beginning or attach to a running process. I've written a sample run, in a narrative style, of using gdb on a program from the beginning. Say you've got a program named "Prog1.out" which you want to debug. You would debug it from the beginning with the command "gdb Prog1.out". If it was already running and didn't seem to be doing anything, and its process ID was 1153, then you would attach to it with "gdb Prog1.out 1153". If you ran it and it dumped a core file and you'd like to see where it went wrong, you can do that too with "gdb Prog1.out core". For some ideas on how to use gdb once you're in it, consult the sample run narrative. Now for those of us more used to a nice visual debugging environment, this seems rather painful, and it is, but sometimes it's less painful than the alternatives.

VII. Further/Misc Info

One noted difference between UNIX and DOS commands is that DOS commands commonly use '/' to prefix switches, while UNIX uses '-'. Where in DOS you could often get a little blurb about a program's usage by typing its name followed by "/?", in UNIX the common form is the program name followed by "--help". Another note about UNIX switches is that some programs let you combine multiple one-letter switches into one string. For example, "ls -a -l" will do the same thing as "ls -al" on any machine I've seen. Note that this is not the case for all programs; the compiler, gcc, does not support it.

On the topic of online help, for more detailed documentation than the --help flag gives, try the manual pages. You view the manual page for a command with "man" followed by the command name. Note that you can use --help and man on the man command itself. This can be used to find the syntax on the current platform for things such as narrowing your search to a specific section of the manual pages. That is useful if you'd like to see the manual page for the C function "write" in section 2 instead of the UNIX console command "write" in section 1, which is what you will get if you just use "man write". How to do this varies: it's "man -s 2 write" on the UNIX lab machines and "man 2 write" on my Linux box. Note that you can also use the man command to search by keyword, usually with the -k switch, though I've found that to be not always as useful as I'd hoped.

Piping input and output is something that I'd consider a must. The form of piping you might be most familiar with is used in both DOS and UNIX: something like "ls | more" (as mentioned briefly above). This takes the standard output of the first program, "ls" in this case, and uses it as the standard input of the second program, "more".
This can be tremendously useful for taking commands with very irritating, excessive, or confusing output and passing that output through other programs to tidy it up for your browsing.

Pipes are used not just for tidying info, but sometimes in other applications as well. The utility tar is used to archive multiple files into one archive file, and the utility gzip/gunzip is used to compress or decompress a file. Files are frequently tarred into an archive and then gzipped. While some versions of tar can handle the compression step themselves, some people instead first decompress the file with gunzip and pipe the output to the tar program to expand the archive. Note that you may be thinking that all this should be possible using intermediate files instead of pipes, and for all the cases that I can think of off the top of my head right now, that's right, but you might not want to waste the extra disk space (especially for some .tar.gz's).

Some commands, such as ls, write only to the screen by default. One way to send their output to a file instead is with another common operator, ">". So a person might list all the files in a directory with their permissions and save the list to the file "list.txt" by using the command "ls -al > list.txt".

These two operators work on standard output, but what if you want to redirect standard error? That can be done too, though the operators depend on your shell. In BASH and the other Bourne-style shells, "2>" redirects standard error by itself. In the C shell, ">&" and "|&" redirect or pipe standard output and standard error together. What if you want to append the output to the end of a file? Use ">>" (note that you can append standard error with "2>>"). What if you want to capture both in separate files? Do "MyProg 1> MyOutput 2> MyErrors". (Note that "1>" is equivalent to ">". These integers are used because 0 references stdin, 1 references stdout, and 2 references stderr. You'll see this in your assignments.)
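Here is a small sketch of those operators in a Bourne-style shell (sh or BASH). The function noisy is a hypothetical stand-in for MyProg that writes one line to stdout and one line to stderr; the output file names are the ones used above, and mktemp and /dev/null are extras used only to keep the example tidy.

```shell
work=$(mktemp -d)    # scratch directory for the output files

# Hypothetical stand-in for MyProg.
noisy() {
    echo "result line"          # goes to stdout (descriptor 1)
    echo "warning line" 1>&2    # "1>&2" sends this echo to stderr (descriptor 2)
}

# Capture stdout and stderr in separate files.
noisy 1> "$work/MyOutput" 2> "$work/MyErrors"

# Run again, appending instead of overwriting.
noisy >> "$work/MyOutput" 2>> "$work/MyErrors"

# A pipe carries only stdout; stderr is discarded here by sending it to /dev/null.
piped=$(noisy 2> /dev/null | wc -l)

cat "$work/MyOutput"
cat "$work/MyErrors"
```

After both runs, MyOutput holds two copies of the result line and MyErrors holds two copies of the warning line, while the pipe into wc saw only the single stdout line.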
Note that I've only redirected individual output streams one at a time. When I want to automatically capture ALL output which goes to the console (a program could open more than just stdout and stderr if it really wanted to, not that I can see a commonly compelling reason to do that), I turn to the UNIX utility script. To produce output for your program, say it's HW1.out, you might use the command "script HW1_output.txt", then do whatever trial runs you wish, then type Ctrl-D to exit and save all of that to the file "HW1_output.txt". This file does need to be cleaned up before you turn it in, though; it usually has extra control characters in places (such as returns and backspaces). There are ways to remove these quickly, but otherwise do it manually (I've found that opening the file and saving it in pico will remove the Ctrl-M's from the returns, but it does nothing for the backspaces). The other nice thing about using script is that you can see what the prompts for input say (if you redirect output to a file then you won't see them on the console), and script also captures what you type in on stdin.

UNIX, being an OS designed for multiple people to use at the same time, needs to be, and is, capable of handling multiple processes at once, known as multitasking. It would be wasteful and irritating to build this capability into the OS and not allow a single user to access it from a single command prompt/shell, so there's a way. The first way, if you know you want to run a command in the background, is to put a "&" at the end. For example, if I'm in X and I want to run Netscape, but I don't want to tie up a shell for it, I might use a shell window and type "netscape &". Or if I'm in Linux and I want to manually update the database used by locate but want to go on about my business otherwise, I might run "/etc/cron.daily/slocate.cron &".
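As a minimal sketch of the "&" method, assuming a Bourne-style shell: sleep stands in for a long-running program, and "$!" and "wait" (standard shell features not otherwise covered in these notes) are used to find the background job's process ID and to wait for it to finish.

```shell
sleep 1 &            # start the job in the background; the prompt returns at once
pid=$!               # $! holds the process ID of the most recent background job
echo "job $pid is running in the background"

echo "the shell is free to do other things meanwhile"

wait "$pid"          # block until that job has finished
status=$?            # its exit status; 0 means success
echo "job $pid finished with status $status"
```

In an interactive session you would rarely need wait; it is here only so the sketch does not end before the background job does.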
Suppose I'm using vi to write something and I need to check something in a shell, and for some reason I can't or don't want to open another shell. I can type Ctrl-Z to suspend the process and get back to the shell. I can then check what I want and return to vi with the "fg" command. Note that the "fg" command can also be used on jobs started with the "&" method. "bg" is a related command which resumes a suspended process in the background. You can get away with not using these much, but it might be useful to look into them.

Suppose you need to run a command that will take a long time and you cannot just sit and wait for it to finish (say it will take 15 minutes but you have to shut your machine down in 2 minutes for some reason; normally that would cause the program to terminate). You can get around this by running the command inside parentheses. So, if you want to run the program LongInvolvedComputation in such a way that you can log out of the machine and have it keep running, you would type "(LongInvolvedComputation) &" and then log out at will. (If that doesn't survive a logout in your shell, the standard "nohup" command exists for exactly this purpose.)

In this class, you'll be using C, not C++. The differences can be subtle, but also somewhat confusing at first. There are a lot of things that you might need to learn, but here are some basics. C does not have classes. Note that cin, cout, and cerr are all objects of C++ classes, so you cannot use them in C. Input is done with scanf or fscanf; output is done with printf or fprintf. Since there are no classes, there is no string class; you'll have to use character arrays and character pointers. String manipulation can be irritating; the C string functions strcat, strcmp, and strcpy will ease this, as will sprintf, which works like printf or fprintf but outputs to a string instead of the screen or a file (there's also an sscanf if you want it; I usually prefer just to parse the string myself). File operations are done with FILE pointers and the functions fopen, fscanf, fprintf, and fclose.
The keywords new and delete are not part of C; malloc and free are commonly used in their places. I'm not trying to explain all these functions here, as that would be bulky and very easily incomplete. I suggest that you look them up on your own when you need them. You should be able to find them all in the man pages, but I'd really recommend a pocket C/C++ reference which is separated into sections such that C++-only features are in their own sections. I've got one and it's proven invaluable to me (it was only $16.99, cheap for a computer book). The last thing that I noticed is that the compiler wants you to declare all variables at the top of the function (strictly, at the top of a block, before any other statements). This isn't so much a problem as an irritation when you forget. After this class you might find that you like some of these functions better than the C++ ways you've been using; remember that C++ encompasses nearly all of C, so you can use all the functions I mentioned above in C++ code.

Next is shells. In DOS, there's one shell. It's what people call the C prompt, and it lives in the command.com file (or cmd.exe in Win NT). In UNIX there is more than one shell in common use. Two of the most common are the C shell and BASH. You don't really need to know a lot about these shells, just that if you see that things work a little differently for someone else, it's not any big problem, just a different shell. Each shell has different features, and some are nicer than others for different things. Note that these shells have initialization files in your home directory which may set environment variables and aliases. The init file for the C shell is named ".cshrc" and one of the ones for BASH is ".bashrc". As an example of an alias, let's say that you want to be able to type "adios" to log out. The command to log out is, oddly enough, "logout". To set this up in your .cshrc file you'd include the line:

alias adios 'logout'

(note that's the forward quotation mark, not the back quote).
For BASH it'd be:

alias adios='logout'

There are other commands in an init file, but these are just examples of one option in a couple of different shells. If you want to know what shell you're currently using, check your SHELL environment variable (strictly speaking this names your login shell; see below for environment variables).

Environment variables can play an important role in an OS. The most familiar environment variable to most of us is PATH. This variable is used in both MS operating systems and UNIX to specify which directories to search for executable programs. In MS OS's, the current directory is always searched, but in UNIX, a person may often find that it is not. The way to check all your environment variables in DOS is by typing "set"; the way to do this in UNIX is the "env" command. Sometimes the environment variables make a long list, especially in an xterm session, and perhaps you want to check just one particular variable. There are two ways to do this. Let's say that you want to see what your PATH variable is. The first way is to stick with env and use grep: you could do "env | grep PATH". Note that this will also return any other lines which contain the text "PATH" (such as perhaps "CLASSPATH" for Java). This might not be acceptable. The way to query for just one environment variable is through the echo command. Just as in DOS you refer to an environment variable by surrounding it with % (%PATH% might look familiar), in UNIX you refer to it by using $ as a prefix ($PATH), so the command "echo $PATH" returns just the PATH environment variable.

How you change an environment variable in your current shell depends on the shell you're using. In BASH, type the name, then =, then the new value (for example "PATH=.:$PATH" would add the current directory to the beginning of your path). In the C shell, environment variables are set with the setenv command ("setenv PATH .:$PATH" would be the equivalent).
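A short sketch of these commands, assuming a Bourne-style shell such as BASH. The "^PATH=" pattern goes a step beyond the plain "grep PATH" above: it is a regular expression anchored to the start of the line, which keeps grep from also matching names like CLASSPATH. The variable names old_path and first are made up for the example.

```shell
echo "$PATH"             # just the one variable
env | grep '^PATH='      # only the line that starts with "PATH="

old_path=$PATH           # remember the original value
PATH=.:$PATH             # put the current directory first in the search path
first=$(echo "$PATH" | cut -d: -f1)
echo "first directory searched: $first"

PATH=$old_path           # restore the original path
```

The change lasts only for the current shell session, which is why people usually put it in an init file once they are happy with it.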
Note that just adding the PATH assignment to your init file might not be enough: the C shell version seems to work when added to .cshrc, but for the BASH one to work you have to include "export PATH" on a line below the change.

VIII. Essential Commands

NOTE: This is by no means a list of all essential commands for using UNIX; these are just the ones related to this class that I could think of. There are other commands, such as "su", which are very useful but that you shouldn't need for this class. This list was long enough without them.

ls - used to obtain a directory listing
cd - used to change directories
mkdir - used to create a new directory
pwd - used to return the current directory
chmod - used to change file permissions
chown/chgrp - used to change owner/owning group
ln - used to create links
more/less - used to display files or output one screen at a time so the user can read it
cat - used for concatenating and displaying files
man - program for browsing electronic manual pages
script - used to capture both input and output to a text file
env - used to return all environment variables
grep - used to search for a given expression
lpr - used to print
cp - used to copy a file
mv - used to move a file (used for renaming files as well)
rm - used to delete a file
ps - used to display running processes
kill - used to terminate or send signals to processes
mount/umount - used to mount and unmount file systems (includes floppy & CD's)
pine - one of many mail programs (you might get mail on the UNIX machines)
passwd/yppasswd - used to change login password (yp to change for whole yp sys)
chsh/ypchsh - used to change default shell (yp to change for whole yp sys)
touch - accesses or creates a file
telnet/ssh - used to access UNIX machines
ftp - used for command prompt ftp client
who - shows users logged into machine
whoami - show current user name
startx/openwin - start the UNIX windows system (usually start in it though)
exit - exit/close current shell
logout - log out of current machine
nslookup - used to do a name server lookup for hostname/IP address conversion
ping - check for network connectivity over IP to a given machine
finger - get information about user
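To tie a few of these together, here is a sketch of a short session that uses mkdir, cd, pwd, touch, cp, mv, ls, and rm from the list above. It assumes a Bourne-style shell; the directory and file names are made up, and mktemp (not in the list) is used only to get a scratch directory so that nothing real is touched.

```shell
scratch=$(mktemp -d)     # throwaway directory
cd "$scratch" || exit 1

mkdir hw1                # create a new directory
cd hw1                   # move into it
pwd                      # print where we are now

touch notes.txt          # create an empty file
cp notes.txt backup.txt  # copy it
mv backup.txt old.txt    # rename the copy
listing=$(ls)            # directory listing: notes.txt and old.txt
echo "$listing"

rm old.txt               # delete the renamed copy
remaining=$(ls)          # only notes.txt is left
echo "$remaining"
```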