Introduction

This tutorial will help those new to computing in the geosciences become familiar with working in a command line environment. You will learn the basics of UNIX file structures and how to navigate in a UNIX environment, and you will practice creating, storing, and searching for files.

What is UNIX?

https://commons.wikimedia.org/wiki/File:Ken_Thompson_(sitting)_and_Dennis_Ritchie_at_PDP-11_(2876612463).jpg

caption: Dennis Ritchie (standing) and Ken Thompson working on a PDP-11 minicomputer

UNIX is an operating system that can handle multiple users and processes at the same time. Dennis Ritchie and Ken Thompson of AT&T’s Bell Labs developed UNIX in the late 1960s and early 1970s. UNIX and its design underlie much of the technology that you know and love: Apple’s macOS (formerly Mac OS X) is a UNIX system, and Linux and Android (which runs on a Linux kernel) are UNIX-like systems modeled on it. Microsoft Windows is the notable exception, as it is not derived from UNIX. Expand the optional Geek Box below if you want an in-depth look at the many UNIX variants over the years.

_PANEL

_Title Origins of UNIX

https://commons.wikimedia.org/wiki/File:Unix_history-simple.svg

Flowchart of Unix origins and subsequent variants over the years.

_END_PANEL

Work in a UNIX environment is accomplished by typing commands at a command line. If you have used the Terminal application in macOS or the Command Prompt in Windows, then you are already familiar with this way of computing. If this idea is new to you, fear not - you already trigger the same types of commands whenever you save an image, open a program via an icon, and so on. The next section will explain a little more about how a UNIX environment is set up and how it compares to a personal computer or smartphone environment.

UNIX Architecture

UNIX is made up of three main parts: the kernel, the shell, and user commands and applications.

^[a]

http://www.tutorialspoint.com/unix/images/unix_architecture.jpg

The kernel and shell are the heart and soul of the operating system.

The kernel receives user requests via the shell^[b]^[c] and accesses the hardware to perform tasks such as memory allocation and file storage.

The shell is an interface that interprets the command line input and calls the necessary programs to do the work. The commands that you enter are programs themselves, so once the work is done, the command line will return to a prompt and await further input.

One example of how the shell and kernel work together is copying a file. If you want to copy a file named “file1” and name the copy “file2”, you would enter “cp file1 file2” at the command line. The shell will search for the program “cp” and then tell the kernel to run that program on “file1” and name the output “file2”. When the copying is finished, the shell returns you to the prompt and awaits more commands.
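The copy described above can be sketched as follows. This is a minimal illustration, not part of the simulator exercises: the scratch directory made with `mktemp -d` and the file contents are assumptions for the example, and `command -v` is used only to ask the shell where it would find the `cp` program.

```shell
# Work in a throwaway directory (an assumption for this sketch).
workdir=$(mktemp -d)
cd "$workdir"

# Create a small file to copy.
echo "sample data" > file1

# Ask the shell where it finds the cp program.
cp_location=$(command -v cp)
echo "The shell found cp at: $cp_location"

# The shell starts cp, and the kernel does the reading and writing.
cp file1 file2

# Both files now hold the same contents.
copied=$(cat file2)
echo "file2 contains: $copied"
```

Once `cp` finishes, control returns to the shell, which is why the next `echo` runs only after the copy is complete.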

UNIX-PC Comparisons

Let’s take a look at another example. Suppose you have a folder called “docs” on your personal computer, phone or stored in a cloud somewhere. Let’s say you have “personal” and “schoolwork” subfolders in there, and that inside your personal folder you have a subfolder with your photos from 2015, and that they’re arranged into monthly subfolders. How do you get into March 2015’s photo area? Easy - you keep clicking on or touching the appropriate folder, until it opens the next, and then the next folders until you can see March 2015 - then you click on it.

Figure: IMAGE SHOWING CLICK THROUGH TO FOLDER MARCH 2015^[d]^[e]

In Unix, you’d simply type the following at the command line to perform the same task: “cd /docs/personal/photos/2015/march”.

_CODE

# cd /docs/personal/photos/2015/march

_END_CODE

Although you can’t see your photos as icons here, the computer is performing exactly the same actions as you did by clicking on all those folders. Additionally, you can then list the photos in the directory, rename them one at a time or all at once, move them to other directories or even other computers, or much more - all by just typing a few words.
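To make this concrete, here is a small sketch that builds the same folder layout in a scratch directory and then jumps into it with one command. The scratch location, the two photo names, and the use of `mkdir -p` (which creates any missing parent directories) and `touch` (which creates empty files) are assumptions for illustration only.

```shell
# Build the example folder tree under a scratch directory (hypothetical layout).
base=$(mktemp -d)
mkdir -p "$base/docs/personal/photos/2015/march"
mkdir -p "$base/docs/schoolwork"

# Add two empty placeholder "photos" (made-up names).
touch "$base/docs/personal/photos/2015/march/beach.jpg"
touch "$base/docs/personal/photos/2015/march/hike.jpg"

# One command replaces all the clicking.
cd "$base/docs/personal/photos/2015/march"

# List what is in the March 2015 folder.
photos=$(ls)
echo "$photos"
```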

The rest of the tutorial will introduce you to the file structures in Unix, how to navigate them, and how to use many common commands and programs to efficiently perform the work you need to accomplish.

Command Line Environment

To ensure that all users have a chance to practice using UNIX regardless of their access to work or school computing resources, we will use an online simulated Linux environment created by Fabrice Bellard, at this location: http://bellard.org/jslinux/.

Please note that if the Linux simulator page is refreshed or closed, all modifications to files and directories will be lost.

Directory Structure and Navigation

Everything in a UNIX environment spreads outward from a single “root” directory, much like a tree trunk and its branches.

http://www.srh.noaa.gov/rtimages/crp/training/2-file.info.pdf

The root directory is the top level, and is denoted by a slash (/). Other directories are created below the root directory - typically, you will find a “bin” directory, which contains binary files required for commands and processes like those we’ll cover next, as well as a “tmp” directory for temporary files, and directories like “home” that contain information for individual users.

Within the Linux simulator we’ll be using, the directory structure looks like this:

Figure: js linux simulator directory structure^[f]

In the online simulator, we are not assigned a user account in the home directory. Instead, we will be working from the “root” subfolder of the “var” directory. Traditionally, the “var” directory is short for “variable” and is used for frequently-changing data. This is fitting given that many internet users, like us, will practice there frequently. The other directories that we will explore in the simulator are read-only and un-editable.

Click through to the Linux simulator (which opens in a new window) if you haven’t done so already. If you have already clicked through, refresh the page.

To see that we are indeed located within the “root” directory, which is a subdirectory of “var”, type “pwd” at the prompt in the simulator and then press enter.

_CODE

/var/root/ # pwd

/var/root

/var/root/ #

_END_CODE

“pwd” is a command that stands for “print working directory”. You can use this command to find out your location within the UNIX file structure at any time. Here we can see that we are in “root”, which is a subdirectory of “var”, which is a subdirectory of the main root directory, “/”.

Next, let’s find out what’s in our current working directory. List the contents of the current working directory using the list command, “ls”, in the simulator.

What did you find? The simulator should have shown the two entries below:

_CODE

/var/root # ls

dos hello.c

_END_CODE

What do we know about these listed entries? Are they directories or files? Can you edit them, or are they read-only? The ls command by itself simply lists the names of a directory’s contents (hidden files excepted). You can add options to it to find out more information.

Try typing “ls -l” into the simulator and press enter. What kind of information are we shown now? The image below shows the standard information that is returned with the -l, long-format option.

https://assets.digitalocean.com/articles/linux_basics/ls-l.png

FIGURE^[g]: standard long format and labels for each field

The mode may be the most important section of the long-format listing, and its first character is the most important part for our purposes right now. It gives the type of entry:

d = directory,
l = link,
- = regular file.

This is only a short list of the most common entry types. There are more if you would like to learn about them on your own.

So, in our case the “l” means that dos is a symbolic link (to a dos directory in the / directory), and hello.c is a regular file that is publicly readable (a C program, in fact).
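If you would like to see all three type characters side by side, the sketch below creates one directory, one regular file, and one symbolic link in a scratch directory and prints the first character of each long listing. The names are hypothetical, and `ln -s`, `ls -d` (list the entry itself rather than its contents), and `cut -c1` (keep only the first character) are tools this tutorial has not introduced.

```shell
# Scratch directory with one of each entry type (names are hypothetical).
demo=$(mktemp -d)
cd "$demo"
mkdir some_dir
echo "hello" > some_file
ln -s some_dir some_link

# -d lists an entry itself rather than its contents.
# cut -c1 keeps only the first character of the long listing.
dir_type=$(ls -ld some_dir | cut -c1)
file_type=$(ls -ld some_file | cut -c1)
link_type=$(ls -ld some_link | cut -c1)

echo "some_dir:  $dir_type"
echo "some_file: $file_type"
echo "some_link: $link_type"
```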

_PANEL

_Title Geek Box: ls options

Here are a few other common ls options that you may find useful.

Command and Option	Description
ls -a	List all files, including hidden files
ls -lt	List files sorted by the time last modified
ls -R	List files recursively (descend through all directories and list files from those sub-directories as well)
ls --help	As with most commands, if you add a --help to it, it will return all of the possible options for that command. Note there are two dashes, and no space between them.
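As a quick illustration of the -a option, the sketch below creates one ordinary file and one hidden file (a name starting with “.”) in a scratch directory and compares the two listings. The file names are made up for the example, and `touch` is used only to create empty files.

```shell
# Scratch directory with one visible and one hidden file (made-up names).
demo=$(mktemp -d)
cd "$demo"
touch visible.txt .hidden.txt

# Plain ls skips names that begin with a dot.
plain=$(ls)

# ls -a includes them, along with "." and "..".
all=$(ls -a)

echo "ls shows:"
echo "$plain"
echo "ls -a shows:"
echo "$all"
```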

_END_PANEL

Now that we know what’s going on in our current working directory, let’s change to another directory to see what’s there. We know that “dos” is a link to a directory, so let’s use that.

Type “cd” followed by a space and the name of a directory you want to change to (dos) into the simulator and press enter. Remember: all commands must be followed by a space before their target file/directory or process!

Now you should be in the dos directory. Notice that the prompt changed to show you the current directory. This is not always the case in UNIX. Explore the contents of the directory with some of the listing commands we introduced earlier and then answer the question below.

_Question_Select

Within the dos directory

there are _____ (0, 1, 2, 3, 4, 5) files
and _____ (0, 1, 2, 3, 4, 5) directories.

_Feedback

You can tell that asm-1.9 is a directory by using the “-l” option and noting that it begins with the letter “d”. Additionally, it shows as a different color in the simulator. Finally, you could try to change directories into it - doing so will work for directories, but not for files.

_END_Feedback

Next let’s change to the asm-1.9 directory. There are several files listed in that directory. In addition to easily listing all the contents in a directory, UNIX allows users to quickly show the contents of individual files. One way to do this is the concatenate command, “cat”. Try entering “cat” followed by one of the filenames in asm-1.9, and then press enter. The file readme.txt is an easy one to view (don’t forget a space between the command and filename, and remember that filenames are case-sensitive).

You should see the contents of readme.txt printed to your screen until the end of the file is reached, like so:

_CODE

....

Format is:
Symbol-Name File-Name Line-No. Number-of-Refs Symbol-Type Value-Hex Value-Dec

To print cross references:
C:> lister -x asm.lst
....
PathSize                        asm.s                2        Equate        0040        64
asm.s        148
asm.s        153
2                references found
...

Format is:
Def: Symbol-Name File-Name Line-No. Number-of-Refs Symbol-Type Value-Hex Value-Dec
Ref:        File-Name        Line-No.

                                REFERENCES

1. Tannenbaum A S, "Operating Systems : Design and
Implementation", Prentice Hall of India, New Delhi,
1989.

2. Rector R and Alexy G, "The 8086 Book", Osborne/
McGraw-Hill, California, 1980.

~/dos/asm-1.9 #

_END_CODE

Contents of long files can be viewed stepwise by using the “more” command. This is similar to “cat”, but it prints the file contents one screenful at a time and lets the user step through them using the spacebar. To exit more, press “q” for quit.

Finally, let’s move from the asm-1.9 directory back to the dos directory. Try getting there using the cd command.

What did you type? And what did it do?

Since you are in a sub-directory of the directory you’re trying to access, the “cd” command must be used with an absolute path, or an appropriate relative one - we cannot simply type “cd directoryname” like we did before, because the directory we want to access is no longer below our location in the directory structure.

Here’s the error message you would have received if you simply tried “cd dos”:

_CODE

~/dos/asm-1.9 # cd dos

sh: cd: can't cd to dos

~/dos/asm-1.9 #

_END_CODE

To change back to the “dos” directory, we can use the absolute path “cd /var/root/dos” or the relative path “cd ..”, where “..” indicates the directory above your current working directory. (“.” always refers to the current directory.) To access our starting/home directory of /var/root from the asm-1.9 directory, we could type “cd ../..” as that directory is two directories up from our current location.
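The sketch below walks through the same three moves in a scratch copy of the layout, so you can compare relative and absolute paths outside the simulator. The scratch directory stands in for /var/root and is an assumption of this example; `basename` is used only to print the last part of the current path.

```shell
# Recreate the dos/asm-1.9 layout under a scratch directory
# (the scratch directory plays the role of /var/root here).
top=$(mktemp -d)
mkdir -p "$top/dos/asm-1.9"

# Relative path: ".." moves one level up, from asm-1.9 to dos.
cd "$top/dos/asm-1.9"
cd ..
one_up=$(basename "$(pwd)")
echo "after cd ..     we are in: $one_up"

# Relative path: "../.." moves two levels up, from asm-1.9 to the top.
cd asm-1.9
cd ../..
two_up=$(basename "$(pwd)")
echo "after cd ../..  we are in: $two_up"

# Absolute path: works no matter where we currently are.
cd "$top/dos"
absolute=$(basename "$(pwd)")
echo "after absolute cd we are in: $absolute"
```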

_PANEL

_Title Geek Box: Navigation Tips

For quick navigation and efficient command-line usage, here are a few commands to cut down on your typing.

Command	Description
<TAB>	Before completing a file or directory name in the command line, press TAB to autocomplete the name based on the list of files/directories within this directory.
~	When navigating, this is a synonym for your home directory.
<UP ARROW>	Go chronologically backwards through the previous commands you have run from the command line.
<DOWN ARROW>	Go chronologically forwards through the previous commands you have run from the command line. (Only works after pressing <UP ARROW>)

_END_PANEL

Command	Description	Usage
pwd	print working directory	pwd
ls	list working directory contents	ls
ls -l	list working directory contents with a long-listing	ls -l
cd	change directory	cd directory

Paths	Description	Example
/	root directory if it is the first character of a path; otherwise a separator between directory names	“cd /” Changes directory to the root of the file system
.	current directory	“ls .” Lists the contents of the current directory (this is implied by typing “ls”)
..	directory one level up from current directory	“cd ..” Changes directory to one level up from current directory

File Management

Now that we know how to navigate the file structures and find out what’s in directories, let’s make and modify some directories and files.

To make a directory, use the “mkdir” command. Let’s start by making a directory called “test” in /var/root.

_CODE

/var/root # mkdir test

_END_CODE

List the contents of our current directory to check that “test” was successfully created. Your results should look like this:

_CODE

/var/root # ls
dos hello.c test
/var/root #

_END_CODE

Next, let’s put a file into our new directory. To do that, we can copy or move the hello.c file that is in /var/root. We’ll try two ways.

Option one: we can use the “cp” command to copy hello.c into the test directory while naming it hello2, as shown below. Note that we have to use the relative path “test/” to ensure that hello2 is placed where we want it. If we did not specify this, it would be copied into the current directory.

_CODE

/var/root/ # cp hello.c test/hello2

_END_CODE

Option two: we can redirect the content of hello.c to a new file named hello3 using the “cat” command and the redirection operator. Redirecting the output of a command into a file is done with the “>” operator. You can think of the greater than sign as a funnel that pushes the output of the cat command into the container/file on the other end. Try it using the code below:

_CODE

/var/root/ # cat hello.c > test/hello3

_END_CODE

Now, navigate to the test directory, and check that hello2 and hello3 have been created.

_Question_radio

True or False: the contents of hello2 and hello3 are identical

True
False

_Feedback

You can use the “cat” or “more” commands on hello2 and hello3 to verify that they are exactly the same. We copied the file the first time, and then printed all its contents into a new filename the second time, so they should be identical.

_END_Feedback

_PANEL

_Title Geek Box: Diff

There is an easy way to truly tell the difference between two files: diff. You can use diff followed by two file names to check the difference between them. The differences will be listed individually. In this case, the command would be:

_CODE

/var/root/test # diff hello2 hello3

/var/root/test #

_END_CODE

Because the command didn’t return any output, the files are exactly the same.
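For comparison, the sketch below runs diff once on two identical files and once on two files that differ by one line. The file names and contents are made up for the example. Besides printing the differences, diff also reports its result through its exit status (zero when the files match), which the `if` statements use here.

```shell
# Scratch directory with an original, an exact copy, and an edited copy.
demo=$(mktemp -d)
cd "$demo"
printf 'line one\nline two\n' > original
cp original same_copy
printf 'line one\nline 2\n' > changed_copy

# Identical files: diff prints nothing and reports success.
if diff original same_copy; then
    same_result="identical"
else
    same_result="different"
fi

# Different files: diff prints the lines that differ and reports failure.
if diff original changed_copy; then
    changed_result="identical"
else
    changed_result="different"
fi

echo "original vs same_copy:    $same_result"
echo "original vs changed_copy: $changed_result"
```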

_END_PANEL

In addition to copying, we can rename or “move” the contents of one file to another filename. Type “mv hello3 hello4” into the simulator from the test directory you created previously, then list test’s contents again.

_Question_radio

How many files should now exist in the test directory?

_Feedback

When you moved hello3 to hello4, it did not create a copy, it simply moved the file from one name to another in your directory. Thus, there should be two files: hello2 and hello4. This is the same thing that you might think of when renaming a file in another computing environment.

_END_Feedback
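The difference between moving and copying can be sketched in a scratch directory as follows. The file contents and the extra name hello5 are assumptions for this example only.

```shell
# Scratch directory with a single file.
demo=$(mktemp -d)
cd "$demo"
echo "some text" > hello3

# mv renames: hello3 disappears and hello4 takes its place.
mv hello3 hello4
after_mv=$(ls)

# cp duplicates: hello4 stays and hello5 is added.
cp hello4 hello5
after_cp=$(ls)

echo "after mv:"
echo "$after_mv"
echo "after cp:"
echo "$after_cp"
```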

You should now be comfortable creating, moving and copying files. What about removing files?

To remove a file, we use the “rm” command. Use this command to remove hello4, and be sure to check that it has been completed.

_CODE

/var/root/test # ls

hello2 hello4

/var/root/test # rm hello4

/var/root/test # ls

hello2

/var/root/test #

_END_CODE

If you want to remove an entire directory, you can use the rmdir command. Navigate back to the parent directory and try it on the test directory by typing “rmdir test”.

_Question_Radio

Why do you think this didn’t work?

The directory isn’t there.
The directory isn’t empty.
The directory you are trying to remove is your current directory.
The directory you are trying to remove isn’t owned by you.

_Feedback

The contents had to be removed first. UNIX will only allow you to remove empty directories using the rmdir command. It may seem like a hassle, but it does ensure that you really want to delete the contents of a directory, since you have to go through the effort of deleting all other materials first.

_END_Feedback
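Here is a small sketch of that behavior in a scratch directory. The file contents are made up, and `2>/dev/null` simply hides the error message from the first attempt so that the script can report the outcome itself.

```shell
# Scratch directory containing a non-empty "test" directory.
demo=$(mktemp -d)
cd "$demo"
mkdir test
echo "data" > test/hello2

# First attempt: rmdir refuses because test still contains hello2.
if rmdir test 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory, then try again.
rm test/hello2
if rmdir test; then
    second_try="removed"
else
    second_try="refused"
fi

echo "rmdir on a non-empty directory: $first_try"
echo "rmdir on an empty directory:    $second_try"
```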

_PANEL

_Title Geek Box: rm -r and rm -i

Removing directories is best done with rmdir for safety’s sake. However, you can remove a directory and everything inside it in one step by removing it recursively with “rm -r directory_name”. This deletes the files, any sub-directories, and the directory itself. Always be careful when removing like this, as there is no recovering files you meant to keep. If you want to play it safer, you can make “rm” interactive so that it checks with you before removing each file. To make “rm” interactive, use the -i option.

_CODE

/var/root/ # rm -ri test

rm: descend into directory ‘test’?

/var/root/ #

_END_CODE

You can respond to the query by either typing

y or yes to proceed,
n or no to skip that file.
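To see exactly what a recursive remove takes away, the sketch below runs “rm -r” on a scratch directory and then checks whether anything is left. The directory layout and file names are hypothetical, and `mkdir -p` and `touch` are used only to set up the example. Try this only on scratch data.

```shell
# Scratch directory with a nested structure (made-up names).
demo=$(mktemp -d)
cd "$demo"
mkdir -p test/subdir
touch test/a.txt test/subdir/b.txt

# Remove test and everything below it in one step.
rm -r test

# Check whether the directory itself survived.
if [ -e test ]; then
    result="still there"
else
    result="gone"
fi
echo "after rm -r, test is $result"
```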

_END_PANEL

Now, try removing hello2, and then remove the entire directory.

Command	Description	Usage
cp	Copy file1 from directory1 to directory2 optionally renaming the file in the process.	cp directory1/file1 directory2/file2
mv	Move (similar to cut) file1 from directory1 to directory2 optionally renaming the file in the process.	mv directory1/file1 directory2/file2
rm	Remove a file from a directory	rm directory1/file1
rmdir	Remove an empty directory	rmdir directory

Searching

Thus far we’ve explored only a handful of files in a couple of directories. Now, let’s move on to larger numbers of files, including those from which you may need specific information.

Let’s say you wanted to copy all of the “.txt” files from one directory to another, but there were several different file types present in that directory. In a GUI (graphical user interface) environment, you would probably sort the file list by type/extension and then just select the desired files, which are now listed in a block. In a UNIX environment, you can find, list, sort and copy all the files of one type by using wildcards. These are special characters that can be used like wild cards in a card game - they can be anything you want them to be.

Navigate to the “asm-1.9” directory under /var/root/dos. To list all of the .txt files only, you would enter your command as:

_CODE

~/dos/asm-1.9 # ls *.txt

_END_CODE

The wildcard in this case is the special character “*”. It matches any run of characters (letters, digits, punctuation, or whitespace), so “*.txt” matches every filename whose last four characters are exactly “.txt”.

_Question_checkbox

What files were returned from this ls command based on the following list of files and directories? Select all that apply.

_CODE

        ~/dos/asm-1.9 # ls
        Changelog        display.s        expr.s                lister.s                readme.txt        symtab.i
        asm.s                dos.i                input.s                message.s        support.s        symtab.s
        direct.s                equ.s                license.txt        output.s        symbols.s

_END_CODE

Changelog
lister.s
readme.txt
license.txt
output.s

_Feedback

Any filename that ends with .txt would be returned from this command, except for a file called “.txt” (without the quotes). Names that begin with “.” are hidden files, and “*” does not match that leading dot.

_END_Feedback

There are many different wildcards available in UNIX. Some common wildcards are:

* (any number of characters, including none: letters, digits, punctuation marks, or whitespace)
? (any single character, digit, punctuation mark, or whitespace)
[ ] (exactly one character from a user-defined set or range of characters, digits, punctuation, or whitespace)

Let’s do some examples using wildcards.

_Question_Select

From the following list, how many files would be returned with each ls command?

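_CODE

        ~/dos/asm-1.9 # ls
        Changelog        display.s        expr.s                lister.s                readme.txt        symtab.i
        asm.s                dos.i                input.s                message.s        support.s        symtab.s
        direct.s                equ.s                license.txt        output.s        symbols.s

_END_CODE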

ls [a-g].s _____ (0, 1, 5, 6, 7, 12, 17)
ls [a-g]??.s _____ (0, 1, 2, 3, 4, 5)
ls [a-g]*.s _____ (0, 1, 5, 6, 7, 12, 17)
ls *.s _____ (0, 1, 5, 6, 7, 12, 17)
ls * _____ (0, 1, 5, 6, 7, 12, 17)

_Feedback

The coded answers for each of the previous examples are below.

_CODE

~/dos/asm-1.9 # ls [a-g].s

~/dos/asm-1.9 #

_END_CODE

_CODE

~/dos/asm-1.9 # ls [a-g]??.s

asm.s equ.s

~/dos/asm-1.9 #

_END_CODE

_CODE

~/dos/asm-1.9 # ls [a-g]*.s

asm.s direct.s display.s equ.s expr.s

~/dos/asm-1.9 #

_END_CODE

_CODE

~/dos/asm-1.9 # ls *.s

asm.s display.s expr.s lister.s output.s symbols.s
direct.s equ.s input.s message.s support.s symtab.s

~/dos/asm-1.9 #

_END_CODE

_CODE

~/dos/asm-1.9 # ls *
Changelog        display.s        expr.s                lister.s                readme.txt        symtab.i
asm.s                dos.i                input.s                message.s        support.s        symtab.s
direct.s                equ.s                license.txt        output.s        symbols.s

~/dos/asm-1.9 #

_END_CODE

_END_Feedback
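If you want to check these counts outside the simulator, the sketch below recreates the seventeen file names as empty files in a scratch directory and counts what each pattern matches. The helper function `count_matches` is hypothetical and written just for this example. It relies on the fact that the shell expands a wildcard into the list of matching names before the command runs, and it ignores a pattern that matched nothing.

```shell
# Recreate the asm-1.9 file names as empty files in a scratch directory.
demo=$(mktemp -d)
cd "$demo"
touch Changelog asm.s direct.s display.s dos.i equ.s expr.s input.s \
      license.txt lister.s message.s output.s readme.txt support.s \
      symbols.s symtab.i symtab.s

# Hypothetical helper: count how many of its arguments are existing files.
# The shell expands each wildcard before the function is called.
count_matches() {
    n=0
    for name in "$@"; do
        if [ -e "$name" ]; then
            n=$((n + 1))
        fi
    done
    echo "$n"
}

one_letter=$(count_matches [a-g].s)
three_letters=$(count_matches [a-g]??.s)
a_to_g=$(count_matches [a-g]*.s)
all_s=$(count_matches *.s)
everything=$(count_matches *)

echo "[a-g].s   matches $one_letter"
echo "[a-g]??.s matches $three_letters"
echo "[a-g]*.s  matches $a_to_g"
echo "*.s       matches $all_s"
echo "*         matches $everything"
```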

_PANEL

_Title Geek Box: case sensitivity

If you wanted to list all the files that started with an A through G (capitals matter!), you could do that within a bracketed list like so:

_CODE

~/dos/asm-1.9 # ls [A-G]*

_END_CODE

This will display a file listing that returns one filename: Changelog.

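If you wanted the “ls” command to return names starting with either capital or lowercase letters, you would include both ranges within the brackets, one directly after the other. Do not separate them with a comma, since a comma inside the brackets is treated as one more character to match.

_CODE

~/dos/asm-1.9 # ls [a-gA-G]*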

_END_CODE
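The sketch below tries both bracket patterns on a handful of made-up file names. It uses `echo` instead of `ls` to show that the shell itself expands the pattern. Setting `LC_ALL=C` is an assumption of this example: it selects plain ASCII ordering, whereas under other language settings some shells treat a range such as [a-g] as covering capital letters too.

```shell
# Scratch directory with a mix of capitalized and lowercase names.
demo=$(mktemp -d)
cd "$demo"
touch Changelog asm.s equ.s lister.s

# Use the plain "C" locale so ranges follow ASCII order (an assumption).
LC_ALL=C
export LC_ALL

# Capital A through G only.
upper_only=$(echo [A-G]*)

# Both cases: two ranges side by side, with no comma between them.
both_cases=$(echo [a-gA-G]*)

echo "[A-G]*    -> $upper_only"
echo "[a-gA-G]* -> $both_cases"
```

Note that lister.s is left out of both results because “l” falls outside the a-g range.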

_END_PANEL

Using wildcards is great if you know where certain types of files are located. But what about if you don’t know where a file is, but remember part of its name? You can still use wildcards, but you will need more functionality than just “ls”.

Using the “find” command, you can locate those missing files using wildcards. The find command searches the current directory and all of its sub-directories. Be careful when doing this if you have a lot of sub-directories containing many files, as it can take a very long time to search all of the content.

The syntax for the find command is as follows:

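find starting_directory -name search_string

where starting_directory is where the search begins (use “.” for the current directory; many versions of find, including the one in the simulator, assume “.” if you leave it out) and -name indicates that it will search by the name of the file.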

Here’s an example of a find command.

_CODE

~/dos # find -name 'a*'

./asm.com

./asm-1.9

./asm-1.9/asm.s

~/dos #

_END_CODE

Notice that the command returned both files and directories, including files in sub-directories below where the command was run. These are all the entries whose names start with “a”, whatever follows it. So we found the files we were looking for that started with “a”.
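A version of this search that you can run anywhere is sketched below. It builds a small scratch tree with made-up contents, names the starting directory “.” explicitly, and puts the pattern in straight single quotes so that the shell hands it to find unchanged rather than expanding it first. The `sort` at the end only makes the order of the results predictable.

```shell
# Build a small tree to search (hypothetical contents).
demo=$(mktemp -d)
mkdir -p "$demo/dos/asm-1.9"
touch "$demo/dos/asm.com" \
      "$demo/dos/asm-1.9/asm.s" \
      "$demo/dos/asm-1.9/readme.txt"
cd "$demo/dos"

# Search from the current directory downward for names starting with "a".
# The quotes keep the shell from expanding a* before find sees it.
found=$(find . -name 'a*' | sort)

echo "$found"
```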

What if you didn’t know the name of the file, but remembered something within the file? To find something within a file, you can use the command “grep”, which stands for Global Regular Expression Print. Grep follows the syntax below:

grep search_expression file_to_search

As an example, let’s search through the file hello.c within your home directory to see if it contains the string “int”.

_CODE

/var/root # grep int hello.c

int main(int argc, char **argv)

printf("Hello World\n");

/var/root #

_END_CODE

Grep returned two lines containing that string. Notice that grep is not looking for the word “int”; it is looking for the string “int”, which can occur inside a longer word. This is why the “printf” line is returned in the example: “printf” contains “int”.

_PANEL

_Title Geek Box: grep -w

If you want to return only the instances where the pattern appears as a whole word, add the “-w” option after grep.

_CODE

/var/root # grep -w int hello.c

int main(int argc, char **argv)

/var/root #

_END_CODE
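The effect of -w can be checked with the sketch below, which writes a stand-in for hello.c into a scratch directory and counts the matching lines with and without the option. The file contents are an assumption (the simulator’s file may differ), the `<<'EOF'` block is just a way to write several lines into a file, and -c asks grep to print the number of matching lines instead of the lines themselves.

```shell
# Scratch directory with a stand-in for hello.c (assumed contents).
demo=$(mktemp -d)
cd "$demo"
cat > hello.c <<'EOF'
#include <stdio.h>

int main(int argc, char **argv)
{
    printf("Hello World\n");
    return 0;
}
EOF

# Count lines containing the string "int" anywhere, even inside a word.
substring_lines=$(grep -c int hello.c)

# Count lines containing "int" as a whole word only.
word_lines=$(grep -c -w int hello.c)

echo "lines matching int:            $substring_lines"
echo "lines matching int as a word:  $word_lines"
```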

_END_PANEL

What if you didn’t know the name of the file that had the string within it? You could still find all the files that contain that string with grep by using a wildcard in place of the file_to_search.

Let’s say we wanted to search for ALL files below your dos directory that contained the string “print”. The following code would show you those files and print the lines which have the “print” pattern on them. The added code here is the “-r” option which digs recursively downward from your current directory giving the results below.

_CODE

~/dos # grep -r print *
asm-1.9/equ.s: call SprintRegister
asm-1.9/message.s:| BX points to the message to be printed
asm-1.9/message.s:|The procedure print 'asm :', the message, a carriage return and a line feed
asm-1.9/symtab.s: call SprintRegister
asm-1.9/symtab.s:SprintRegister:
asm-1.9/symtab.s:SprintRegisterMore:
asm-1.9/symtab.s: call SprintHexDigit
asm-1.9/symtab.s: jnz SprintRegisterMore
asm-1.9/symtab.s:SprintHexDigit:
asm-1.9/lister.s:|Lister - print the symbol table of the assembler from the list file.
asm-1.9/readme.txt:The contents of the symbol table are printed out at the end of the
asm-1.9/readme.txt:Only one of -x or -z must be specified. The -x option prints a
asm-1.9/readme.txt:complete xref dump (definitions + references) The -z option prints a
asm-1.9/readme.txt:To print labels not referenced
asm-1.9/readme.txt:To print all defined symbols:
asm-1.9/readme.txt:To print crossreferences:
asm-1.9/Changelog:1. Instead of printing the symbol table onto the screen it puts
asm-1.9/Changelog:7. A separate program lister was added which prints out the symbol
asm-1.9/Changelog:2. The print stats function was removed
asm-1.9/Changelog:7. Doesn't print the name of the file that it is assembling any longer.
~/dos #

_END_CODE
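When a recursive search returns this much output, it can help to list only the names of the matching files. The sketch below shows both forms on a tiny scratch tree. The three files and their contents are made up for the example, and the -l option (list file names only) is an addition not covered elsewhere in this tutorial.

```shell
# Build a small tree with three short files (hypothetical contents).
demo=$(mktemp -d)
mkdir -p "$demo/dos/asm-1.9"
cd "$demo/dos"
echo "call SprintRegister" > asm-1.9/equ.s
echo "To print labels not referenced" > asm-1.9/readme.txt
echo "nothing to see here" > asm-1.9/dos.i

# -r searches every file below the current directory.
# Each result is shown as file_name:matching_line.
matches=$(grep -r print * | sort)

# -l lists only the names of the files that contain a match.
files=$(grep -rl print * | sort)

echo "matching lines:"
echo "$matches"
echo "matching files:"
echo "$files"
```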

Command	Description	Usage
find	Find files recursively by their file name and list them.	find -name string_or_wildcard
grep	Find files by their contents and display the line from each file that contains that search string.	grep search_string file

Wildcard	Description	Example
*	any number of characters (including none): letters, digits, punctuation marks, or whitespace	ls *.jpg ls file* ls *in*
?	any single character, digit, punctuation mark, or whitespace	ls photo?.jpg ls ?ilename.txt ls test?file.txt
[...]	Exactly one character from a user-defined set or range of characters, digits, or punctuation	ls file[0-9].jpg ls [a-z]ile.txt ls file[_.]name.txt

_EXTRA

Find all files within the current working directory whose names contain “.com”.

Then find all files starting from your home directory that have ‘bin’ in the name.

_END_EXTRA

Summary

Andrea will write summary

[a]need copyright-ready version of this from another source

[b]Not sure it is clear that the shell is the command line (in a way). This seems like a disconnect that has been drawn. Do you want that to be there or not?

[c]Could just change this to shell/command line, or might have to write up a little bit on what the shell is before talking about the graphic.

[d]Need this image to be made or make a video of it. I can make a video if needed from my Mac, but that might not be the ideal solution for these students. Let me know.

[e]yes, video

[f]figure

[g]Mode makes sense when you think about the command to change the mode of the files or directories "chmod", but otherwise it seems like a stupid terminology.