Directory Structure and Navigation
This tutorial will help those new to computing in the geosciences become familiar with working in a command line environment. Here you will learn about the basics of UNIX file structures, how to navigate in a UNIX environment, and you’ll get to practice creating, storing and searching for files.
caption: Dennis Ritchie (standing) and Ken Thompson working on a PDP-11 minicomputer
UNIX is an operating system that can handle multiple users and processes at the same time. Dennis Ritchie and Ken Thompson of AT&T’s Bell Labs developed UNIX in the late 1960s and early 1970s. UNIX forms the basis for, or directly inspired, Linux, Apple’s macOS (formerly Mac OS X), Android, and much of the other technology that you know and love. Microsoft Windows is the notable exception: it is not derived from UNIX. Expand the optional Geek Box below if you want an in-depth look at the many UNIX variants over the years.
_PANEL
_Title Origins of UNIX
https://commons.wikimedia.org/wiki/File:Unix_history-simple.svg
Flowchart of Unix origins and subsequent variants over the years.
_END_PANEL
Work in a UNIX environment is accomplished by typing at a command line. If you have used the Terminal application in macOS or the Command Prompt in Windows, then you are already familiar with this way of computing. If this idea is new to you, fear not - you already execute the same types of commands whenever you save an image, open a program via an icon, and so on. The next section will explain a little more about how a UNIX environment is set up and how it compares to a personal computer or smartphone environment.
UNIX is made up of 3 main parts: the kernel, the shell, and user commands and applications.
http://www.tutorialspoint.com/unix/images/unix_architecture.jpg
The kernel and shell are the heart and soul of the operating system.
The kernel ingests user input via the shell and accesses the hardware to perform things like memory allocation and file storage.
The shell is an interface that interprets the command line input and calls the necessary programs to do the work. The commands that you enter are programs themselves, so once the work is done, the command line will return to a prompt and await further input.
One example of how the shell and kernel work together is copying a file. If you want to copy a file named “file1” and name the copy “file2”, you would enter “cp file1 file2” at the command line. The shell will search for the program “cp” and then tell the kernel to run that program on “file1” and name the output “file2”. When the copying is finished, the shell returns you to the prompt and awaits more commands.
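The copy described above can be tried safely in a scratch directory. This is only a minimal sketch: the scratch directory created with mktemp and the file contents are assumptions made for illustration, not part of the simulator.

```shell
# Work in a throwaway directory so nothing important is touched.
scratch=$(mktemp -d)
cd "$scratch"

# Create a small file to copy (the contents are arbitrary).
echo "some text" > file1

# The shell finds the cp program; the kernel carries out the file operations.
cp file1 file2

# Both names now exist, and the copy holds the same contents.
ls
copy_contents=$(cat file2)
echo "$copy_contents"
```

When cp finishes, control returns to the shell, which is why the ls on the next line can run.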
Let’s take a look at another example. Suppose you have a folder called “docs” on your personal computer, phone or stored in a cloud somewhere. Let’s say you have “personal” and “schoolwork” subfolders in there, and that inside your personal folder you have a subfolder with your photos from 2015, and that they’re arranged into monthly subfolders. How do you get into March 2015’s photo area? Easy - you keep clicking on or touching the appropriate folder, until it opens the next, and then the next folders until you can see March 2015 - then you click on it.
Figure: IMAGE SHOWING CLICK THROUGH TO FOLDER MARCH 2015
In Unix, you’d simply type the following at the command line to perform the same task: “cd /docs/personal/photos/2015/march”.
_CODE
# cd /docs/personal/photos/2015/march
_END_CODE
Although you can’t see your photos as icons here, the computer is performing exactly the same actions as you did by clicking on all those folders. Additionally, you can then list the photos in the directory, rename them one at a time or all at once, move them to other directories or even other computers, or much more - all by just typing a few words.
The rest of the tutorial will introduce you to the file structures in Unix, how to navigate them, and how to use many common commands and programs to efficiently perform the work you need to accomplish.
To ensure that all users have a chance to practice using UNIX regardless of their access to work or school computing resources, we will use an online simulated Linux environment created by Fabrice Bellard, at this location: http://bellard.org/jslinux/.
Please note that if the Linux simulator page is refreshed or closed, all modifications to files and directories will be lost.
Everything in a UNIX environment spreads outward from a single “root” directory, much like a tree trunk and its branches.
http://www.srh.noaa.gov/rtimages/crp/training/2-file.info.pdf
The root directory is the top level, and is denoted by a slash (/). Other directories are created below the root directory - typically, you will find a “bin” directory, which contains binary files required for commands and processes like those we’ll cover next, as well as a “tmp” directory for temporary files, and directories like “home” that contain information for individual users.
Within the linux simulator we’ll be using, the directory structure looks like this:
Figure: js linux simulator directory structure
In the online simulator, we are not assigned a user account in the home directory. Instead, we will be working from the “root” subfolder of the “var” directory. Traditionally, the “var” directory is short for “variable” and is used for frequently-changing data. This is fitting given that many internet users, like us, will practice there frequently. The other directories that we will explore in the simulator are read-only and un-editable.
Click through to the linux simulator (which opens in a new window) if you haven’t done so already. If you have already clicked through, refresh the page.
To see that we are indeed located within the “root” directory, which is a subdirectory of “var”, type the letters “pwd” at the prompt in the simulator, then press enter.
_CODE
/var/root/ # pwd
/var/root
/var/root/ #
_END_CODE
“pwd” is a command that stands for “print working directory”. You can use this command to find out your location within the unix file structure at any time. Here we can see that we are in “root” which is a subdirectory of “var”, which is a subdirectory of the main root directory, “/”.
Next, let’s find out what’s in our current working directory. List the contents of the current working directory using the list command, “ls”, in the simulator.
What did you find? The simulator should have shown the two entries below:
_CODE
/var/root # ls
dos hello.c
_END_CODE
What do we know about these listed entries? Are they directories or files? Can you edit them, or are they read-only? The ls command by itself simply lists all the contents of a directory. You can add options to it to find out more information.
Try typing “ls -l” into the simulator and press enter. What kind of information are we shown now? The image below shows the standard information that is returned with the -l, long-format option.
https://assets.digitalocean.com/articles/linux_basics/ls-l.png
FIGURE: standard long format and labels for each field
The mode may be the most important section of the long-format listing. For now, the first character of the mode matters most, because it gives the type of entity:
Character | Entity type |
- | regular file |
d | directory |
l | symbolic link (a pointer to another file or directory) |
This is only a short listing of some of the common entity types; there are more if you would like to learn about them on your own.
So, in our case the “l” means that dos is a symbolic link (to a dos directory in the / directory), and hello.c is a regular file that is publicly readable (a C program, in fact).
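To see the entity type character for yourself, the sketch below creates one regular file, one directory, and one symbolic link in a scratch directory and pulls out the first character of each long listing. The names a_file, a_directory and a_link are made up for this example, and the ln -s and cut commands are not covered in this tutorial; they are used here only to set up and trim the output.

```shell
scratch=$(mktemp -d)
cd "$scratch"

mkdir a_directory            # a directory
echo "hello" > a_file        # a regular file
ln -s a_directory a_link     # a symbolic link that points at the directory

# -l gives the long format; -d lists the entries themselves
# instead of the contents of any directories.
ls -ld a_directory a_file a_link

# Keep only the first character of each line to isolate the type.
dir_type=$(ls -ld a_directory | cut -c1)
file_type=$(ls -ld a_file | cut -c1)
link_type=$(ls -ld a_link | cut -c1)
echo "$dir_type $file_type $link_type"
```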
_PANEL
_Title Geek Box: ls options
There are a few other common ls options that you may find useful.
Command and Option | Description |
ls -a | List all files, including hidden files |
ls -lt | List files sorted by the time last modified |
ls -R | List files recursively (descend through all directories and list files from those sub-directories as well) |
ls --help | As with most commands, if you add a --help to it, it will return all of the possible options for that command. Note there are two dashes, and no space between them. |
_END_PANEL
Now that we know what’s going on in our current working directory, let’s change to another directory to see what’s there. We know that “dos” is a symbolic link to a directory, and we can change into it just as we would a regular directory, so let’s use that.
Type “cd” followed by a space and the name of a directory you want to change to (dos) into the simulator and press enter. Remember: all commands must be followed by a space before their target file/directory or process!
Now you should be in the dos directory. Notice that the prompt changed to show you the current directory. This is not always the case in UNIX. Explore the contents of the directory with some of the listing commands we introduced earlier and then answer the question below.
_Question_Select
Within the dos directory
_Feedback
You can tell that asm-1.9 is a directory by using the “-l” option and noting that it begins with the letter “d”. Additionally, it shows as a different color in the simulator. Finally, you could try to change directories into it - doing so will work for directories, but not for files.
_END_Feedback
Next let’s change to the asm-1.9 directory. There are several files listed in that directory. In addition to easily listing all the contents in a directory, UNIX allows users to quickly show the contents of individual files. One way to do this is the concatenate command, “cat”. Try entering “cat” followed by one of the filenames in asm-1.9, and then press enter. The file readme.txt is an easy one to view (don’t forget a space between the command and filename, and remember that filenames are case-sensitive!)
You should see the contents of readme.txt printed to your screen until the end of the file is reached, like so:
_CODE
....
Format is:
Symbol-Name File-Name Line-No. Number-of-Refs Symbol-Type Value-Hex Value-Dec
To print cross references:
C:> lister -x asm.lst
....
PathSize asm.s 2 Equate 0040 64
asm.s 148
asm.s 153
2 references found
...
Format is:
Def: Symbol-Name File-Name Line-No. Number-of-Refs Symbol-Type Value-Hex Value-D
ec
Ref: File-Name Line-No.
REFERENCES
1. Tannenbaum A S, "Operating Systems : Design and
Implementation", Prentice Hall of India, New Delhi,
1989.
2. Rector R and Alexy G, "The 8086 Book", Osborne/
McGraw-Hill, California, 1980.
~/dos/asm-1.9 #
_END_CODE
Contents of long files can be viewed stepwise by using the “more” command. This is similar to “cat”, but it prints the file contents to screen and allows the user to step through them using the spacebar. To exit more, press “q” for quit.
Finally, let’s move from the asm-1.9 directory back to the dos directory. Try getting there using the cd command.
What did you type? And what did it do?
Since you are in a sub-directory of the directory you’re trying to access, the “cd” command must be used with an absolute path, or an appropriate relative one - we cannot simply type “cd directoryname” like we did before, because the directory we want to access is no longer below our location in the directory structure.
Here’s the error message you would have received if you simply tried “cd dos”:
_CODE
~/dos/asm-1.9 # cd dos
sh: cd: can't cd to dos
~/dos/asm-1.9 #
_END_CODE
To change back to the “dos” directory, we can use the absolute path “cd /var/root/dos” or the relative path “cd ..”, where “..” indicates the directory above your current working directory and “.” is always the current directory. To reach our starting/home directory of /var/root from the asm-1.9 directory, we could type “cd ../..” because that directory is two levels up from our current location.
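The difference between absolute and relative paths can be sketched outside the simulator as well. The example below rebuilds a dos/asm-1.9 pair of directories inside a scratch directory (an assumption for illustration; your simulator paths start at /var/root instead) and uses basename, which is not covered in this tutorial, to print just the last part of the working directory.

```shell
scratch=$(mktemp -d)
mkdir "$scratch/dos"
mkdir "$scratch/dos/asm-1.9"

cd "$scratch/dos/asm-1.9"     # absolute path straight to the deepest directory
cd ..                         # relative path: one level up
up_one=$(basename "$(pwd)")
echo "$up_one"

cd asm-1.9                    # relative path: back down again
cd ../..                      # relative path: two levels up
cd "$scratch/dos"             # an absolute path works from anywhere
via_absolute=$(basename "$(pwd)")
echo "$via_absolute"
```

Both echo commands print the same directory name, because “cd ..” from asm-1.9 and the absolute path end up in the same place.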
_PANEL
_Title Geek Box: Navigation Tips
For quick navigation and efficient command-line usage, here are a few commands to cut down on your typing.
Command | Description |
<TAB> | Before completing a file or directory name in the command line, press TAB to autocomplete the name based on the list of files/directories within this directory. |
~ | When navigating, this is a synonym for your home directory. |
<UP ARROW> | Go chronologically backwards through the previous commands you have run from the command line. |
<DOWN ARROW> | Go chronologically forwards through the previous commands you have run from the command line. (Only works after pressing <UP ARROW>) |
_END_PANEL
Command | Description | Usage |
pwd | print working directory | pwd |
ls | list working directory contents | ls |
ls -l | list working directory contents with a long-listing | ls -l |
cd | change directory | cd directory |
Paths | Description | Example |
/ | root directory when it is the first character of a path; a separator between directory names anywhere else | “cd /” Changes directory to the root of the file system |
. | current directory | “ls .” Lists the contents of the current directory (this is implied by typing “ls”) |
.. | directory one level up from current directory | “cd ..” Changes directory to one level up from current directory |
Now that we know how to navigate the file structures and find out what’s in directories, let’s make and modify some directories and files.
To make a directory, use the “mkdir” command. Let’s start by making a directory called “test” in /var/root
_CODE
/var/root # mkdir test
_END_CODE
List the contents of our current directory to check that “test” was successfully created. Your results should look like this:
_CODE
/var/root # ls
dos hello.c test
/var/root #
_END_CODE
Next, let’s put a file into our new directory. To do that, we can copy or move the hello.c file that is in /var/root. We’ll try two ways.
Option one: we can use the “cp” command to copy hello.c into the test directory while naming it hello2, as shown below. Note that we have to use the relative path “test/” to ensure that hello2 is placed where we want it. If we did not specify this, it would be copied into the current directory.
_CODE
/var/root/ # cp hello.c test/hello2
_END_CODE
Option two: we can redirect the content of hello.c to a new file named hello3 using the “cat” command and output redirection. Redirecting the output of a command into a file is done with the “>” operator. You can think of the greater than sign as a funnel that pushes contents from the cat command into the container/file on the other end. Be aware that “>” overwrites the target file if it already exists. Try it using the code below:
_CODE
/var/root/ # cat hello.c > test/hello3
_END_CODE
Now, navigate to the test directory, and check that hello2 and hello3 have been created.
_Question_radio
True or False: the contents of hello2 and hello3 are identical
_Feedback
You can use the “cat” or “more” commands on hello2 and hello3 to verify that they are exactly the same. We copied the file the first time, and then printed all its contents into a new filename the second time, so they should be identical.
_END_Feedback
_PANEL
_Title Geek Box: Diff
There is an easy way to truly tell the difference between two files: diff. You can use diff followed by two file names to check the difference between them. The differences will be listed individually. In this case, the command would be:
_CODE
/var/root/test # diff hello2 hello3
/var/root/test #
_END_CODE
Because the command didn’t return any output, the files are exactly the same.
_END_PANEL
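To see what diff reports when two files are not the same, here is a minimal sketch in a scratch directory. The file hello5 and all of the file contents are hypothetical stand-ins, chosen so that exactly one line differs.

```shell
scratch=$(mktemp -d)
cd "$scratch"
printf 'line one\nline two\n' > hello2
cp hello2 hello3                        # an identical copy
printf 'line one\nline 2\n' > hello5    # hypothetical file that differs on its second line

# Identical files: diff prints nothing and finishes with exit status 0.
diff hello2 hello3
same_status=$?

# Differing files: diff prints the lines that changed and finishes with exit status 1.
diff_status=0
diff hello2 hello5 || diff_status=$?

echo "$same_status $diff_status"
```

The exit status is handy in scripts, where there is no person watching for output.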
In addition to copying, we can rename or “move” the contents of one file to another filename. Type “mv hello3 hello4” into the simulator from the test directory you created previously, then list test’s contents again.
_Question_radio
How many files should now exist in the test directory?
_Feedback
When you moved hello3 to hello4, it did not create a copy, it simply moved the file from one name to another in your directory. Thus, there should be two files: hello2 and hello4. This is the same thing that you might think of when renaming a file in another computing environment.
_END_Feedback
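The renaming behavior of mv can be checked with a short sketch like the one below, run in a scratch directory with a stand-in file (both are assumptions for illustration).

```shell
scratch=$(mktemp -d)
cd "$scratch"
echo "contents" > hello3

mv hello3 hello4      # rename: nothing is left behind under the old name

listing=$(ls)
echo "$listing"
```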
You should now be comfortable creating, moving and copying files. What about removing files?
To remove a file, we use the “rm” command. Use this command to remove hello4, and be sure to check that it has been completed.
_CODE
/var/root/test # ls
hello2 hello4
/var/root/test # rm hello4
/var/root/test # ls
hello2
/var/root/test #
_END_CODE
If you want to remove an entire directory, you can use the rmdir command. Navigate back to the parent directory and try it on the test directory by typing “rmdir test”.
_Question_Radio
Why do you think this didn’t work?
_Feedback
The contents had to be removed first. UNIX will only allow you to remove empty directories using the rmdir command. It may seem like a hassle, but it does ensure that you really want to delete the contents of a directory, since you have to go through the effort of deleting all other materials first.
_END_Feedback
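The rule that rmdir only removes empty directories can be sketched as follows. The scratch directory and file contents are assumptions for illustration, and “2>/dev/null” (not covered in this tutorial) simply hides the error message from the first attempt.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir test
echo "contents" > test/hello2

# First attempt: the directory still holds a file, so rmdir refuses.
if rmdir test 2>/dev/null; then
    first_try="removed"
else
    first_try="refused"
fi

# Empty the directory, then try again.
rm test/hello2
if rmdir test; then
    second_try="removed"
else
    second_try="refused"
fi

echo "$first_try, then $second_try"
```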
_PANEL
_Title Geek Box: rm -r and rm -i
Removing directories is best done with rmdir for safety’s sake. However, you can remove a directory and everything inside it in one step with “rm -r directory_name”. This recursively deletes the directory’s contents, including any sub-directories, and then removes the directory itself. Always be careful when removing like this, as there is no recovering files you meant to keep. If you want to play it safer, you can make “rm” interactive so that it checks with you before removing each item. To make “rm” interactive, use the -i option.
_CODE
/var/root/ # rm -ri test
rm: descend into directory ‘test’?
/var/root/
_END_CODE
You can respond to the query by typing either “y” (yes) or “n” (no) and pressing enter.
_END_PANEL
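Because “rm -r” is unforgiving, it is worth trying once somewhere harmless. The sketch below uses a scratch directory and a stand-in file (both assumptions for illustration) and leaves out -i so that it can run without waiting for answers.

```shell
scratch=$(mktemp -d)
cd "$scratch"
mkdir test
echo "contents" > test/hello2

rm -r test     # removes test/hello2 and then test itself, without asking

if [ -e test ]; then
    after="still there"
else
    after="gone"
fi
echo "$after"
```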
Now, try removing hello2, and then remove the entire directory.
Command | Description | Usage |
cp | Copy file1 from directory1 to directory2 optionally renaming the file in the process. | cp directory1/file1 directory2/file2 |
mv | Move (similar to cut) file1 from directory1 to directory2 optionally renaming the file in the process. | mv directory1/file1 directory2/file2 |
rm | Remove a file from a directory | rm directory1/file1 |
rmdir | Remove an empty directory | rmdir directory |
Thus far we’ve explored only a handful of files in a couple directories. Now, let’s move to looking at larger numbers of files and those from which you may need specific information.
Let’s say you wanted to copy all of the “.txt” files from one directory to another, but there were several different file types present in that directory. In a GUI (graphical user interface) environment, you would probably sort the file list by type/extension and then just select the desired files, which are now listed in a block. In a UNIX environment, you can find, list, sort and copy all the files of one type by using wildcards. These are special characters that can be used like wild cards in a card game - they can be anything you want them to be.
Navigate to the “asm-1.9” directory under /var/root/dos. To list all of the .txt files only, you would enter your command as:
_CODE
~/dos/asm-1.9 # ls *.txt
_END_CODE
The wildcard in this case is the special character “*”. It matches any sequence of characters (letters, digits, punctuation, or whitespace), including none at all, so “*.txt” matches every filename whose last 4 characters are exactly “.txt”.
_Question_checkbox
What files were returned from this ls command based on the following list of files and directories? Select all that apply.
_CODE
~/dos/asm-1.9 # ls
Changelog display.s expr.s lister.s readme.txt symtab.i
asm.s dos.i input.s message.s support.s symtab.s
direct.s equ.s license.txt output.s symbols.s
_END_CODE
_Feedback
Any filename that ends with .txt would be returned from this command, except for a file called “.txt” (without the quotes). Although “*” can match zero characters, names that begin with “.” are hidden files, and the shell does not match a leading “.” with a wildcard.
_END_Feedback
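Here is a small sketch of the “*.txt” pattern at work. The file names are hypothetical and chosen to test the edges of the pattern, touch (not covered in this tutorial) just creates empty files, and echo is used in place of ls so the matches come back on a single line.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Create empty files: two that end in .txt, two that do not, and one named just ".txt".
touch readme.txt license.txt notes.text asm.s .txt

# The shell expands the pattern before echo ever runs.
matches=$(echo *.txt)
echo "$matches"
```

Only license.txt and readme.txt are printed: notes.text and asm.s do not end in “.txt”, and the file named “.txt” is not matched by the pattern.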
There are many different wildcards available in UNIX. Some common wildcards are “*”, “?” and “[...]”.
Let’s do some examples using wildcards.
_Question_Select
From the following list, how many files would be returned with each ls command?
_CODE
~/dos/asm-1.9 # ls
Changelog display.s expr.s lister.s readme.txt symtab.i
asm.s dos.i input.s message.s support.s symtab.s
direct.s equ.s license.txt output.s symbols.s
_END_CODE
_Feedback
The coded answers for each of the previous examples are below.
_CODE
~/dos/asm-1.9 # ls [a-g].s
~/dos/asm-1.9 #
_END_CODE
_CODE
~/dos/asm-1.9 # ls [a-g]??.s
asm.s equ.s
~/dos/asm-1.9 #
_END_CODE
_CODE
~/dos/asm-1.9 # ls [a-g]*.s
asm.s direct.s display.s equ.s expr.s
~/dos/asm-1.9 #
_END_CODE
_CODE
~/dos/asm-1.9 # ls *.s
asm.s display.s expr.s lister.s output.s symbols.s
direct.s equ.s input.s message.s support.s symtab.s
~/dos/asm-1.9 #
_END_CODE
_CODE
~/dos/asm-1.9 # ls *
Changelog display.s expr.s lister.s readme.txt symtab.i
asm.s dos.i input.s message.s support.s symtab.s
direct.s equ.s license.txt output.s symbols.s
~/dos/asm-1.9 #
_END_CODE
_END_Feedback
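If you would like to experiment with these patterns outside the simulator, the sketch below recreates the same file names as empty files in a scratch directory. The touch command, the scratch directory and the LC_ALL=C setting are assumptions of this example; the setting is there so that letter ranges behave the same way on every system.

```shell
scratch=$(mktemp -d)
cd "$scratch"
export LC_ALL=C   # make ranges such as a-g cover lowercase letters only

# Empty stand-ins for the files in asm-1.9.
touch Changelog asm.s direct.s display.s dos.i equ.s expr.s input.s \
      license.txt lister.s message.s output.s readme.txt support.s \
      symbols.s symtab.i symtab.s

# One letter from a to g, exactly two more characters, then ".s".
three_letter=$(echo [a-g]??.s)
echo "$three_letter"

# One letter from a to g, anything after it, then ".s".
a_to_g=$(echo [a-g]*.s)
echo "$a_to_g"

# Anything at all, then ".s".
all_s=$(echo *.s)
echo "$all_s"
```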
_PANEL
_Title Geek Box: case sensitivity
If you wanted to list all the files that started with an A through G (capitals matter!), you could do that within a bracketed list like so:
_CODE
~/dos/asm-1.9 # ls [A-G]*
_END_CODE
This will display a file listing that returns one filename: Changelog.
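If you wanted the “ls” command to return names starting with either capital or lowercase letters, you would include both ranges within the brackets, one after the other. Do not separate them with a comma, because a comma inside brackets is treated as one more character to match.
_CODE
~/dos/asm-1.9 # ls [a-gA-G]*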
_END_CODE
_END_PANEL
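Case sensitivity in bracketed ranges can be sketched with a handful of stand-in files. The scratch directory, the touch command and the LC_ALL=C setting are assumptions of this example; on some systems, other language settings make a range like A-G match lowercase letters as well, so the setting keeps the result predictable.

```shell
scratch=$(mktemp -d)
cd "$scratch"
export LC_ALL=C   # capitals and lowercase letters are kept strictly apart

touch Changelog asm.s equ.s lister.s

# Capital letters A through G only.
upper_only=$(echo [A-G]*)
echo "$upper_only"

# Two ranges side by side, with no separator between them.
either_case=$(echo [a-gA-G]*)
echo "$either_case"
```

The file lister.s is left out both times, because “l” falls outside a through g in either case.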
Using wildcards is great if you know where certain types of files are located. But what about if you don’t know where a file is, but remember part of its name? You can still use wildcards, but you will need more functionality than just “ls”.
Using the “find” command, you can find those missing files using wildcards. And, searches with find will search the current directory and any sub-directories. Be careful when doing this if you have a lot of sub-directories containing many files, as it can take a very long time to search all of the content.
The syntax for the find command is as follows:
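find -name search_string
where -name indicates that it will search by the name of the file. Because no starting directory is given, the search begins in the current directory; this shorthand works in the simulator and with GNU find, but some other versions of find (such as the one on macOS) require the starting directory to be written out, as in “find . -name search_string”. Put quotes around a search string that contains wildcards so that find, rather than the shell, interprets them.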
Here’s an example of a find command.
_CODE
~/dos # find -name 'a*'
./asm.com
./asm-1.9
./asm-1.9/asm.s
~/dos #
_END_CODE
Notice that the command returned both files and directories, including files in sub-directories below where the command was run. These are all the entries whose names start with “a”, whatever comes after it (a file named just “a” would match too). So we found the files we were looking for that started with “a”.
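A search like this one can be rehearsed in a scratch directory. In the sketch below, the directory layout only loosely imitates the dos directory, and the touch and sort commands and the explicit “.” starting directory are assumptions of this example (writing the “.” keeps the command portable across versions of find).

```shell
scratch=$(mktemp -d)
cd "$scratch"
export LC_ALL=C   # keep the sort order the same on every system

mkdir asm-1.9
touch asm.com readme.txt asm-1.9/asm.s asm-1.9/equ.s

# Search from the current directory (.) downward for names starting with "a".
# The quotes stop the shell from expanding a* before find sees it.
found=$(find . -name 'a*' | sort)
echo "$found"
```

Three entries come back: the asm-1.9 directory, the asm.s file inside it, and asm.com. The files readme.txt and equ.s are skipped because their names do not start with “a”.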
What if you didn’t know the name of the file, but remembered something within the file? To find something within a file, you can use the command “grep”, which stands for Global Regular Expression Print. Grep follows the syntax below:
grep search_expression file_to_search
As an example, let’s search through the file hello.c within your home directory to see if it contains the string “int”.
_CODE
/var/root # grep int hello.c
int main(int argc, char **argv)
printf("Hello World\n");
/var/root #
_END_CODE
Grep found two lines containing that string. Notice, though, that grep is not looking for the word “int”; it is looking for the string “int”, which can appear inside a longer word. This is the reason that the “printf” line is returned in the example - “printf” contains “int”.
_PANEL
_Title Geek Box: grep -w
If you wanted to only return instances where the pattern was a word, you can add the “-w” option after grep.
_CODE
/var/root # grep -w int hello.c
int main(int argc, char **argv)
/var/root #
_END_CODE
_END_PANEL
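The difference between matching a string and matching a whole word can be counted rather than eyeballed. The sketch below writes a stand-in hello.c (an assumption; the simulator’s actual file may differ) and uses grep’s -c option, which prints the number of matching lines instead of the lines themselves.

```shell
scratch=$(mktemp -d)
cd "$scratch"

# Stand-in for hello.c, written with a "here document".
cat > hello.c <<'EOF'
#include <stdio.h>
int main(int argc, char **argv)
{
    printf("Hello World\n");
    return 0;
}
EOF

# Lines containing the string "int" anywhere, even inside another word.
substring_hits=$(grep -c int hello.c)

# Lines containing "int" as a whole word only.
word_hits=$(grep -c -w int hello.c)

echo "$substring_hits $word_hits"
```

The plain search counts the “int main” line and the “printf” line, while -w counts only the “int main” line.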
What if you didn’t know the name of the file that had the string within it? You could still find all the files that contain that string with grep by using a wildcard in place of the file_to_search.
Let’s say we wanted to search for ALL files below your dos directory that contained the string “print”. The following code would show you those files and print the lines which have the “print” pattern on them. The added code here is the “-r” option which digs recursively downward from your current directory giving the results below.
_CODE
~/dos # grep -r print *
asm-1.9/equ.s: call SprintRegister
asm-1.9/message.s:| BX points to the message to be printed
asm-1.9/message.s:|The procedure print 'asm :', the message, a carriage return a
nd a line feed
asm-1.9/symtab.s: call SprintRegister
asm-1.9/symtab.s:SprintRegister:
asm-1.9/symtab.s:SprintRegisterMore:
asm-1.9/symtab.s: call SprintHexDigit
asm-1.9/symtab.s: jnz SprintRegisterMore
asm-1.9/symtab.s:SprintHexDigit:
asm-1.9/lister.s:|Lister - print the symbol table of the assembler from the list
file.
asm-1.9/readme.txt:The contents of the symbol table are printed out at the end o
f the
asm-1.9/readme.txt:Only one of -x or -z must be specified. The -x option prints
a
asm-1.9/readme.txt:complete xref dump (definitions + references) The -z option p
rints a
asm-1.9/readme.txt:To print labels not referenced
asm-1.9/readme.txt:To print all defined symbols:
asm-1.9/readme.txt:To print crossreferences:
asm-1.9/Changelog:1. Instead of printing the symbol table onto the screen it put
s
asm-1.9/Changelog:7. A separate program lister was added which prints out the sy
mbol
asm-1.9/Changelog:2. The print stats function was removed
asm-1.9/Changelog:7. Doesn't print the name of the file that it is assembling an
y longer.
~/dos #
_END_CODE
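A recursive search produces a lot of output, so it can help to ask only for the names of the files that matched. The sketch below builds a tiny stand-in for the dos directory (the file contents and notes.txt are made up for illustration) and adds grep’s -l option, which lists matching file names instead of matching lines.

```shell
scratch=$(mktemp -d)
cd "$scratch"
export LC_ALL=C   # keep the sort order the same on every system

mkdir dos
mkdir dos/asm-1.9
echo "call SprintRegister" > dos/asm-1.9/equ.s
echo "To print all defined symbols:" > dos/asm-1.9/readme.txt
echo "nothing to see here" > dos/notes.txt

cd dos
# -r searches downward through sub-directories; -l prints only file names.
files_with_match=$(grep -r -l print * | sort)
echo "$files_with_match"
```

The file notes.txt is not listed because none of its lines contain “print”.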
Command | Description | Usage |
find | Find files recursively by their file name and list them. | find -name string_or_wildcard |
grep | Find files by their contents and display the line from each file that contains that search string. | grep search_string file |
Wildcard | Description | Example |
* | any number of characters (including none): letters, digits, punctuation marks, or whitespace | ls *.jpg ls file* ls *in* |
? | any single character, digit, punctuation mark, or whitespace | ls photo?.jpg ls ?ilename.txt ls test?file.txt |
[...] | exactly one character from a user-defined set or range of characters, digits, or punctuation, listed without separators | ls file[0-9].jpg ls [a-z]ile.txt ls file[_.-]name.txt |
_EXTRA
Find all files within the current working directory that contain .com.
Then find all files starting from your home directory that have ‘bin’ in the name.
_END_EXTRA