Background

At a high level, we have seen that computers do four things:


We are all familiar with graphical user interfaces (GUIs), where a “click” translates easily into “do the thing I want”.

If you wish to do complex, purpose-specific things it helps to have a richer means of expressing your instructions to the computer. It doesn’t need to be complicated or difficult, just a vocabulary of commands and a simple grammar for using them.

This is what the shell provides - a simple language and a command-line interface to use it through.

The Shell

A shell is a program like any other. What’s special about it is that its job is to run other programs rather than to do calculations itself. The most popular Unix shell is Bash, the Bourne Again SHell (so-called because it’s derived from a shell written by Stephen Bourne and because programmers make these kinds of jokes).

What does it look like?

A typical shell window looks something like this:

bash-3.2$
bash-3.2$ ls -F /
Applications/         System/
Library/              Users/
Network/              Volumes/
bash-3.2$

The first line shows only a prompt, indicating that the shell is waiting for input. Your shell may use different text for the prompt. Most importantly: when typing commands, either from these lessons or from other sources, do not type the prompt, only the commands that follow it.

The part that you type, ls -F / in the second line of the example, typically has the following structure: a command, some flags (also called options or switches) and an argument. Flags start with a single dash (-) or two dashes (--), and change the behaviour of a command. Arguments tell the command what to operate on (e.g. files and directories).
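As a small illustration of that structure, the commands below run ls on the root directory without a flag, with one flag, and with two flags. The variable name listing is only there for this sketch and is not part of the lesson.

```shell
# command = ls, argument = / (the directory to list)
ls /

# command = ls, flag = -F (mark directories with a trailing /), argument = /
ls -F /

# Several flags can be combined: -l (long format) together with -F
ls -l -F /

# Keep the output of the second form in a variable (name chosen for this sketch)
listing=$(ls -F /)
echo "$listing"
```

The argument stays the same in all three cases; only the flags change how the result is displayed.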

Next we see the output that our command produced: in this case it is a listing of files and folders in a location called / - we’ll cover that in the next section.

Finally, the shell prints the prompt again and waits for you to type the next command.

In the examples for this lesson, we’ll show the prompt as $. Do not type $ when typing in the commands.

Open a shell window and try executing ls -F / for yourself (don’t forget that spaces and capitalization are important!). You can change the prompt too, if you like.

Let’s all connect to the Mammouth Supercomputer

Open a command line terminal and log in to Mammouth.

ssh <YOUR_USERNAME>@mp2.ccs.usherbrooke.ca
# or
ssh -l <YOUR_USERNAME> mp2.ccs.usherbrooke.ca

For those on older versions of Windows, you can use PuTTY.


You should now be logged in to Mammouth, the supercomputer located at Sherbrooke University, and see this prompt:

[poq@ip17-mp2 ~]$

If you are not connected right now, please stop me! We will send the cavalry to help.

Where am I & who am I?

You now have a terminal connected to a remote computer, but that terminal could for some reason disconnect from Mammouth or you could simply have a second terminal open that is connected to your laptop.

If you ever need to be certain which system a terminal you are using is connected to then use:

> Test the following command: $ hostname

You might also want to know who you are!

> Test the following command: $ whoami

What are these two commands telling you?


Exploring the File system

First, let’s find out where we are by running a command called pwd (which stands for “print working directory”). Directories are like places; Linux’s version of “Folders”. At any moment in time, while using the shell, we are in exactly one place, called our current working directory. Commands mostly read and write files in the current working directory, i.e. “here”, so knowing where you are before running a command is important. pwd shows you where you are:

$ pwd
/home/guest33

Here, the computer’s response is /home/guest33, which is the home directory. What is your home directory path?

Home Directory Variation

The home directory path will look different on different operating systems. On OSX it may look like /Users/guest33, and on Windows it will be similar to C:\Documents and Settings\guest33 or C:\Users\guest33.

To understand what a “home directory” is, let’s have a look at how the file system as a whole is organized. For the sake of this example, we’ll be illustrating the filesystem with the user myuser seen in the image below. After this illustration, you’ll be learning commands to explore your own filesystem, which will be constructed in a similar way, but not exactly identical.

Here is what the myuser filesystem looks like:

At the top is the root directory that holds everything else. We refer to it using a slash character, /, on its own; this is the leading slash in /home/myuser.

Inside that directory are several other directories:
- bin (which is where some built-in programs are stored)
- etc (for miscellaneous, typically configuration files)
- home (where users’ personal directories are located)

We know that our current working directory /home/myuser is stored inside /home because /home is the first part of its name. Similarly, we know that /home is stored inside the root directory / because its name begins with /.

Now let’s learn the command that will let us see the contents of our own filesystem. We can see what’s in our home directory by running ls, which stands for listing:

$ hostname
ip15
$ ls
projects  scratch c3g_ws

(You might have a different hostname than ip15, since Mammouth has more than one login node.)

Create a directory

Let’s create a new directory called RNAseq_workshop using the command mkdir RNAseq_workshop (which has no output):

$ mkdir RNAseq_workshop

As you might have guessed from its name, mkdir means “make directory”. Since RNAseq_workshop is a relative path (i.e., does not have a leading slash, like /what/ever/RNAseq_workshop), the new directory is created in the current working directory:

$ ls -l
drwxr-xr-x 2 guest11 guests 4096 Jan 11 11:04 projects
drwxr-xr-x 2 guest11 guests 4096 Jan 11 11:04 c3g_ws
drwxr-xr-x 2 guest11 guests 4096 Jan 18 15:29 RNAseq_workshop
lrwxrwxrwx 1 guest11 guests   16 Jan 11 11:04 scratch -> /scratch/guest11

We have added a flag, -l, to list the folder content in a long list format; extra information is included and the formatting is one file per line.

A few tips for naming files and directories:

  1. Don’t use whitespaces.

Whitespaces can make a name more meaningful but since whitespace is used to break arguments on the command line it is better to avoid them in names of files and directories. You can use - or _ instead of whitespace.

  2. Don’t begin the name with - (dash).

Commands treat names starting with - as options.
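A quick sketch of why these two rules matter, run inside a throwaway directory created with mktemp so that nothing in your home is touched. The sandbox variable and the directory names are made up for this example.

```shell
# Work in a temporary directory (hypothetical sandbox for this sketch)
sandbox=$(mktemp -d)
cd "$sandbox"

# The space splits the name into two arguments: two directories are created
mkdir my data
ls

# An underscore keeps it as a single name
mkdir my_data
ls

# A name starting with a dash is read as a set of options, so this fails
mkdir -results || echo "mkdir treated -results as options"
```

After running it, ls shows three directories (data, my and my_data) and no directory called -results.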

Since we’ve just created the RNAseq_workshop directory, there’s nothing in it yet:

$ ls -l RNAseq_workshop

Create a text file

Let’s change our working directory to RNAseq_workshop using cd, then run a text editor called nano to create a file called draft.txt:

There are several ways to create a file. One way to do it is to use the nano editor. nano is simple to use and is available everywhere on Compute Canada.

$ cd RNAseq_workshop
$ pwd
/home/guest33/RNAseq_workshop
$ nano draft.txt

Note: nano’s main commands are displayed at the bottom of the screen. The ^ symbol corresponds to the control key on Mac keyboards and Ctrl on PC keyboards. ^+O will save the file, ^+X will exit nano and ^+G will display the help menu. Use ^+C if you are stuck or lost. It will cancel or quit anything you are doing.

You can do most common things in nano that you can do in another editor, such as typing or copying text.

The terminal has commands to print and/or navigate the contents of files without the use of an editor, like the cat and less commands.

What is the result of the following commands?

cat draft.txt

and

less draft.txt

If you are stuck in the less program, just type q.
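If you would like to try cat without going through nano first, here is a minimal sketch. It builds a two-line draft.txt in a temporary directory; the sandbox variable and the file content are invented for the example, and the > used to write the file is explained in the Stream redirection section.

```shell
# Hypothetical sandbox so the example does not touch your real files
sandbox=$(mktemp -d)
cd "$sandbox"

# Create a small file without opening an editor
printf 'first line\nsecond line\n' > draft.txt

# cat prints the whole file to the terminal in one go
cat draft.txt

# less shows the file one screen at a time; it is interactive, so it is
# left commented out here (press q to leave it)
# less draft.txt
```

cat suits short files; for long ones, less lets you scroll instead of flooding the terminal.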

Q. What happens if you type cd on its own, without giving a directory?

Solution: cd on its own takes you back to your home directory. You can see where you are taken by typing pwd:

$ cd
$ pwd
/home/poq

Getting help

ls and the other commands we are learning have lots of flags. There are two common ways to find out how to use a command and what flags it accepts:

  1. We can pass a --help flag to the command, such as
$ ls --help
$ cat --help
...
  2. We can read its manual with man, such as
$ man ls

Depending on your environment you might find that only one of these works (either man or --help).

If you are stuck in the man page, just type q.
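One way to find out which of the two works on the machine you are using is to try --help and fall back to man if it fails. This is only a sketch: the help_source variable is made up for the example, and > /dev/null 2>&1 simply discards whatever the command prints.

```shell
# Check whether this system's ls understands --help (GNU ls does; some others do not)
if ls --help > /dev/null 2>&1; then
    help_source="ls --help"
else
    help_source="man ls"
fi
echo "On this system, try: $help_source"
```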

What else is stored on Mammouth

$ ls -F c3g_ws
def-poq/

Now that we know the def-poq directory is located in our c3g_ws/ directory, we can do two things.

If you are using your own account and do not see def-poq in your c3g_ws folder, please run the following commands:

mkdir -p c3g_ws
ln -s /project/6019104 c3g_ws/def-poq

First, we can look at its contents, using the same strategy as before, passing a directory name to ls:

$ ls -F c3g_ws/def-poq
poq/  workshop/

Second, we can actually change our location to a different directory, so we are no longer located in our home directory.

Let’s say we want to move to the pub directory, inside the workshop directory we saw above. We can use the following series of commands to get there:

$ cd c3g_ws
$ cd def-poq
$ cd workshop
$ cd pub

These commands will move us from our home directory into the c3g_ws directory, then into the def-poq directory, then into workshop, and finally into pub. cd doesn’t print anything, but if we run pwd after it, we can see that we are now in /home/<yourself>/c3g_ws/def-poq/workshop/pub. If we run ls without arguments now, it lists the contents of /home/<yourself>/c3g_ws/def-poq/workshop/pub, because that’s where we now are:

$ pwd
/home/poq/c3g_ws/def-poq/workshop/pub
$ ls -F
C3GAW_08_2018/  cedar/

We now know how to go down the directory tree, but how do we go up, back to where we came from? We might try the following:

$ cd workshop
-bash: cd: workshop: No such file or directory

But we get an error! Why is that?

With our methods so far, cd can only see sub-directories inside your current directory. There are different ways to see directories above your current location; we’ll start with the simplest.

There is a shortcut in the shell to move up one directory level that looks like this:

$ cd ..

.. is a special directory name meaning “the directory containing this one”, or more succinctly, the parent of the current directory. Sure enough, if we run pwd after running cd .., we’re back in /home/poq/c3g_ws/def-poq/workshop:

$ pwd
/home/poq/c3g_ws/def-poq/workshop
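The same movement can be rehearsed on a small stand-in tree. In this sketch the sandbox directory and the variables after_one_up and after_two_up are invented for the example; mkdir -p creates the nested directories in one go.

```shell
# Build a hypothetical workshop/pub tree in a temporary directory
sandbox=$(mktemp -d)
mkdir -p "$sandbox/workshop/pub"

cd "$sandbox/workshop/pub"
pwd                      # ends in /workshop/pub

cd ..                    # up one level, to the parent directory
pwd                      # ends in /workshop
after_one_up=$PWD

cd pub                   # back down again
cd ../..                 # .. can be chained to go up two levels at once
pwd
after_two_up=$PWD
```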


So far, when specifying directory names, or even a directory path (as above), we have been using relative paths. When you use a relative path with a command like ls or cd, it tries to find that location from where we are, rather than from the root of the file system.

However, it is possible to specify the absolute path to a directory by including its entire path from the root directory, which is indicated by a leading slash. The leading / tells the computer to follow the path from the root of the file system, so it always refers to exactly one directory, no matter where we are when we run the command.

This allows us to move to our workshop directory from anywhere on the filesystem (including from inside c3g_ws). To find the absolute path we’re looking for, we can use pwd to get the path of our home directory and then append the piece we need to reach workshop.

$ pwd
/home/poq
$ cd /home/poq/c3g_ws/def-poq/workshop

Run pwd and ls -F to ensure that we’re in the directory we expect.
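To make the difference concrete, the sketch below reaches the same directory once with a relative path and once with an absolute path. The tree is a hypothetical copy built under a temporary directory, and sandbox, target, from_relative and from_absolute are names chosen for this example.

```shell
# Hypothetical stand-in for the c3g_ws tree
sandbox=$(mktemp -d)
mkdir -p "$sandbox/c3g_ws/def-poq/workshop"
target="$sandbox/c3g_ws/def-poq/workshop"

# Relative path: works only because we start from $sandbox
cd "$sandbox"
cd c3g_ws/def-poq/workshop
from_relative=$PWD

# From an unrelated place, the same relative path is not found...
cd /
cd c3g_ws/def-poq/workshop || echo "relative path not found from /"

# ...but the absolute path works from anywhere
cd "$target"
from_absolute=$PWD
pwd
```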

One More Shortcut

The shell interprets the character ~ (tilde) at the start of a path to mean “the current user’s home directory”. For example, if your home directory is /home/guest33, then ~/c3g_ws is equivalent to /home/guest33/c3g_ws. This only works if it is the first character in the path: here/there/~/elsewhere is not here/there/home/guest33/elsewhere.
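A short sketch of that behaviour, using echo so that nothing is moved or created. The variable names are made up for the example, and the paths do not need to exist.

```shell
# At the start of a path, ~ is replaced by your home directory
at_start=$(echo ~/c3g_ws)
echo "$at_start"

# In the middle of a path, ~ is left exactly as typed
in_middle=$(echo here/there/~/elsewhere)
echo "$in_middle"

# Inside quotes ~ is not expanded either, so leave it unquoted
quoted=$(echo "~/c3g_ws")
echo "$quoted"
```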

Q. Starting from /home/guest33/c3g_ws/, which of the following commands could guest33 use to navigate to the home directory, which is /home/guest33?

  1. cd .
  2. cd /
  3. cd /Users/guest33
  4. cd ../..
  5. cd ~
  6. cd home
  7. cd ~/c3g_ws/..
  8. cd
  9. cd ..

Solution:

1. No: . stands for the current directory.
2. No: / stands for the root directory.
3. No: guest33’s home directory is /home/guest33.
4. No: this goes up two levels, i.e. ends in /home.
5. Yes: ~ stands for the user’s home directory, in this case /home/guest33.
6. No: this would navigate into a directory home in the current directory if it exists.
7. Yes: unnecessarily complicated, but correct.
8. Yes: shortcut to go back to the user’s home directory.
9. Yes: goes up one level.

Removing files and directories

Let’s go back to the directory and file we have created before.

$ cd
$ pwd
/home/guest33
$ ll
drwxr-xr-x 2 guest11 guests 4096 Jan 11 11:04 c3g_ws
drwxr-xr-x 2 guest11 guests 4096 Jan 11 11:04 projects
drwxr-xr-x 2 guest11 guests 4096 Jan 18 15:29 RNAseq_workshop
lrwxrwxrwx 1 guest11 guests   16 Jan 11 11:04 scratch -> /scratch/guest11
$ cd RNAseq_workshop
$ ls
draft.txt
$ rm draft.txt

This command removes files (rm is short for “remove”). If we run ls again, its output is empty once more, which tells us that our file is gone:

$ ls

Let’s re-create that file and then move up one directory to /home/guest11 using cd ..:

$ pwd
/home/guest11/RNAseq_workshop
$ nano draft.txt
$ ls
draft.txt  
$ cd ..
$ pwd
/home/guest11

If we try to remove the entire RNAseq_workshop directory using rm RNAseq_workshop, we get an error message:

$ rm RNAseq_workshop
rm: cannot remove 'RNAseq_workshop': Is a directory

This happens because rm by default only works on files, not directories.

To really get rid of RNAseq_workshop, we must also delete the file draft.txt. We can do this with the recursive option for rm:

$ rm -r RNAseq_workshop
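Because rm -r deletes without asking, it can be worth practising on something disposable first. The sketch below repeats the steps above inside a temporary directory; sandbox and still_there are names invented for the example.

```shell
# Hypothetical sandbox with a directory containing one file
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir RNAseq_workshop
echo "some text" > RNAseq_workshop/draft.txt

# Plain rm refuses to remove a directory
rm RNAseq_workshop || echo "rm alone cannot remove a directory"
still_there=$(ls)
echo "$still_there"

# -r removes the directory and everything inside it, with no confirmation
rm -r RNAseq_workshop
ls
```

If you prefer to be asked before each deletion, rm also accepts -i (for example rm -r -i RNAseq_workshop), which prompts for confirmation.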

File manipulations

Moving, copying and removing files is fairly easy. cp and mv work similarly, except cp will create a copy of a file while mv will move the file and delete the original.
You need to specify the original file and the destination. The destination can be a directory (current or a different one) or another file name. Remember that the current directory is . and the parent directory is ..

For instance, mv draft.txt ../ will move a file to the parent directory. rm will remove a file. Be careful when you are using it, as there is no way to recover a removed file!

Let’s recreate the RNAseq_workshop directory:

mkdir RNAseq_workshop

I can recreate a draft.txt file in my home and play with it a bit.

$ nano drsft.txt

Oops, I wanted it to be called draft!

$ mv drsft.txt draft.txt

Oops, I wanted to have it in the RNAseq_workshop directory.

$ cp draft.txt RNAseq_workshop
$ ls
draft.txt
$ ls RNAseq_workshop
draft.txt
$ rm draft.txt
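The sketch below runs through the same kind of manipulations in a temporary directory and shows how the destination decides what happens. The sandbox variable and the names backup.txt and moved.txt are made up for this example.

```shell
# Hypothetical sandbox with a directory and a misnamed file
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir RNAseq_workshop
echo "some text" > drsft.txt

# Rename: the old name disappears
mv drsft.txt draft.txt

# Destination is a directory: the copy keeps its name
cp draft.txt RNAseq_workshop

# Destination is a file name: the copy gets that name
cp draft.txt RNAseq_workshop/backup.txt

# Move and rename in a single step; draft.txt leaves the current directory
mv draft.txt RNAseq_workshop/moved.txt

ls RNAseq_workshop
```

A single mv draft.txt RNAseq_workshop would also have replaced the cp followed by rm used above.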


Exercise

Let’s copy the workshop material to the RNAseq_workshop folder. We have all the class material at the following path: ~/c3g_ws/def-poq/workshop/pub/C3GAW_08_2018/RNAseq_TestData

Q. Copy the directory and its content (recursively) into the ~/RNAseq_workshop directory created in the previous exercise. To be able to see what is going on, activate the verbose flag of the copy command.

Solution: first, try to find the verbose flag with either man cp or cp --help.

Then copy the directory into your home, using both the verbose and the recursive flags:

cp -r -v ~/c3g_ws/def-poq/workshop/pub/C3GAW_08_2018/RNAseq_TestData ~/RNAseq_workshop/.

Note that we used an absolute path to copy the directory so the working directory does not matter.
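To see what a recursive, verbose copy does without needing access to the workshop data, here is a sketch on a tiny made-up tree. The raw sub-directory, the sample1.txt file and the sandbox variable are all hypothetical.

```shell
# Hypothetical source tree and destination inside a temporary directory
sandbox=$(mktemp -d)
cd "$sandbox"
mkdir -p RNAseq_TestData/raw RNAseq_workshop
echo "reads" > RNAseq_TestData/raw/sample1.txt

# -r copies the directory and everything below it; -v prints each item copied
cp -r -v RNAseq_TestData RNAseq_workshop/.

# -R lists the destination recursively so the copied tree is visible
ls -R RNAseq_workshop
```

Since the destination directory already exists, the source directory is copied inside it, giving RNAseq_workshop/RNAseq_TestData, and the original is left in place.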

Environment variables

An environment variable can be thought of as an alias or shortcut for some content. It is created by assigning a value to a variable name.
For instance, VAR=1 will create a variable called VAR containing the value 1. Adding a $ sign in front of the variable name returns its content, which you can print using the command echo.

Create a variable called MY_VAR and set it to 2. Display its value. Change the content of MY_VAR to HELLO,WORLD and display its value again.

Solution:

  MY_VAR=2
  echo $MY_VAR
  MY_VAR=HELLO,WORLD #or MY_VAR="HELLO,WORLD"
  echo $MY_VAR

Bash can be finicky about spaces, especially when setting a variable value: there must be no space on either side of the = sign.
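The following sketch shows what that finickiness looks like in practice. MY_VAR is the variable from the exercise; after_failed is a name added for this example.

```shell
# No spaces around = when assigning
MY_VAR=2
echo "$MY_VAR"

# With spaces, the shell looks for a command named MY_VAR and fails
MY_VAR = 3 || echo "assignment with spaces failed"
after_failed=$MY_VAR
echo "$after_failed"          # the value has not changed

# A value that itself contains a space must be quoted
MY_VAR="HELLO, WORLD"
echo "$MY_VAR"
```

Putting double quotes around $MY_VAR when printing it keeps the value intact even when it contains spaces.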


Some special environment variables

  • $HOME –> your home directory
  • $PATH –> the list of directories where the shell looks for the programs you can execute (like ls, rm, etc.). If a program is not in one of the directories listed in $PATH, you need to use its full or relative path to run it.
  • $HOSTNAME –> the current machine
  • $USER –> your username
  • Many others exist, you can type env to see all the variables set in your environment
  • One env var that you will see a lot is $MUGQIC_PIPELINES_HOME
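A few of these variables can be inspected as follows. This is a sketch rather than a recipe: ls_location is a name chosen for the example, the | symbol passes the output of one command to the next, and ${USER:-not set} prints a fallback text in case $USER is not defined on your system.

```shell
echo "Home directory: $HOME"
echo "User name:      ${USER:-not set}"

# $PATH is a colon-separated list of directories; print one per line
echo "$PATH" | tr ':' '\n'

# command -v reports where the shell found a program by searching $PATH
ls_location=$(command -v ls)
echo "ls is found at: $ls_location"

# Count how many variables are currently set in the environment
env | wc -l
```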

Stream redirection

Sometimes, you want to redirect the output (stdout) of a process to a file (it is connected to your screen by default). This can be accomplished by adding > output.txt after the command.
For instance, echo HELLO,WORLD will print HELLO,WORLD to your terminal (i.e. the default stdout destination), but echo HELLO,WORLD > hello_world.txt will write HELLO,WORLD into a new file called hello_world.txt. > redirects a process’s stdout to a file, replacing the file’s content if it already exists. Note that you can also use >> if you want to append data to a file instead of losing its old content.

Print the content of the current directory into a file called log.txt. Display the content of the file.

Solution:

ls > log.txt
cat log.txt
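The difference between > and >> is easiest to see by writing to the same file several times. The sketch below does so in a temporary directory; the sandbox variable and the messages are made up for the example.

```shell
# Hypothetical sandbox so no existing log.txt is overwritten
sandbox=$(mktemp -d)
cd "$sandbox"

echo "first run" > log.txt       # creates log.txt
echo "second run" > log.txt      # overwrites it: "first run" is lost
cat log.txt

echo "third run" >> log.txt      # appends: the previous content is kept
cat log.txt
```

At the end, log.txt holds two lines, "second run" followed by "third run".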



This tutorial is heavily influenced by the Software Carpentry Unix Shell introduction, found at http://swcarpentry.github.io/shell-novice/, which is released under a Creative Commons license.