Commit 26af628

committed
source commit: 88b571a
0 parents  commit 26af628

67 files changed

+6003
-0
lines changed

01-introduction.md

Lines changed: 389 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,389 @@
1+
---
2+
title: Introducing the Shell
3+
teaching: 20
4+
exercises: 10
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Describe key reasons for learning shell.
10+
- Navigate your file system using the command line.
11+
- Access and read help files for `bash` programs and use help files to identify useful command options.
12+
- Demonstrate the use of tab completion, and explain its advantages.
13+
14+
::::::::::::::::::::::::::::::::::::::::::::::::::
15+
16+
:::::::::::::::::::::::::::::::::::::::: questions
17+
18+
- What is a command shell and why would I use one?
19+
- How can I move around on my computer?
20+
- How can I see what files and directories I have?
21+
- How can I specify the location of a file or directory on my computer?
22+
23+
::::::::::::::::::::::::::::::::::::::::::::::::::
24+
25+
## What is a shell and why should I care?
26+
27+
A *shell* is a computer program that presents a command line interface
28+
which allows you to control your computer using commands entered
29+
with a keyboard instead of controlling graphical user interfaces
30+
(GUIs) with a mouse/keyboard/touchscreen combination.
31+
32+
There are many reasons to learn about the shell:
33+
34+
- Many bioinformatics tools can only be used through a command line interface. Many more
35+
have features and parameter options which are not available in the GUI.
36+
BLAST is an example. Many of the advanced functions are only accessible
37+
to users who know how to use a shell.
38+
- The shell makes your work less boring. In bioinformatics you often need to repeat tasks with a large number of files. With the shell, you can automate those repetitive tasks, leaving you free to do more exciting things.
39+
- The shell makes your work less error-prone. When humans do the same thing a hundred different times
40+
(or even ten times), they're likely to make a mistake. Your computer can do the same thing a thousand times
41+
with no mistakes.
42+
- The shell makes your work more reproducible. When you carry out your work on the command line
43+
(rather than a GUI), your computer keeps a record of every step that you've carried out, which you can use
44+
to re-do your work when you need to. It also gives you a way to communicate unambiguously what you've done,
45+
so that others can inspect or apply your process to new data.
46+
- Many bioinformatic tasks require large amounts of computing power and can't realistically be run on your
47+
own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed
48+
through a shell.
49+
50+
In this lesson you will learn how to use the command line interface to move around in your file system.
51+
52+
## How to access the shell
53+
54+
On a Mac or Linux machine, you can access a shell through a program called "Terminal", which is already available
55+
on your computer. The Terminal is a window into which we will type commands. If you're using Windows,
56+
you'll need to download a separate program to access the shell.
57+
58+
To save time, we are going to be working on a remote server where all the necessary data and software are available.
59+
When we say a 'remote server', we are talking about a computer that is not the one you are working on right now.
60+
You will access the Carpentries remote server where everything is prepared for the lesson.
61+
We will learn the basics of the shell by manipulating some data files. Some of these files are very large,
62+
and would take time to download to your computer.
63+
We will also be using several bioinformatic packages in later lessons and installing all of the software
64+
would take up even more time. A 'ready-to-go' server lets us focus on learning.
65+
66+
## How to access the remote server
67+
68+
You can log in to the remote server using the instructions
69+
[here](https://datacarpentry.org/cloud-genomics/02-logging-onto-cloud#logging-onto-a-cloud-instance).
70+
Your instructor will supply the `ip_address` and password that you need to log in.
71+
72+
Each of you will have a different `ip_address`. This will
73+
prevent us from accidentally changing each other's files as we work through the
74+
exercises. The password will be the same for everyone.
75+
76+
After logging in, you will see a screen showing something like this:
77+
78+
```output
79+
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.4.0-137-generic x86_64)
80+
81+
* Documentation: https://help.ubuntu.com
82+
* Management: https://landscape.canonical.com
83+
* Support: https://ubuntu.com/advantage
84+
85+
System information as of Mon 13 Mar 2023 03:57:46 AM UTC
86+
87+
System load: 0.0 Processes: 192
88+
Usage of /: 20.3% of 98.27GB Users logged in: 0
89+
Memory usage: 25% IPv4 address for eth0: 172.31.12.214
90+
Swap usage: 0%
91+
92+
Get cloud support with Ubuntu Advantage Cloud Guest:
93+
http://www.ubuntu.com/business/services/cloud
94+
95+
178 updates can be applied immediately.
96+
108 of these updates are standard security updates.
97+
To see these additional updates run: apt list --upgradable
98+
99+
100+
Last login: Fri Mar 10 03:14:44 2023 from 72.83.168.14
101+
```
102+
103+
This provides a lot of information about the remote server that you're logging into. We're not going to use most of this information for
104+
our workshop, so you can clear your screen using the `clear` command.
105+
106+
Type the word `clear` into the terminal and press the `Enter` key.
107+
108+
```bash
109+
$ clear
110+
```
111+
112+
This will scroll your screen down to give you a fresh screen and will make it easier to read.
113+
You haven't lost any of the information on your screen. If you scroll up, you can see everything that has been output to your screen
114+
up until this point.
115+
116+
::::::::::::::::::::::::::::::::::::::::: callout
117+
118+
## Tip
119+
120+
Hot-key combinations are shortcuts for performing common commands.
121+
The hot-key combination for clearing the console is `Ctrl+L`. Feel free to try it and see for yourself.
122+
123+
::::::::::::::::::::::::::::::::::::::::::::::::::
124+
125+
## Navigating your file system
126+
127+
The part of the operating system that manages files and directories
128+
is called the **file system**.
129+
It organizes our data into files,
130+
which hold information,
131+
and directories (also called "folders"),
132+
which hold files or other directories.
133+
134+
Several commands are frequently used to create, inspect, rename, and delete files and directories.
135+
136+
::::::::::::::::::::::::::::::::::::::::: callout
137+
138+
## Preparation Magic
139+
140+
You may have a prompt (the characters to the left of the cursor) that looks different from the `$` sign character used here.
141+
If you would like to change your prompt to match the example prompt, first type the command:
142+
`echo $PS1`
143+
into your shell, followed by pressing the <kbd>Enter</kbd> key.
144+
145+
This will print the bash special characters that are currently defining your prompt.
146+
To change the prompt to a `$` (followed by a space), enter the command:
147+
`PS1='$ '`
148+
Your window should look like our example in this lesson.
149+
150+
To change back to your original prompt, type in the output of the previous command `echo $PS1` (this will be different depending on the
151+
original configuration) between the quotes in the following command:
152+
`PS1=""`
153+
154+
For example, if the output of `echo $PS1` was `\u@\h:\w $ `,
155+
then type those characters between the quotes in the above command: `PS1="\u@\h:\w $ "`.
156+
Alternatively, you can reset your original prompt by exiting the shell and opening a new session.
157+
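Rather than retyping the original prompt by hand, its value can be stored in a variable first and restored later. The following is a minimal sketch, not part of the original lesson: the variable name `original_ps1` is a hypothetical choice, and the approach only works within the same shell session. It is shown without the leading `$` prompt so it can be pasted as-is.

```bash
# Save the current prompt definition before changing it.
# "original_ps1" is a hypothetical variable name; ${PS1-} expands to an
# empty string instead of failing if PS1 happens to be unset.
original_ps1=${PS1-}

# Switch to the simple prompt used in this lesson.
PS1='$ '

# ... work through the lesson with the simple prompt ...

# Restore the prompt that was saved above.
PS1=$original_ps1
```

Because the saved value lives only in the current session, closing the shell discards it, which is harmless since a new session starts with the original prompt anyway.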
158+
This isn't necessary to follow along (in fact, your prompt may have other helpful information you want to know about). This is up to you!
159+
160+
::::::::::::::::::::::::::::::::::::::::::::::::::
161+
162+
```bash
163+
$
164+
```
165+
166+
The dollar sign is a **prompt**, which shows us that the shell is waiting for input;
167+
your shell may use a different character as a prompt and may add information before
168+
the prompt. When typing commands, either from these lessons or from other sources,
169+
do not type the prompt, only the commands that follow it.
170+
171+
Let's find out where we are by running a command called `pwd`
172+
(which stands for "print working directory").
173+
At any moment, our **current working directory**
174+
is our current default directory,
175+
i.e.,
176+
the directory that the computer assumes we want to run commands in,
177+
unless we explicitly specify something else.
178+
Here,
179+
the computer's response is `/home/dcuser`,
180+
which is our home directory on the cloud system:
181+
182+
```bash
183+
$ pwd
184+
```
185+
186+
```output
187+
/home/dcuser
188+
```
189+
190+
Let's look at how our file system is organized. We can see what files and subdirectories are in this directory by running `ls`,
191+
which stands for "list":
192+
193+
```bash
194+
$ ls
195+
```
196+
197+
```output
198+
R r_data shell_data
199+
```
200+
201+
`ls` prints the names of the files and directories in the current directory in
202+
alphabetical order,
203+
arranged neatly into columns.
204+
We'll be working within the `shell_data` subdirectory, and creating new subdirectories, throughout this workshop.
205+
206+
The command to change locations in our file system is `cd`, followed by a
207+
directory name to change our working directory.
208+
`cd` stands for "change directory".
209+
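The effect of `cd` can be checked by running `pwd` before and after it. The sketch below is an illustration under stated assumptions rather than part of the workshop data: it builds a throwaway directory with `mktemp -d` and a hypothetical subdirectory named `demo_data`, so it can be tried on any machine. It is shown without the leading `$` prompt so it can be pasted as-is.

```bash
# Create a throwaway directory tree so the example does not touch real data.
# The names "scratch" and "demo_data" are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/demo_data"

cd "$scratch"
before=$(pwd)      # working directory before the change

cd demo_data       # the name is interpreted relative to the working directory
after=$(pwd)       # working directory after the change

echo "before: $before"
echo "after:  $after"
```

The second path printed should be the first one with `/demo_data` appended, which shows that a bare directory name given to `cd` is looked up inside the current working directory.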
210+
Let's say we want to navigate to the `shell_data` directory we saw above. We can
211+
use the following command to get there:
212+
213+
```bash
214+
$ cd shell_data
215+
```
216+
217+
Let's look at what is in this directory:
218+
219+
```bash
220+
$ ls
221+
```
222+
223+
```output
224+
sra_metadata untrimmed_fastq
225+
```
226+
227+
We can make the `ls` output more comprehensible by using the **flag** `-F`,
228+
which tells `ls` to add a trailing `/` to the names of directories:
229+
230+
```bash
231+
$ ls -F
232+
```
233+
234+
```output
235+
sra_metadata/ untrimmed_fastq/
236+
```
237+
238+
Anything with a "/" after it is a directory. Things with a "\*" after them are programs. If
239+
there are no decorations, it's a file.
240+
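All three kinds of decoration can be seen side by side in a small test directory. This is a minimal sketch with hypothetical names (`data`, `notes.txt`, `run.sh`) created in a throwaway location from `mktemp -d`; none of these exist on the workshop server. It is shown without the leading `$` prompt so it can be pasted as-is.

```bash
# Build a throwaway directory holding one directory, one plain file,
# and one file marked as executable. All names here are hypothetical.
scratch=$(mktemp -d)
mkdir "$scratch/data"
touch "$scratch/notes.txt"
touch "$scratch/run.sh"
chmod +x "$scratch/run.sh"   # the execute permission is what triggers the "*"

# -F appends "/" to directories and "*" to executables.
listing=$(ls -F "$scratch")
echo "$listing"
```

The listing should show `data/`, `notes.txt`, and `run.sh*`. Note that the `*` reflects the execute permission on the file, not its contents, so an empty file marked executable is still decorated as a program.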
241+
`ls` has lots of other options. To find out what they are, we can type:
242+
243+
```bash
244+
$ man ls
245+
```
246+
247+
`man` (short for manual) displays detailed documentation (also referred to as a man page or man file)
248+
for `bash` commands. It is a powerful resource to explore `bash` commands, understand
249+
their usage and flags. Some manual files are very long. You can scroll through the
250+
file using your keyboard's down arrow or use the <kbd>Space</kbd> key to go forward one page
251+
and the <kbd>b</kbd> key to go backwards one page. When you are done reading, hit <kbd>q</kbd>
252+
to quit.
253+
254+
::::::::::::::::::::::::::::::::::::::: challenge
255+
256+
## Challenge
257+
258+
Use the `-l` option for the `ls` command to display more information for each item
259+
in the directory. What is one piece of additional information this long format
260+
gives you that you don't see with the bare `ls` command?
261+
262+
::::::::::::::: solution
263+
264+
## Solution
265+
266+
```bash
267+
$ ls -l
268+
```
269+
270+
```output
271+
total 8
272+
drwxr-x--- 2 dcuser dcuser 4096 Jul 30 2015 sra_metadata
273+
drwxr-xr-x 2 dcuser dcuser 4096 Nov 15 2017 untrimmed_fastq
274+
```
275+
276+
The additional information given includes the name of the owner of the file,
277+
when the file was last modified, and whether the current user has permission
278+
to read and write to the file.
279+
280+
:::::::::::::::::::::::::
281+
282+
::::::::::::::::::::::::::::::::::::::::::::::::::
283+
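The first column of the long format encodes the file type and permissions: one character for the type (`d` for a directory, `-` for a regular file), followed by three groups of `rwx` for the owner, the group, and everyone else. The sketch below illustrates this on a hypothetical file, `reads_demo.txt`, in a throwaway directory; the permission value `640` is chosen purely for the example. It is shown without the leading `$` prompt so it can be pasted as-is.

```bash
# Create a hypothetical file in a throwaway directory.
scratch=$(mktemp -d)
touch "$scratch/reads_demo.txt"

# 640 = read/write for the owner, read-only for the group, nothing for others.
chmod 640 "$scratch/reads_demo.txt"

# Long listing of just this file.
listing=$(ls -l "$scratch/reads_demo.txt")
echo "$listing"

# The first ten characters are the file type plus the three permission groups.
mode=$(echo "$listing" | cut -c1-10)
echo "$mode"
```

For this file the ten characters should read `-rw-r-----`: a regular file that the owner can read and write, the group can only read, and other users cannot access.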
284+
No one can possibly learn all of these arguments; that's what the manual page
285+
is for. You can (and should) refer to the manual page or other help files
286+
as needed.
287+
288+
Let's go into the `untrimmed_fastq` directory and see what is in there.
289+
290+
```bash
291+
$ cd untrimmed_fastq
292+
$ ls -F
293+
```
294+
295+
```output
296+
SRR097977.fastq SRR098026.fastq
297+
```
298+
299+
This directory contains two files with `.fastq` extensions. FASTQ is a format
300+
for storing information about sequencing reads and their quality.
301+
We will be learning more about FASTQ files in a later lesson.
302+
303+
### Shortcut: Tab Completion
304+
305+
Typing out file or directory names can waste a
306+
lot of time and it's easy to make typing mistakes. Instead we can use tab complete
307+
as a shortcut. When you start typing out the name of a directory or file, then
308+
hit the <kbd>Tab</kbd> key, the shell will try to fill in the rest of the
309+
directory or file name.
310+
311+
Return to your home directory:
312+
313+
```bash
314+
$ cd
315+
```
316+
317+
then enter:
318+
319+
```bash
320+
$ cd she<tab>
321+
```
322+
323+
The shell will fill in the rest of the directory name for
324+
`shell_data`.
325+
326+
Now change directories to `untrimmed_fastq` in `shell_data`:
327+
328+
```bash
329+
$ cd shell_data
330+
$ cd untrimmed_fastq
331+
```
332+
333+
Using tab complete can be very helpful. However, it will only autocomplete
334+
a file or directory name if you've typed enough characters to provide
335+
a unique identifier for the file or directory you are trying to access.
336+
337+
For example, if we now try to list the files whose names start with `SR`
338+
by using tab complete:
339+
340+
```bash
341+
$ ls SR<tab>
342+
```
343+
344+
The shell auto-completes your command to `SRR09`, because all file names in
345+
the directory begin with this prefix. When you hit
346+
<kbd>Tab</kbd> again, the shell will list the possible choices.
347+
348+
```bash
349+
$ ls SRR09<tab><tab>
350+
```
351+
352+
```output
353+
SRR097977.fastq SRR098026.fastq
354+
```
355+
356+
Tab completion can also fill in the names of programs, which can be useful if you
357+
remember the beginning of a program name.
358+
359+
```bash
360+
$ pw<tab><tab>
361+
```
362+
363+
```output
364+
pwck pwconv pwd pwdx pwunconv
365+
```
366+
367+
This displays the name of every program that starts with `pw`.
368+
369+
## Summary
370+
371+
We now know how to move around our file system using the command line.
372+
This gives us an advantage over interacting with the file system through
373+
a GUI as it allows us to work on a remote server, carry out the same set of operations
374+
on a large number of files quickly, and open up many opportunities for using
375+
bioinformatic software that is only available in command line versions.
376+
377+
In the next few episodes, we'll be expanding on these skills and seeing how
378+
using the command line shell enables us to make our workflow more efficient and reproducible.
379+
380+
:::::::::::::::::::::::::::::::::::::::: keypoints
381+
382+
- The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.
383+
- Useful commands for navigating your file system include: `ls`, `pwd`, and `cd`.
384+
- Most commands take options (flags) which begin with a `-`.
385+
- Tab completion can reduce errors from mistyping and make work more efficient in the shell.
386+
387+
::::::::::::::::::::::::::::::::::::::::::::::::::
388+
389+
