Commit 3ce111f

source commit: 3eb09bd
0 parents  commit 3ce111f

68 files changed, +6223 -0 lines changed

01-introduction.md

Lines changed: 388 additions & 0 deletions
@@ -0,0 +1,388 @@
1+
---
2+
title: Introducing the Shell
3+
teaching: 20
4+
exercises: 10
5+
---
6+
7+
::::::::::::::::::::::::::::::::::::::: objectives
8+
9+
- Describe key reasons for learning shell.
10+
- Navigate your file system using the command line.
11+
- Access and read help files for `bash` programs and use help files to identify useful command options.
12+
- Demonstrate the use of tab completion, and explain its advantages.
13+
14+
::::::::::::::::::::::::::::::::::::::::::::::::::
15+
16+
:::::::::::::::::::::::::::::::::::::::: questions
17+
18+
- What is a command shell and why would I use one?
19+
- How can I move around on my computer?
20+
- How can I see what files and directories I have?
21+
- How can I specify the location of a file or directory on my computer?
22+
23+
::::::::::::::::::::::::::::::::::::::::::::::::::
24+
25+
## What is a shell and why should I care?
26+
27+
A *shell* is a computer program that presents a command line interface
28+
which allows you to control your computer using commands entered
29+
with a keyboard instead of controlling graphical user interfaces
30+
(GUIs) with a mouse/keyboard/touchscreen combination.
31+
32+
There are many reasons to learn about the shell:
33+
34+
- Many bioinformatics tools can only be used through a command line interface. Many more
35+
have features and parameter options which are not available in the GUI.
36+
BLAST is an example. Many of the advanced functions are only accessible
37+
to users who know how to use a shell.
38+
- The shell makes your work less boring. In bioinformatics you often need to repeat tasks with a large number of files. With the shell, you can automate those repetitive tasks, leaving you free to do more exciting things.
39+
- The shell makes your work less error-prone. When humans do the same thing a hundred different times
40+
(or even ten times), they're likely to make a mistake. Your computer can do the same thing a thousand times
41+
with no mistakes.
42+
- The shell makes your work more reproducible. When you carry out your work on the command line
43+
(rather than a GUI), your computer keeps a record of every step that you've carried out, which you can use
44+
to re-do your work when you need to. It also gives you a way to communicate unambiguously what you've done,
45+
so that others can inspect or apply your process to new data.
46+
- Many bioinformatic tasks require large amounts of computing power and can't realistically be run on your
47+
own machine. These tasks are best performed using remote computers or cloud computing, which can only be accessed
48+
through a shell.
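
As a small illustration of the automation point above, the sketch below repeats one task (counting lines) for every file in a directory. The scratch directory and the `sample_*.txt` file names are made up for this example and are not part of the lesson data. The `$ ` prompt is left out so the lines can be pasted as they are.

```bash
# Create a scratch directory with three small example files (hypothetical names).
demo_dir=$(mktemp -d)
printf 'A\nC\n' > "$demo_dir/sample_1.txt"
printf 'G\nT\nA\n' > "$demo_dir/sample_2.txt"
printf 'C\nG\nT\nA\n' > "$demo_dir/sample_3.txt"

# Repeat the same task for every matching file: count its lines.
for file in "$demo_dir"/sample_*.txt
do
    lines=$(wc -l < "$file")
    # $((lines)) strips any padding that wc adds around the number.
    echo "$(basename "$file"): $((lines)) lines"
done
```

The loop body stays the same whether the directory holds three files or three thousand, which is where the time saving comes from.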
49+
50+
In this lesson you will learn how to use the command line interface to move around in your file system.
51+
52+
## How to access the shell
53+
54+
On a Mac or Linux machine, you can access a shell through a program called "Terminal", which is already available
55+
on your computer. The Terminal is a window into which we will type commands. If you're using Windows,
56+
you'll need to download a separate program to access the shell.
57+
58+
To save time, we are going to be working on a remote server where all the necessary data and software are available.
59+
When we say a 'remote server', we are talking about a computer that is not the one you are working on right now.
60+
You will access the Carpentries remote server where everything is prepared for the lesson.
61+
We will learn the basics of the shell by manipulating some data files. Some of these files are very large
62+
and would take time to download to your computer.
63+
We will also be using several bioinformatic packages in later lessons and installing all of the software
64+
would take up even more time. A 'ready-to-go' server lets us focus on learning.
65+
66+
## How to access the remote server
67+
68+
You can log in to the remote server using the [instructions from the Introduction to Cloud Computing for Genomics lesson](https://datacarpentry.org/cloud-genomics/02-logging-onto-cloud#logging-onto-a-cloud-instance).
69+
Your instructor will supply the `ip_address` and password that you need to log in.
70+
71+
Each of you will have a different `ip_address`. This will
72+
prevent us from accidentally changing each other's files as we work through the
73+
exercises. The password will be the same for everyone.
74+
75+
After logging in, you will see a screen showing something like this:
76+
77+
```output
78+
Welcome to Ubuntu 20.04.5 LTS (GNU/Linux 5.4.0-137-generic x86_64)
79+
80+
* Documentation: https://help.ubuntu.com
81+
* Management: https://landscape.canonical.com
82+
* Support: https://ubuntu.com/advantage
83+
84+
System information as of Mon 13 Mar 2023 03:57:46 AM UTC
85+
86+
System load: 0.0 Processes: 192
87+
Usage of /: 20.3% of 98.27GB Users logged in: 0
88+
Memory usage: 25% IPv4 address for eth0: 172.31.12.214
89+
Swap usage: 0%
90+
91+
Get cloud support with Ubuntu Advantage Cloud Guest:
92+
http://www.ubuntu.com/business/services/cloud
93+
94+
178 updates can be applied immediately.
95+
108 of these updates are standard security updates.
96+
To see these additional updates run: apt list --upgradable
97+
98+
99+
Last login: Fri Mar 10 03:14:44 2023 from 72.83.168.14
100+
```
101+
102+
This provides a lot of information about the remote server that you're logging into. We're not going to use most of this information for
103+
our workshop, so you can clear your screen using the `clear` command.
104+
105+
Type the word `clear` into the terminal and press the `Enter` key.
106+
107+
```bash
108+
$ clear
109+
```
110+
111+
This will scroll your screen down to give you a fresh screen and will make it easier to read.
112+
You haven't lost any of the information on your screen. If you scroll up, you can see everything that has been output to your screen
113+
up until this point.
114+
115+
::::::::::::::::::::::::::::::::::::::::: callout
116+
117+
## Tip
118+
119+
Hot-key combinations are shortcuts for performing common commands.
120+
The hot-key combination for clearing the console is `Ctrl+L`. Feel free to try it and see for yourself.
121+
122+
::::::::::::::::::::::::::::::::::::::::::::::::::
123+
124+
## Navigating your file system
125+
126+
The part of the operating system that manages files and directories
127+
is called the **file system**.
128+
It organizes our data into files,
129+
which hold information,
130+
and directories (also called "folders"),
131+
which hold files or other directories.
132+
133+
Several commands are frequently used to create, inspect, rename, and delete files and directories.
134+
135+
::::::::::::::::::::::::::::::::::::::::: callout
136+
137+
## Preparation Magic
138+
139+
You may have a prompt (the characters to the left of the cursor) that looks different from the `$` sign character used here.
140+
If you would like to change your prompt to match the example prompt, first type the command:
141+
`echo $PS1`
142+
into your shell, followed by pressing the <kbd>Enter</kbd> key.
143+
144+
This will print the bash special characters that are currently defining your prompt.
145+
To change the prompt to a `$` (followed by a space), enter the command:
146+
`PS1='$ '`
147+
Your window should look like our example in this lesson.
148+
149+
To change back to your original prompt, type in the output of the previous command `echo $PS1` (this will be different depending on the
150+
original configuration) between the quotes in the following command:
151+
`PS1=""`
152+
153+
For example, if the output of `echo $PS1` was `\u@\h:\w $ `,
154+
then type those characters between the quotes in the above command: `PS1="\u@\h:\w $ "`.
155+
Alternatively, you can reset your original prompt by exiting the shell and opening a new session.
156+
157+
This isn't necessary to follow along (in fact, your prompt may have other helpful information you want to know about). This is up to you!
158+
159+
::::::::::::::::::::::::::::::::::::::::::::::::::
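
If you would rather not retype your original prompt by hand, one option is to keep a copy of it in a variable before changing it. This is only a sketch: the variable name `old_ps1` is an arbitrary choice for illustration, and the copy lasts only for the current shell session.

```bash
# Remember the current prompt definition (empty if no prompt is set).
old_ps1="${PS1:-}"

# Switch to the simple prompt used in this lesson.
PS1='$ '

# ... work through the lesson ...

# Restore the original prompt from the saved copy.
PS1="$old_ps1"
```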
160+
161+
```bash
162+
$
163+
```
164+
165+
The dollar sign is a **prompt**, which shows us that the shell is waiting for input;
166+
your shell may use a different character as a prompt and may add information before
167+
the prompt. When typing commands, either from these lessons or from other sources,
168+
do not type the prompt, only the commands that follow it.
169+
170+
Let's find out where we are by running a command called `pwd`
171+
(which stands for "print working directory").
172+
At any moment, our **current working directory**
173+
is our current default directory,
174+
i.e.,
175+
the directory that the computer assumes we want to run commands in,
176+
unless we explicitly specify something else.
177+
Here,
178+
the computer's response is `/home/dcuser`,
179+
which is our home directory on the cloud system:
180+
181+
```bash
182+
$ pwd
183+
```
184+
185+
```output
186+
/home/dcuser
187+
```
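
To see what "working directory" means in practice, the following sketch moves to the root directory `/` and back again, running `pwd` after each move. The variable name `start_dir` is just an illustrative choice, and the `cd` command used here is introduced properly further down.

```bash
# Save the directory we start in.
start_dir=$(pwd)

# Move to the root of the file system and confirm where we are.
cd /
pwd    # prints /

# Go back to where we started; pwd reports the original directory again.
cd "$start_dir"
pwd
```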
188+
189+
Let's look at how our file system is organized. We can see what files and subdirectories are in this directory by running `ls`,
190+
which stands for "listing":
191+
192+
```bash
193+
$ ls
194+
```
195+
196+
```output
197+
R r_data shell_data
198+
```
199+
200+
`ls` prints the names of the files and directories in the current directory in
201+
alphabetical order,
202+
arranged neatly into columns.
203+
We'll be working within the `shell_data` subdirectory, and creating new subdirectories, throughout this workshop.
204+
205+
The command to change locations in our file system is `cd`, followed by a
206+
directory name to change our working directory.
207+
`cd` stands for "change directory".
208+
209+
Let's say we want to navigate to the `shell_data` directory we saw above. We can
210+
use the following command to get there:
211+
212+
```bash
213+
$ cd shell_data
214+
```
215+
216+
Let's look at what is in this directory:
217+
218+
```bash
219+
$ ls
220+
```
221+
222+
```output
223+
sra_metadata untrimmed_fastq
224+
```
225+
226+
We can make the `ls` output more comprehensible by using the **flag** `-F`,
227+
which tells `ls` to add a trailing `/` to the names of directories:
228+
229+
```bash
230+
$ ls -F
231+
```
232+
233+
```output
234+
sra_metadata/ untrimmed_fastq/
235+
```
236+
237+
Anything with a "/" after it is a directory. Things with a "\*" after them are programs. If
238+
there are no decorations, it's a file.
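
The lesson data only contains directories at this level, so you may want to try the other decorations in a scratch directory. The sketch below assumes nothing about the lesson server: it creates a temporary directory holding one directory, one plain file, and one executable file (all names are hypothetical) and lists it with `ls -F`.

```bash
# Build a scratch directory containing one of each kind of item.
demo_dir=$(mktemp -d)
mkdir "$demo_dir/results"
touch "$demo_dir/notes.txt"
touch "$demo_dir/run_analysis.sh"
chmod +x "$demo_dir/run_analysis.sh"    # mark the script as executable

# List the directory with decorations and show the result.
listing=$(ls -F "$demo_dir")
echo "$listing"
```

The listing should show `results/` with a trailing slash, `run_analysis.sh*` with an asterisk, and `notes.txt` with no decoration.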
239+
240+
`ls` has lots of other options. To find out what they are, we can type:
241+
242+
```bash
243+
$ man ls
244+
```
245+
246+
`man` (short for manual) displays detailed documentation (also referred to as a man page or man file)
247+
for `bash` commands. It is a powerful resource to explore `bash` commands, understand
248+
their usage and flags. Some manual files are very long. You can scroll through the
249+
file using your keyboard's down arrow or use the <kbd>Space</kbd> key to go forward one page
250+
and the <kbd>b</kbd> key to go backwards one page. When you are done reading, hit <kbd>q</kbd>
251+
to quit.
252+
253+
::::::::::::::::::::::::::::::::::::::: challenge
254+
255+
## Challenge
256+
257+
Use the `-l` option for the `ls` command to display more information for each item
258+
in the directory. What is one piece of additional information this long format
259+
gives you that you don't see with the bare `ls` command?
260+
261+
::::::::::::::: solution
262+
263+
## Solution
264+
265+
```bash
266+
$ ls -l
267+
```
268+
269+
```output
270+
total 8
271+
drwxr-x--- 2 dcuser dcuser 4096 Jul 30 2015 sra_metadata
272+
drwxr-xr-x 2 dcuser dcuser 4096 Nov 15 2017 untrimmed_fastq
273+
```
274+
275+
The additional information given includes the name of the owner of the file,
276+
when the file was last modified, and whether the current user has permission
277+
to read and write to the file.
278+
279+
:::::::::::::::::::::::::
280+
281+
::::::::::::::::::::::::::::::::::::::::::::::::::
282+
283+
No one can possibly learn all of these arguments; that's what the manual page
284+
is for. You can (and should) refer to the manual page or other help files
285+
as needed.
286+
287+
Let's go into the `untrimmed_fastq` directory and see what is in there.
288+
289+
```bash
290+
$ cd untrimmed_fastq
291+
$ ls -F
292+
```
293+
294+
```output
295+
SRR097977.fastq SRR098026.fastq
296+
```
297+
298+
This directory contains two files with `.fastq` extensions. FASTQ is a format
299+
for storing information about sequencing reads and their quality.
300+
We will be learning more about FASTQ files in a later lesson.
301+
302+
### Shortcut: Tab Completion
303+
304+
Typing out file or directory names can waste a
305+
lot of time and it's easy to make typing mistakes. Instead we can use tab completion
306+
as a shortcut. When you start typing out the name of a directory or file, then
307+
hit the <kbd>Tab</kbd> key, the shell will try to fill in the rest of the
308+
directory or file name.
309+
310+
Return to your home directory:
311+
312+
```bash
313+
$ cd
314+
```
315+
316+
then enter:
317+
318+
```bash
319+
$ cd she<tab>
320+
```
321+
322+
The shell will fill in the rest of the directory name for
323+
`shell_data`.
324+
325+
Now change directories to `untrimmed_fastq` in `shell_data`:
326+
327+
```bash
328+
$ cd shell_data
329+
$ cd untrimmed_fastq
330+
```
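
The two `cd` steps above can also be written as a single command by joining the directory names with `/`. The sketch below rebuilds the same directory layout inside a temporary scratch directory (an assumption made so that the example runs anywhere) rather than using the real lesson data.

```bash
# Recreate the lesson's directory layout in a scratch location.
scratch=$(mktemp -d)
mkdir -p "$scratch/shell_data/untrimmed_fastq"
cd "$scratch"

# One command instead of two: give cd the whole path.
cd shell_data/untrimmed_fastq

# Confirm that we ended up in untrimmed_fastq.
basename "$(pwd)"    # prints untrimmed_fastq
```

Tab completion works on each part of such a path, so you can press <kbd>Tab</kbd> after every partial directory name.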
331+
332+
Tab completion can be very helpful. However, it will only autocomplete
333+
a file or directory name if you've typed enough characters to provide
334+
a unique identifier for the file or directory you are trying to access.
335+
336+
For example, if we now try to list the files whose names start with `SR`
337+
by using tab complete:
338+
339+
```bash
340+
$ ls SR<tab>
341+
```
342+
343+
The shell auto-completes your command to `SRR09`, because all file names in
344+
the directory begin with this prefix. When you hit
345+
<kbd>Tab</kbd> again, the shell will list the possible choices.
346+
347+
```bash
348+
$ ls SRR09<tab><tab>
349+
```
350+
351+
```output
352+
SRR097977.fastq SRR098026.fastq
353+
```
354+
355+
Tab completion can also fill in the names of programs, which can be useful if you
356+
remember the beginning of a program name.
357+
358+
```bash
359+
$ pw<tab><tab>
360+
```
361+
362+
```output
363+
pwck pwconv pwd pwdx pwunconv
364+
```
365+
366+
This displays the name of every program that starts with `pw`.
367+
368+
## Summary
369+
370+
We now know how to move around our file system using the command line.
371+
This gives us an advantage over interacting with the file system through
372+
a GUI as it allows us to work on a remote server, carry out the same set of operations
373+
on a large number of files quickly, and use
374+
bioinformatic software that is only available in command line versions.
375+
376+
In the next few episodes, we'll be expanding on these skills and seeing how
377+
using the command line shell enables us to make our workflow more efficient and reproducible.
378+
379+
:::::::::::::::::::::::::::::::::::::::: keypoints
380+
381+
- The shell gives you the ability to work more efficiently by using keyboard commands rather than a GUI.
382+
- Useful commands for navigating your file system include: `ls`, `pwd`, and `cd`.
383+
- Most commands take options (flags) which begin with a `-`.
384+
- Tab completion can reduce errors from mistyping and make work more efficient in the shell.
385+
386+
::::::::::::::::::::::::::::::::::::::::::::::::::
387+
388+
