Chapter 18 - Shell Script Essentials
First we're going to look at how to write shell scripts as well as the different ways to execute them. We'll look at how shell script files should be structured and how to use ‘shebangs’ to define how a shell script will run.
These will be essential techniques to have as a foundation for building your own scripts. Even if you are familiar with shell scripts I would suggest skimming this chapter to make sure you understand each of the concepts, particularly the later section where we talk about using the
env command in shebangs.
What is a Shell Script?
A shell script is just a text file which contains a set of commands. As soon as you find yourself repeating the same sequence of commands in a shell, it might be worth saving these commands to a file and running the file instead.
Saving your commands to a file has a number of benefits. It saves time - you don't need to type the commands out each time you want to run them! You can use your favourite editor to build the script file, and you can add ‘comments’ to describe what you are trying to achieve (which will make it far easier to update the script over time). Files can also easily be shared - meaning you can copy these scripts to other machines or share them with others who might find them useful.
Creating a Basic Shell Script
Let's create a simple shell script that shows us our most commonly used shell commands.
Almost every command that is needed to build the script has been discussed in the book already, so it shouldn't be too unfamiliar. But I'll still break it down blow-by-blow to help us understand it.
As we go through this section of the book we're going to extend this script and make it more useful!
The ‘common’ Command
We're going to create a new command, called ‘common’, that shows the most commonly used shell commands.
We should be able to do this using techniques we've seen so far. We'll do it like this:
- Read a large number of commands from the history
- Sort the commands, then count the number of duplicates
- Sort the commands, by the number of duplicates (i.e. ordering by ‘most commonly used’)
- Print the results to the screen.
Let's get started!
Creating a Simple Script
It's going to take some trial and error to get our commands right. So let's start by creating a shell script, which we'll run again and again.
In your favourite editor, create a file called
common.v1.sh and put it somewhere on your system. As an example, I'm going to create a folder called
scripts in my home directory and create the
common.v1.sh file there:
# Create a directory called 'scripts'. # Using the '-p' flag means we won't get an error if the folder exists. mkdir -p ~/scripts # Create the script file. touch ~/scripts/common.v1.sh # Open the script file in my favourite editor. vi ~/scripts.sh
I have called the script
common.v1.sh rather than
common.sh because in each chapter of this section we are going to improve upon the script and change the version number. So in later chapters we will create
common.v3.sh and so on.
These commands should be familiar. The
mkdir command creates a directory. The
-p (create parent directories if needed) flag stops the command from returning an error if the directory already exist.
touch command creates an empty file with the given name. Finally, I open the file in an editor. I am using Vim, but you can open this file in any editor you like.
Before we build the script, let's quickly talk about comments.
The shell ignores any text which follows a
# hash symbol. Whether this is text you type into a shell, or text in a shell script, the shell will ignore the content.
This is extremely useful - it means we can use the hash symbol to add comments to our scripts and commands. These comments are not interpreted by the shell, they are added just to make it easier for us to describe what is going on. If you come from a programming background you will likely be familiar with comments.
Here are a few examples:
# This is a comment - we can use this to describe what we're trying to do. echo "Hello Shell" # Comments can go at the end of a line... # You can also use a comment symbol to 'comment out' a line: # echo "Goodbye Shell"
From this point on we'll use comments a lot to explain what we are trying to accomplish with each section of a script. It is generally good practice to use comments to describe your intent - why you are doing something. This is far more useful for the reader than what you are doing. The ‘what’ should be clear from the commands - the ‘why’ is the thing readers will likely want to understand.
Here's an example of a bad comment:
# Write the CSV file, reverse it, cut it, reverse it. cat ~/effective-shell/data/top100.csv | rev | cut -d',' -f1 | rev
The comment just describes what the script is doing. But it doesn't explain why. A better comment would be:
# We want to extract the last field (the number of reviews) for each film. # Because we don't know how many fields there are we can reverse the text before # we cut it - then the last field becomes the first, which we extract and then # put back into the correct order by reversing it again. cat ~/effective-shell/data/top100.csv | rev | cut -d',' -f1 | rev
If you don't come from a programming background you might think that many of these comments are a little obvious. But as you write more and more code you'll realise that something that seemed obvious when you wrote it a while ago can look surprisingly baffling even just a few days later!
Now that we've discussed comments, we'll build our
common.v1.sh shell script.
Building and Testing the Script
Add the following commands to the
# Write the title of our command. echo "common commands:" # Show the most commonly used commands. tail ~/.bash_history -n 1000 | sort | uniq -c | sed 's/^ *//' | sort -n -r | head -n 10
This is a short script, but there is quite a lot going on. Let's look at it blow-by-blow:
- First we take the last 1000 lines of the
~/.bash_historyfile using the
- Then we sort the commands. This will put all of the duplicates next to each other
- Then we remove all duplicates and use the
-c(show count) flag to count the duplicates
- Then we remove the leading spaces from the output (which we need to do so that we can sort properly)
- Then we sort numerically and in reverse - the highest count first
- Finally, we limit the results to the first ten items
If you need a refresher on the shell history,
uniq the check the Slice and Dice Text chapter. If the
sed command doesn't look familiar then check the Advanced Text Manipulation with Sed chapter.
If you want to see a more detailed breakdown of how the script works, check Appendix - How the Script Works. But this is not necessary for you to follow the content in this chapter.
Now save the file. In your shell, run the following command to execute the file:
sh (shell) command starts a new shell. When we pass the path of a shell script, the shell command will run the script and then exit. The output you see will look something like this:
common commands: 463 vi 267 gc 238 ga . 212 ls 169 gpo 122 make dev 112 gl 97 gcm 96 gpr
You can see my most common commands are short aliases for Git commands (the ones that start with ‘g’), opening Vim, running a makefile command and a few others.
We now have a basic shell script. Let's look at a few different ways we can run the script.
You can use the
\ backslash character to create a ‘continuation’ that tells the shell it needs to join lines up. This allows you to break long commands into multiple lines.
As an example, we could re-write our pipeline command to look like this:
# Show the most commonly used commands. tail ~/.bash_history -n 1000 \ | sort \ | uniq -c \ | sed 's/^ *//' \ | sort -n -r \ | head -n 10
This will probably look very familiar to anyone with a background in functional programming!
Be careful when you split lines up - the continuation character must be the last character on the line. If you add something after it (such as a comment) then the command will fail.
Running a Shell Script
There are a few different ways we can run shell scripts.
The first is to run a shell program and pass the script as a parameter. This is what we did in the earlier example. Here's another example of how we could run the script we created:
This is a perfectly valid technique. Now let's see the other ways we can run a script.
The next way we can run a script it is make it ‘executable’. This means we change the file permissions of the script file, adding the ‘executable bit’. This tells the systems we can run the file. We use the
chmod (change file mode) command to do this:
chmod +x ~/scripts/common.v1.sh
chmod command looks unfamiliar then check the Understanding Commands chapter. Now that the file has been made executable, we can simply enter the path to the file and run it, as if it was any other command:
There is a problem with this approach though. How this file is executed is going to vary depending on how your system is set up2. For example, if you are using Bash, then the script will run in a new instance of the Bash shell. However, if you are using the Z shell, then the script will most likely run in the
sh program (and depending on your system, this program might just be a link to another type of shell).
We want to avoid any ambiguity and be explicit about what program should run our script. We can do this using a special construct called a shebang.
A shebang is a special set of symbols at the beginning of a file that tells the system what program should be used to run the file.
If we were to add a shebang to our
common.v1.sh file, it would look like this:
#!/usr/bin/sh # Write the title of our command. echo "common commands:" # Show the most commonly used commands. tail ~/.bash_history -n 1000 | sort | uniq -c | sed 's/^ *//' | sort -n -r | head -n 10
The shebang is the two characters -
#!. The name ‘shebang’ comes from the names of the symbols. The first symbol is a ‘sharp’ symbol (sometimes it is called a hash, it depends a little on context). The second symbol is an exclamation point. In programming the exclamation point is sometimes called the ‘bang’ symbol. When we put the two together, we get ‘sharp bang’, which is shortened to ‘shebang’.
Immediately after the shebang you write the full path to the program which should be used to open the file.
For example, if you wanted to write a script that is run in Python, you could do this:
#!/usr/bin/python3 print('Hello from Python')
If we wanted to explicitly use the Bash shell to run a script, we might use a shebang like this:
#!/usr/bin/bash echo "Hello from Bash"
What about Node.js? Easy!
#!/usr/bin/node console.log("Hello from Node.js");
Shebangs - Dealing with Paths
When we use a shebang we need to provide the full path the executable that will be used to run the script.
For example, what if we want to use Ruby to run a script we could write a script like this:
#!/usr/bin/ruby puts 'Hello from Ruby'
But there is a problem here. This will only work if you have the Ruby program installed in the location specified after the shebang (i.e.
/usr/bin/ruby). If you do not have the Ruby program in this location the script will fail to run.
How can we know where the user will have a specific program installed?
There is a common trick for dealing with this issue. We can use the
env (set environment and execute command) command to run a command and it will work out the path for us.
env command is often used to show environment variables, but you can also use it to execute an arbitrary command (often with a modified environment). One handy feature of the
env command is that it looks through the
$PATH variable to find the path of the command to execute.
You can see this by running a command like the below:
$ env python3 Python 3.8.5 (default, Jan 27 2021, 15:41:15) [GCC 9.3.0] on linux Type "help", "copyright", "credits" or "license" for more information. >>>
We've used the
env command to run the
python3 command - and it worked out the correct path for us.
env in a shebang, specify the full path to
env (which should be the same on all Unix-like systems) and then provide the name of the command to run:
#!/usr/bin/env bash echo "Hello from Bash"
Or another example:
#!/usr/bin/env ruby puts 'Hello from Ruby'
Using a shebang to specify the exact command to run, and then using the
env command to allow the
$PATH to be searched is generally the safest and most portable way to specify how a shell script should run.
Sourcing Shell Scripts
We have discussed how to run shell scripts. You can also use the source (execute commands from a file) command to load the contents of a file into the current shell.
Remember that when we run a shell script, a new shell is created as a child process of the current shell. This means that if you change something in the environment, such as a variable, it will not affect the environment of the shell that ran the script.
Let's see an example. We'll create a script called set_editor.sh that sets the
EDITOR environment variable to
nano. The script's contents are below (can can also find it in the samples at ~/effective-shell/scripts/set_editor.sh):
EDITOR=nano echo "Editor changed to: $EDITOR"
Let's run this script and see what editor looks like before and after:
$ echo $EDITOR vim $ ~/effective-shell/scripts/set_editor.sh Editor changed to: nano $ echo $EDITOR vim
Notice that although we changed the
EDITOR environment variable in our script, the change has not persisted in the current shell. This is because each shell (and in fact, each process) gets its own copy of the environment.
If we want to run the commands in the file in the context of the current shell, we can use the
source command to load the file:
$ echo $EDITOR vim $ source ~/effective-shell/scripts/set_editor.sh Editor changed to: nano $ echo $EDITOR nano
Our existing environment has been changed. When we use
source, the commands in the file are executed in the current shell, rather than in a new shell.
We can see this even more clearly if we use the showpstree.sh file:
$ ~/effective-shell/scripts/show-info.sh bash └─sh /home/ubuntu/effective-shell/scripts/showpstree.sh └─pstree -l -a -s 2240
This script shows the current ‘process tree’, using the
pstree (show process tree) command. We can see that the
pstree command was run as a child of the
sh program. This program was run with the script path, by the shell I was using,
bash. This is a nice visualisation of what is going on - our
bash shell has run the showpstree.sh script in a child shell.
If we source the same file, we'll see that we do not create a new shell:
$ source ~/effective-shell/scripts/show-info.sh bash └─pstree -l -a -s 2169
There is a slightly more concise syntax that can be used to source a script - the dot sourcing notation. When the shell sees a
. dot character, it will source the file that follows:
$ . ~/effective-shell/scripts/show-info.sh bash └─pstree -l -a -s 2169
You may encounter this syntax as you look at things like shell configuration files, which we discuss in Chapter 24 - Configuring the Shell.
Installing Your Script
Before we finish with our shell script fundamentals, we'll take a look at one final commonly used pattern to run shell scripts - installing them as a local binary.
common.v1.sh script (with the added shebang) looks like this:
#!/usr/bin/env sh # Write the title of our command. echo "common commands:" # Show the most commonly used commands. tail ~/.bash_history -n 1000 | sort | uniq -c | sed 's/^ *//' | sort -n -r | head -n 10
If we have made the script executable with the
chmod command, then we can run the script by simply typing the location of the script in the shell:
$ ~/scripts/common.v1.sh common commands: 97 gcm 96 gpr ...
If we want to ‘install’ this script as a local command which we can run easily, we can create a symbolic link to the shell script in our
ln -s ~/scripts/common.v1.sh /usr/local/bin/common
ln (create link) command creates a link (which is like a shortcut in Windows and other desktop systems) in our
/usr/local/bin folder, with the name
common, which points to the script we have written. We can now run the
common command without specifying its path:
$ common common commands: 97 gcm 96 gpr ...
This works because when the shell sees a command, it searches through the folders in the $PATH environment variable to find out where the command is. And the
/usr/bin/local folder is in this list of paths.
Why do we use the
/usr/bin/local folder rather than the
/usr/bin folder? This is just a convention. In general, the
/usr/bin folder is for commands which are installed with package manager tools like
apt or Homebrew (on MacOS). The
/usr/local/bin folder is used for commands which you create for yourself on your local machine and manage yourself3.
In this chapter we've covered quite a few of the fundamentals of shell scripts:
- How to create a shell script
- How comments work in shell scripts
- How to handle long lines with continuations
- How to run a shell script
- How to make a shell script executable
- How shebangs work
- How to use the
envcommand to make our shebangs more portable
- How to ‘install’ scripts for the current user
In the next chapter we'll look at how to add logic to our shell scripts.
Appendix - How the Script Works
This section briefly covers how the
common.v1.sh script works. Assuming we have a history that looks like this:
vi README.md git status git checkout main git status restart-shell git status open . vi README.md open .
First we sort, putting duplicate lines next to each other:
git checkout main git status git status git status open . open . restart-shell vi README.md vi README.md
Then we use
uniq to remove duplicate adjacent lines, passing the
-c flag to include a count:
1 git checkout main 3 git status 2 open . 1 restart-shell 2 vi README.md
Now we remove the leading whitespace:
1 git checkout main 3 git status 2 open . 1 restart-shell 2 vi README.md
Finally we sort numerically (by using the
-n flag) and in descending order (by using the
3 git status 2 vi README.md 2 open . 1 restart-shell 1 git checkout main
Why the numeric sort? If we didn't sort numerically and instead performed the default lexographic sort and have more than single digit results, the output would look like this:
1 git checkout main 1 restart-shell 13 git status 2 open . 2 vi README.md
This is a lexographic sort - the line starting with 13 comes after the line starting with 2. We want to sort by the value of the number.
The path to the shell history file is normally available in the
$HISTFILEenvironment variable. However, in a non-interactive shell this variable is not set (and when we run a shell script, it is run in a non-interactive shell). We'll see more about interactive and non-interactive shells later, this is just a note in case you are wondering why we don't use the
Try putting the command
pstree -p $$in a shell script and running the script - you'll see exactly what process is run. ↩︎
If you want to know more about these folders and the conventions behind them then check back soon, I am going to be adding an entire section on Linux Fundamentals, and one of the chapters will specifically be on the Linux Filesystem. This will cover ‘The Linux Filesystem Hierarchy Standard’ which defines how folders like this should be used. ↩︎