Shell Scripting

Many of these notes are taken from the book Classic Shell Scripting.

Intro

What is Unix? Unix is a family of operating systems with a command line interface, originally developed at Bell Labs starting in 1969. It is the ancestor of, or the model for, many modern OSes such as macOS and Linux. Because Unix systems and their tools diverged (grep might have different flags between systems), the POSIX standards were developed so that software written against POSIX can run on any POSIX-compliant OS. Standardized tools in turn made shell scripts portable (bash, for example, can run POSIX shell scripts, although it also adds its own extensions).

What is the shell? The shell is not the OS itself but a user interface to it, typically a command line interface. Shells such as sh, bash, or zsh are interpreter programs that interpret your commands and communicate with the OS. The typical interaction pattern is a REPL, or Read-Eval-Print Loop.

Bash, for example, can be described as a Unix shell.

Even more terms...

I actually get more confused when I start hearing about GNU/Linux, so here is a helpful summary of the key differences:

  • Unix: the original family of OSes
  • POSIX: the standard that describes how Unix-like systems should behave so tools can become standardized for all of them
  • Linux: an OS kernel that, when combined with GNU tools, makes a fully usable operating system
  • GNU: stands for "GNU's Not Unix" and is a project to build a free Unix-like OS. Its tools are combined with the Linux kernel to form a complete OS, which is why the combination is often called GNU/Linux

Scripting

The shell scripting language is much like Python, Perl, and Ruby: it is a high-level language in which you can express complex operations clearly and write a powerful, useful script in a short amount of time. Scripts are interpreted rather than compiled: an interpreter (itself a compiled program) reads the script and translates it into an internal form before executing it.

Principles

  • Lines of text are the universal format in Unix - not binary.
  • Write programs to read from stdin and write to stdout, with error messages to stderr. This makes programs easy to use as data filters which act as components in larger pipelines or scripts.
  • Avoid mixing informational messages into a program's stdout (at least by default).
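
The principles above can be sketched with a tiny filter. This is only an illustration: shout is a hypothetical function name, not something from the book. It reads lines on stdin, writes the result to stdout, and sends its diagnostic to stderr so that it can sit in the middle of a pipeline.

```shell
#!/bin/sh
# shout: hypothetical filter that upper-cases whatever arrives on stdin.
shout() {
    # Diagnostics go to stderr so they never mix with the data stream.
    echo "shout: converting input to upper case" >&2
    # The data itself flows from stdin to stdout.
    tr '[:lower:]' '[:upper:]'
}

# Because shout is a plain filter, it composes with other tools such as sort.
printf 'pear\napple\n' | shout | sort
```

Redirecting stderr (for example with 2>/dev/null) silences the diagnostic without touching the data that reaches sort.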

Executing Commands

Many system commands exist in compiled form on the system (you can find them with which). When one of these programs is executed, Bash performs a fork-and-exec: it forks a new child process that is a copy of the parent, with the same environment, and the child then makes an exec system call to replace its own contents with the new program.

For instance when you run find, the shell forks, then the child process loads the find program into memory and sets up command line args.
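
One visible consequence of fork-and-exec is that a child receives a copy of the parent's exported environment, and changes made in the child never travel back. A minimal sketch, assuming a hypothetical variable named GREETING:

```shell
#!/bin/sh
# GREETING is a hypothetical variable used only for this demonstration.
GREETING=hello
export GREETING

# The shell forks, and the child execs a new sh that inherits GREETING.
# The child then modifies its own copy, which the parent never sees.
child_saw=$(sh -c 'echo "$GREETING"; GREETING=changed')

echo "child saw:   $child_saw"
echo "parent has:  $GREETING"
```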

The #! Shebang

The special first line of a script starts with #! and tells the kernel the full path to the interpreter that should run the program, optionally followed by a single option to pass to the interpreter.

Typically shell scripts start like this:

#! /bin/sh

The kernel will then run /bin/sh scriptname under the hood.

Detail

Here sh refers to a command interpreter (shell), which is NOT an operating system.

The shebang line can also pass a single option to the interpreter, as with csh:

#! /bin/csh -f

Or invoke a standalone awk program with:

#! /bin/awk -f
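
To see the shebang at work, the sketch below writes a throwaway script and runs it two ways. The file name hello.sh and the temporary directory are assumptions made for the demo, and it presumes the temporary directory allows executing files.

```shell
#!/bin/sh
# Create a throwaway script (hypothetical name hello.sh) in a temporary directory.
workdir=$(mktemp -d)
cat > "$workdir/hello.sh" <<'EOF'
#! /bin/sh
echo "hello from a script"
EOF

# The script must be executable before it can be run by name.
chmod +x "$workdir/hello.sh"

# Run directly: the kernel reads the #! line and starts /bin/sh on the file.
direct=$("$workdir/hello.sh")

# The same thing spelled out by hand.
explicit=$(/bin/sh "$workdir/hello.sh")

echo "direct:   $direct"
echo "explicit: $explicit"
rm -rf "$workdir"
```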

Commands and Args

The shell distinguishes three kinds of commands:

  1. built-in commands - e.g. cd
  2. shell functions - self contained chunks of code written in shell language that act like regular commands
  3. external commands - commands the shell runs by creating a separate process
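
A quick way to tell the three kinds apart is the POSIX command -v lookup, which prints just the name for built-ins and functions and a path for external commands. In this sketch greet is a hypothetical function defined only for the demonstration.

```shell
#!/bin/sh
# greet is a hypothetical shell function.
greet() {
    echo "hello, $1"
}

cd_lookup=$(command -v cd)        # built-in: prints the bare name
greet_lookup=$(command -v greet)  # function: prints the bare name
ls_lookup=$(command -v ls)        # external: prints the path to the program

echo "cd    -> $cd_lookup"
echo "greet -> $greet_lookup"
echo "ls    -> $ls_lookup"
```

The exact path printed for ls depends on the system.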

With commands, short options start with a single dash followed by a letter (-c) and may take an argument (-p1). Long options start with two dashes (--backup) or, depending on the program, a single dash. Semicolons ; separate multiple commands on the same line, and the shell executes them sequentially.

Variables

Define variables with = and no intervening spaces.

myvar=this_is_a_long_string
echo $myvar

first=isaac middle=bashevis last=singer # multiple assignments allowed per line
fullname="$first $middle $last" # quotes are needed when the value contains spaces
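
A short sketch of the assignment rules, reusing the names above. The variable suffixed is a hypothetical addition that shows braces delimiting a variable name when other characters follow it directly.

```shell
#!/bin/sh
first=isaac middle=bashevis last=singer

# Double quotes keep the spaces inside a single value.
fullname="$first $middle $last"
echo "$fullname"

# Braces mark where the variable name ends; without them the shell
# would look for a variable called first_backup.
suffixed="${first}_backup"
echo "$suffixed"

# Note: writing  first = isaac  (with spaces) would not assign anything;
# the shell would try to run a command named first.
```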

Globbing

While commands and utilities like grep, sed, and awk recognize regex, bash does not use regex when it expands filenames on the command line. Instead, bash itself operates via globbing.

Globbing is the process of filename expansion using wild cards. In other words, given a glob pattern, globbing expands the wild card pattern into a list of pathnames matching the pattern. Note that wild card characters *, ?, [] have different meanings than in RE.

Key Idea

When bash sees unquoted wildcards in command arguments, it performs glob expansion before the command runs. Quoted wildcards are passed to the command untouched, so a command such as grep can safely interpret them as a regex.
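
The difference between quoted and unquoted wildcards can be sketched in a scratch directory. The file names and the temporary directory here are assumptions made for the demo.

```shell
#!/bin/sh
# Work in a scratch directory with two known files.
oldpwd=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch file1.txt file2.txt

# Unquoted: the shell expands the pattern before echo ever runs.
unquoted=$(echo *.txt)

# Quoted: echo receives the literal characters *.txt.
quoted=$(echo '*.txt')

echo "unquoted: $unquoted"
echo "quoted:   $quoted"

cd "$oldpwd" || exit 1
rm -rf "$workdir"
```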

Reiterating the key idea: when you use special characters in unquoted command line arguments, bash performs filename expansion. For example, when you run the line:

rm *.txt

If the files in your directory are file1.txt, file2.txt, file3.txt, bash will expand the wildcard argument into:

rm file1.txt file2.txt file3.txt

You can also see this by doing:

echo *.txt
# file1.txt file2.txt file3.txt
echo notafile *.txt
# notafile file1.txt file2.txt file3.txt

Key Idea

A string is a wildcard pattern if it contains one of the characters ?, *, or [.

What are each of the glob special characters and their function?

  • ? matches any single character
  • * matches any string, including the empty string
  • [...] matches a single character that is one of any of the characters inside the brackets
  • [!...] matches a single character that is in the complement of the set of characters
  • [0-9] or any two characters separated by - denote a range. For example [A-Fa-f0-9] is equivalent to [ABCDEFabcdef0123456789]
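
Each of the special characters above can be tried out in a scratch directory. This is a sketch with made-up file names; the variable names are only labels for the demo.

```shell
#!/bin/sh
# Scratch directory with a handful of hypothetical files.
oldpwd=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
touch a1.txt a2.txt b1.txt ab.txt notes.md

any_one=$(echo a?.txt)      # ? matches exactly one character after "a"
any_str=$(echo *.md)        # * matches any string, including the empty one
in_set=$(echo [ab]1.txt)    # [ab] matches a single "a" or "b"
not_set=$(echo [!a]*.txt)   # [!a] matches one character that is not "a"
in_range=$(echo a[0-9].txt) # [0-9] matches a single digit

echo "a?.txt     -> $any_one"
echo "*.md       -> $any_str"
echo "[ab]1.txt  -> $in_set"
echo "[!a]*.txt  -> $not_set"
echo "a[0-9].txt -> $in_range"

cd "$oldpwd" || exit 1
rm -rf "$workdir"
```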

Job Control

Bash allows users to selectively suspend and resume processes through job control. Each job corresponds to a pipeline, that is, one or more commands chained with | pipes (stdout only) or |& pipes (stdout and stderr).
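
The two kinds of pipe can be compared by counting the lines that reach the next command. In this sketch noisy is a hypothetical function, and 2>&1 | is used as the portable spelling of bash's |& so that the example also runs under plain sh.

```shell
#!/bin/sh
# noisy: hypothetical command that writes one line to stdout and one to stderr.
noisy() {
    echo "data line"
    echo "error line" >&2
}

# A plain | carries only stdout; stderr is discarded here to keep the demo quiet.
stdout_only=$(noisy 2>/dev/null | wc -l | tr -d ' ')

# With 2>&1 | (written |& in bash) both streams enter the pipe.
both=$(noisy 2>&1 | wc -l | tr -d ' ')

echo "lines through |:       $stdout_only"
echo "lines through 2>&1 |:  $both"
```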

Foreground, Background Processes

The implementation for job control is that each process is given a process group ID. Processes with the same process group ID are part of the same process group. Foreground and background processes are based upon whether the process shares the current terminal process group ID.

Foreground processes, that is, members of the foreground process group (the one whose ID equals the current terminal process group ID), receive keyboard-generated signals like SIGINT. Background processes are those whose process group ID differs from the terminal's, so they are unaffected by keyboard-generated signals.

You can suspend a running foreground job with Ctrl+Z, then use fg to continue it in the foreground, bg to continue it in the background, or kill to terminate it.
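
Ctrl+Z, fg, and bg only make sense at an interactive prompt, but the background half of the story can be sketched in a script. This is a rough non-interactive analogue: it starts a background process with &, terminates it with kill, and collects its exit status with wait. The 30 second sleep is an arbitrary stand-in for a long-running job.

```shell
#!/bin/sh
# Start a long-running command in the background; $! holds its process ID.
sleep 30 &
pid=$!

# At an interactive prompt you could instead use:
#   jobs      list the shell's jobs
#   fg %1     bring job 1 to the foreground
#   bg %1     resume a suspended job 1 in the background
#   kill %1   terminate job 1

# Terminate the background process (kill sends SIGTERM by default).
kill "$pid"

# wait reports 128 + the signal number for a signal-terminated process,
# so SIGTERM (15) shows up as 143.
status=0
wait "$pid" || status=$?
echo "background job exited with status $status"
```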