Linux Bash Error Trapping
and Signals and Traps (Oh My!) - Part 1 by William Shotts, Jr.

In this lesson, we're going to look at handling errors during the execution of your scripts. The difference between a good program and a poor one is often measured in terms of the program's robustness. That is, the program's ability to handle
situations in which something goes wrong.

Exit status

As you recall from previous lessons, every well-written program returns an exit status when it finishes. If a program finishes successfully, the exit status will be zero. If the exit status is anything other than zero, then the program failed in some way. It is very important to check the exit
status of programs you call in your scripts. It is also important that your scripts return a meaningful exit status when they finish. I once had a Unix system administrator who wrote a script for a production system containing the following 2 lines of code:

    # Example of a really bad idea
    cd $some_directory
    rm *

Why is this such a bad way of doing it? It's not, if nothing goes wrong. The two lines change the working directory to the name contained in $some_directory and delete the files in that directory. That's the intended behavior. But what happens if the directory named in $some_directory doesn't exist? In that case, the cd command will fail and the script executes the rm command on the current working directory. Not the intended behavior! By the way, my hapless system administrator's script suffered this very failure and it destroyed a large portion of an important production system. Don't let this happen to you! The problem with the script was that it did not check the exit status of the cd command before proceeding with the rm command.

Checking the exit status

There are several ways you can get and respond to the exit status of a program. First, you can examine the contents of the $? special parameter. $? will contain the exit status of the last command executed. You can see this work with the following:

    [me] $ true; echo $?
    0
    [me] $ false; echo $?
    1

The true and false commands are programs that do nothing except return an exit status of zero and one, respectively. Using them, we can see how $? contains the exit status of the previous program. So to check the exit status,
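Applied to the cd and rm example above, one possible guard is sketched below. It is only an illustration: the clean_dir function and the demo_dir variable are hypothetical names that do not appear in the original lesson, and the demonstration runs against a throwaway directory created with mktemp.

```shell
#!/bin/bash
# Hypothetical helper (not from the original lesson): remove the files in a
# directory only if the cd into it actually succeeded.
clean_dir() {
    local dir=$1
    # Run in a subshell so the caller's working directory is left untouched.
    (
        if cd "$dir"; then
            rm -- *
        else
            echo "Cannot change directory to '$dir'; nothing removed" >&2
            exit 1
        fi
    )
}

# Demonstration on a throwaway directory.
demo_dir=$(mktemp -d)
touch "$demo_dir/a.txt" "$demo_dir/b.txt"

clean_dir "$demo_dir"
good_status=$?      # $? holds the exit status of the command just run

clean_dir "$demo_dir/no-such-directory" 2>/dev/null
bad_status=$?

echo "existing directory: $good_status, missing directory: $bad_status"
rmdir "$demo_dir"   # succeeds only because the files were removed
```

Because rm is reached only when cd returns zero, a missing directory produces an error message and a non-zero status instead of deleting files in the wrong place.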
and Signals and Traps (Oh, My!) - Part 2 by William Shotts, Jr. Errors are not the only way that a script can terminate unexpectedly. You also have to be concerned with signals. Consider the following
program:

    #!/bin/bash
    echo "this script will endlessly loop until you stop it"
    while true; do
        : # Do nothing
    done

After you launch this script it will appear to hang. Actually, like most programs that appear to hang, it is really stuck inside a loop. In this case, it is waiting for the true command to return a non-zero exit status, which it never does. Once started, the script will continue until bash receives a signal that will stop it. You can send such a signal by typing ctrl-c, which sends the signal called SIGINT (short for SIGnal INTerrupt).

Cleaning up after yourself

OK, so a signal can come along and make your script terminate. Why does it matter? Well, in many cases it doesn't matter and you can ignore signals, but in some cases it will matter. Let's take a look at another script:

    #!/bin/bash
    # Program to print a text file with headers and footers
    TEMP_FILE=/tmp/printfile.txt
    pr $1 > $TEMP_FILE
    echo -n "Print file? [y/n]: "
    read
    if [ "$REPLY" = "y" ]; then
        lpr $TEMP_FILE
    fi

This script processes a text file specified on the command line with the pr command and stores the result in a temporary file. Next, it asks the user if they want to print the file. If the user types "y", then the temporary file is passed to the lpr program for printing (you may substitute less for lpr if you don't actually have a printer attached to your system).

Now, I admit this script has a lot of design problems. While it needs a file name passed on the command line, it doesn't check that it got one, and it doesn't check that the file actually exists. But the problem I want to focus on here is the fact that when the script terminates, it leaves behind the temporary file. Good practice would dictate that we delete the temporary file $TEMP_FILE when the script terminates. This is easily accomplished by adding the following to the end of the script:

    rm $TEMP_FILE
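To see why a signal matters for a script that creates a temporary file, here is a small experiment. It is a sketch under stated assumptions: demo_file, child, child_status and leftover are hypothetical names, the child's timed busy loop merely stands in for the "Print file?" prompt, and the signal is SIGTERM sent with kill rather than ctrl-c, because bash makes background commands started from a script ignore SIGINT.

```shell
#!/bin/bash
# Hypothetical demo: every name below is illustrative only.
demo_file=$(mktemp)     # stands in for $TEMP_FILE

# Child script: busy-waits for up to 30 seconds (standing in for the
# interactive prompt), then removes the temporary file as its last step.
bash -c 'while [ "$SECONDS" -lt 30 ]; do :; done; rm -f "$1"' demo "$demo_file" &
child=$!

kill -TERM "$child"     # a signal arrives before the child reaches its rm
wait "$child"
child_status=$?         # 128 + signal number; typically 143 for SIGTERM on Linux

if [ -e "$demo_file" ]; then
    leftover=yes
else
    leftover=no
fi
echo "child exit status: $child_status, temp file left behind: $leftover"

rm -f "$demo_file"      # tidy up after the demo itself
```

The child never reaches its final rm, so the file is still there after the child has gone. A cleanup command placed at the end of a script only runs if the script gets that far.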
ensuring they always perform necessary cleanup operations, even when something unexpected goes wrong. The secret sauce is a pseudo-signal provided by bash, called EXIT, that you can trap; commands or functions trapped on it will execute when the script exits for any reason. Let's see how this works. The basic code structure is like this:

    #!/bin/bash
    function finish {
      # Your cleanup code here
    }
    trap finish EXIT

You place any code that you want to be certain to run in this "finish" function. A good common example: creating a temporary scratch directory, then deleting it after.

    #!/bin/bash
    scratch=$(mktemp -d -t tmp.XXXXXXXXXX)
    function finish {
      rm -rf "$scratch"
    }
    trap finish EXIT

You can then download, generate, slice and dice intermediate or temporary files to the $scratch directory to your heart's content. [1]

    # Download every linux kernel ever.... FOR SCIENCE!
    for major in {1..4}; do
      for minor in {0..99}; do
        for patchlevel in {0..99}; do
          tarball="linux-${major}-${minor}-${patchlevel}.tar.bz2"
          curl -q "http://kernel.org/path/to/$tarball" -o "$scratch/$tarball" || true
          if [ -f "$scratch/$tarball" ]; then
            tar jxf "$scratch/$tarball"
          fi
        done
      done
    done
    # magically merge them into some frankenstein kernel ...
    # That done, copy it to a destination
    cp "$scratch/frankenstein-linux.tar.bz2" "$1"
    # Here at script end, the scratch directory is erased automatically

Compare this to how you'd remove the scratch directory without the trap:

    #!/bin/bash
    # DON'T DO THIS!
    scratch=$(mktemp -d -t tmp.XXXXXXXXXX)

    # Insert dozens or hundreds of lines of code here...

    # All done, now remove the directory before we exit
    rm -rf "$scratch"

What's wrong with this? Plenty:

If some error causes the script to exit prematurely, the scratch directory and its contents don't get deleted. This is a resource leak, and may have security implications too.
If the script is designed to exit before the end, you must manually copy 'n paste the rm command at each exit point.

There are maintainability problems as well. If you later add a new in-script exit, it's easy to forget to include the removal - potentially creating mysterious heisenleaks.

Keeping Services Up, No Matter What

Another scenario: Imagine you are automating some system administration task, requiring you to temporaril
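As a quick check of the exit-trap pattern described in this article, the sketch below runs a small script in a child bash that exits early with an error, then confirms from the outside that its scratch directory is gone. The names scratch_path, child_status and cleaned are hypothetical, and the exit status 3 is an arbitrary value chosen to simulate a failure.

```shell
#!/bin/bash
# Run a small script in a child bash; it fails partway through.
# The child prints its scratch directory name so the parent can check it.
scratch_path=$(bash -c '
    scratch=$(mktemp -d)
    finish() { rm -rf "$scratch"; }
    trap finish EXIT
    echo "$scratch"
    touch "$scratch/intermediate-file"
    exit 3          # simulate an error that ends the script early
    echo "never reached"
')
child_status=$?     # the trap runs, but the exit status is still 3

if [ -d "$scratch_path" ]; then
    cleaned=no
else
    cleaned=yes
fi
echo "child exit status: $child_status, scratch directory removed: $cleaned"
```

The cleanup lives in one place, and the early exit needs no rm of its own. The trap also leaves the script's exit status alone, so callers still see the original failure code.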