Scripting on Linux

Origins of scripting languages

The earliest application software was literally hardwired, and the "operating system" was a rudimentary embedded facility to execute this stiffware. With the first generation of what we would nowadays recognise as an operating system, the computer operator could load and execute programs from a teletype terminal by typing in the identity of the next batch job to be run, causing it to be loaded into memory from magnetic or paper tape and executed. Any runtime information to be supplied by the operator to modify the way a job operated required additional parameters to these job identifiers, or commands. The language used for this purpose on IBM's mainframes was called JCL, or Job Control Language.

As these systems evolved into being able to support routine operations, it became inefficient for the computer operators to have to retype all of the information needed to run a set of jobs in the same daily, weekly or monthly sequence, or batch queue. Running a sequence of such commands from a file, as opposed to retyping them every time, enabled operations to be automated. By keeping different generations of the input and output files on disk, JCL allowed jobs which failed due to system downtime to be rerun from the original input data. This made it possible to build systems for applications such as bank accounting or payroll, which required a higher guarantee of integrity and completion than the reliability of the system components then available would suggest. Many of the features of mainframe systems were then reinvented a number of times on smaller and cheaper classes of machine, including minicomputers and PCs.

This led to the development of a command language to suit each operating system. Examples included TSO/CLIST (IBM's Time Sharing Option), PCL (Prime PRIMOS), DCL (DEC VAX/VMS), and Aegis (Apollo Domain/OS).

For Microsoft OSs, the original shell script (MS-DOS batch or .bat files) remained a very primitive batch control language, with flow control limited to gotos and simple branches. Instead of developing this batch language, Microsoft froze it in the early nineties, in order to increase sales of a more advanced but proprietary scripting language (Basic) to users who would have to pay for a separately licensed product in order to obtain greater flexibility in system control. Very early versions of Microsoft Basic were packaged with MS-DOS, while later versions were sold separately.

Within the Unix world, due to the competitive development of system variants by many different companies and the fact that no monopoly was able to prevent this, a number of different shells were designed, with useful ideas being shared between them. As the usage of Unix increased, so the shell languages bundled with the system have grown. Some of the most popular shells have included the Bourne shell (typically /bin/sh ) and the 'C' shell ( /bin/csh ). Other variants included the Korn shell, which is a superset of most of the Bourne and 'C' shells, and the Ash shell. The GNU version was called the Bash shell (Bourne Again SHell), to recognise its behaviour as a superset of the Bourne shell. This shell is also typically installed using the path /bin/sh in order to allow it to run the many older Bourne shell scripts already in existence unmodified. As languages, Unix shell scripts have all the usual 3rd generation features, including loop constructs, variables, arrays, branches and functions, as the sketch below illustrates.
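
As a quick illustration (a minimal sketch, not part of the worked examples which follow, using made-up names such as say_hello), the following Bash fragment shows a variable, an array, a function and a loop:

#!/bin/bash
# minimal sketch of Bash features: variable, array, function, loop
greeting="hello"                 # a variable (no spaces around the =)
names=(alice bob carol)          # an array of three elements

say_hello () {                   # a function taking one parameter
  echo "$greeting, $1"
}

for n in "${names[@]}"; do       # loop over every element of the array
  say_hello "$n"
done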

Learning scripting languages

If you have already learned how to program in a 3rd generation language such as 'C', which shares many features with Bash, Perl or Python, making use of the familiar subset of a scripting language's features does not require the same amount of time or effort as learning those features for the first time. Those requiring an in-depth understanding of these languages will need to read the appropriate books and on-line tutorials and carry out a comprehensive series of programming exercises. In other cases a usable subset of knowledge can be obtained by reading the source code of existing programs and executing them, and by conducting a number of small experiments supplemented with tactical use of the reference information provided with these languages. By this means, your programming knowledge can grow on an as-needed basis.

A simple example of a shell script

A number of features of the Bash language are demonstrated in this example program, which puts a wrapper around the rm (remove file) command:

#!/bin/sh
# cautious shell script
#
# performs similar function to rm but cautiously

if [ $# = 0 ]; then
  echo usage:
  echo cautious name_of_file_to_be_deleted
  exit
fi

echo "are you sure you want to delete $1" '? (y/n)'
read ans
if [ "$ans" = "y" -o "$ans" = "Y" ]; then
  if [ -w $1 ]; then
    rm $1
    echo $1 has been deleted
  else
    echo cautious: $1 access denied or does not exist
  fi
else
  echo $1 not deleted
fi

Explanation

#!/bin/sh
# cautious shell script
The first 2 characters, #! , when present at the start of an executable file instruct Unix-style kernels to load and execute the path which follows as an interpreter program, with the rest of the script as the program for that interpreter to interpret and execute. You would use
#!/usr/bin/perl as the first line of a Perl script, if the perl interpreter program is installed at /usr/bin/perl . The second line is a comment, because all lines starting with # are comments as far as the Bash language is concerned.
if [ $# = 0 ]; then
  echo usage:
  echo cautious name_of_file_to_be_deleted
  exit
fi

$# is a shell variable giving the number of parameters with which the shell script was called on the command line or by other means. The cautious shell script is only useful if it has a parameter, i.e. the name of a file to be deleted. [ expr ]; is a shell builtin command (alternative name: test expr; ). = within an expression returns true if both sides are the same. The return value of the test controls whether the if ... then branch or an optional else branch is taken (the second example has an else branch). exit terminates the script, and the fi keyword (if backwards) terminates the if expr; then statement block. In other words, the then and fi keywords act in a similar manner to the opening and closing {} curly braces around a statement block in 'C'.
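
For example, the following two branches are equivalent ways of writing the same test (a sketch, not taken from the cautious script):

if [ $# = 0 ]; then echo "no parameters supplied"; fi
if test $# = 0; then echo "no parameters supplied"; fi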

digression

You can use any suitable program, including one you have written yourself in place of the test program. The Unix shell convention is for a program which exits successfully to end with a return of 0 (which the shell considers as true), and for an error exit to result in a return of 1 or more (which the shell considers false). This is the opposite way round to how this is done within 'C' branches. Note that the value returned by a program (e.g. using the return statement in 'C') is different from the standard output of the same program. The shell command:

echo $?
gives you the return code of the last foreground command. You can also obtain this value within shell scripts using the special variable $? .
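
For example (an interactive sketch, assuming /etc/passwd exists on your system):

ls /etc/passwd                        # succeeds
echo $?                               # prints 0
ls /no/such/file                      # fails
echo $?                               # prints a non-zero value
if ls /etc/passwd > /dev/null; then   # any command's exit status can control a branch
  echo "listing succeeded"
fi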

end of digression

echo "are you sure you want to delete $1" '? (y/n)'

The echo command sends its parameters to stdout, and is used here to prompt the interactive shell-script user. Strings quoted with "" double quotes have variables expanded within them. $1 is the name of the first shell script parameter accessed as a variable (the path of the file to be deleted). So the shell will change $1 to file_to_be_deleted, if you happened to run the script using the command:
./cautious file_to_be_deleted .

The reason for putting the second string into '' single quotes was to prevent the ? wildcard (a shell glob, not a regular expression) from being expanded into a list of single-character filenames in the current directory, which is what would happen if it were left unquoted. No variable or wildcard interpolation happens inside single quotes, as sketched below.
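
A short sketch of the difference between the two kinds of quoting (the output of the unquoted ? will vary with the contents of your current directory):

echo "my home directory is $HOME"    # double quotes: $HOME is expanded
echo 'my home directory is $HOME'    # single quotes: printed literally
echo ?                               # unquoted ? expands to any single-character filenames present
echo '?'                             # quoted ? is printed as a question mark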

read ans

This reads stdin, which in interactive mode is the keyboard. The string read is made accessible through the shell variable $ans . (In some contexts you need the $, but in others you don't. Builtins such as export and read are only useful with shell variables, so there is no need for the $ here, as ans unambiguously refers to the shell variable $ans rather than the text "ans" or a file named ans .)

if [ "$ans" = "y" -o "$ans" = "Y" ]; then
  if [ -w $1 ]; then
    rm $1
    echo $1 has been deleted
  else
    echo cautious: $1 access denied or does not exist
  fi
else
  echo $1 not deleted
fi

The use of -o is a Boolean OR operation, true if the expression on either side of it is true. The true-or-false expression -w $1 is used to discover whether the file referred to by the parameter $1 exists and is writeable; if it is, the test returns true, and if not it returns false. rm $1 deletes the file named by the parameter. The rest of the shell script should be straightforward.
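
Other file test operators work in the same way as -w; a brief sketch, assuming $1 holds a path:

if [ -e "$1" ]; then echo "$1 exists"; fi
if [ -f "$1" ]; then echo "$1 is a regular file"; fi
if [ -d "$1" ]; then echo "$1 is a directory"; fi
if [ -w "$1" ]; then echo "$1 is writeable by you"; fi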

A script with a loop and debugging

#!/bin/sh
# skeleton shell script to provide menu framework
#
# Author: Richard Kay

debug=`echo ${1:-nodebug}`
if [ $debug = "debug" ]; then
  set -vx
fi

finish=no
while [ $finish = "no" ]; do
  echo
  echo SHELL SCRIPT MENU
  echo =================
  echo
  echo "enter   for option"
  echo "-----   ----------"
  echo "  1     first option"
  echo "  2     second option"
  echo "  3     third option"
  echo "  Q     to quit"
  echo
  echo "please enter option"
  read option
  option=${option:-c}
  if [ $option = "1" ]; then
    echo you have selected option 1
  elif [ $option = "2" ]; then
    echo you have selected option 2
  elif [ $option = "3" ]; then
    echo you have selected option 3
  elif [ $option = "q" -o $option = "Q" ]; then
    finish=yes
  else
    echo invalid entry. please try again
    echo
  fi
  echo
  echo press return to continue
  read dummy
done

Explanation

debug=`echo ${1:-nodebug}`

This line sets a shell variable called debug, either to parameter $1 if this exists, or to the text "nodebug" if parameter $1 doesn't exist. (This is one of various ways of specifying default values for unset variables which you can read about in the bash man page.) Use of the grave quotes ` ` causes the standard output of the enclosed echo command to be assigned to the debug variable. If this script is called menu, you can execute it as:

menu debug

in order to cause it to run in debug mode.
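
A few related forms of this default-value expansion, sketched with a hypothetical variable called name:

echo ${name:-guest}    # prints guest if name is unset or empty, but leaves name unchanged
echo ${name:=guest}    # as above, and also assigns guest to name
echo ${1:-nodebug}     # within a script: the first parameter, or nodebug if none was given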

if [ $debug = "debug" ]; then
  set -vx
fi

If you run it in debug mode, set -vx in the above branch causes command lines to be displayed on standard error, both before and after variable and wildcard expansion. This allows the programmer to see what is going on inside the shell script as it executes.
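
The same tracing can also be obtained without editing the script, by passing the options to the shell on the command line; a sketch, assuming the script file is called menu:

sh -vx ./menu          # trace an unmodified script from the command line
set -vx                # from within a script: turn tracing on
set +vx                # turn tracing off again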

The while loop enabling the menu to be displayed repeatedly until the user quits should be straightforward to those who have used another language with if for a branch and while for a loop. The differences between the if - then - fi block structure and the while - do - done block are the keywords used, and the fact that an if block executes 0 or 1 times, whereas the while loop executes its block 0 or more times, until the controlling test is false.

Processing a table of data by selecting rows and columns

#!/bin/sh
# login analysis shell program

rm temp1
cat may.logins | grep console | grep ')' | awk '{ print $9 }' \
  | sed '1,$s/)/ /g' | sed '1,$s/(/ /g' \
  | sed '1,$s/:/ /g' | sed '1,$s/+/ /g' > temp1

total=0
numrecs=`cat temp1 | wc -l `
count=0
while [ $count -lt $numrecs ]; do
  count=`expr $count + 1`
  record=`sed -n ${count}p temp1`
  fields=`echo $record | wc -w`
  if [ $fields -eq 3 ]; then
    days=`echo $record | awk '{ print $1 }'`
    mdays=`expr $days \* 24 \* 60`
    hours=`echo $record | awk '{ print $2 }'`
    mhours=`expr $hours \* 60`
    mins=`echo $record | awk '{ print $3 }'`
  else
    mdays=0
    hours=`echo $record | awk '{ print $1 }'`
    mhours=`expr $hours \* 60`
    mins=`echo $record | awk '{ print $2 }'`
  fi
  total=`expr $total + $mdays + $mhours + $mins`
# echo $total $fields
done
loggedhours=`expr $total / 60`
echo total logged hours is $loggedhours

This example combines a number of the features of previous examples, using awk, grep and sed filters to select specific rows and columns, and to exclude unwanted data from the analysis. The input data is a set of login records. This application was used to analyse average usage of 20 workstations during particular months. Here are some records from the may.logins input file:

reboot    ~                  Tue May 11 13:14
shutdown  ~                  Tue May 11 13:15
usr11361  console            Tue May 11 09:04 - 10:22  (01:18)
usr11187  console            Mon May 10 18:53 - 20:30  (01:36)
usr11187  console            Mon May 10 18:50 - 18:53  (00:02)
usr11187  console            Mon May 10 18:38 - 18:50  (00:12)
usr11513  console            Mon May 10 15:15 - 16:27  (01:11)
usr11451  console            Mon May 10 12:11   still logged in
usr11456  console            Mon May 10 10:53 - 15:14  (04:21)
usr11138  console            Mon May 10 09:03 - 10:40  (01:36)
usr12069  console            Sat May  8 11:05 - 09:01 (1+21:55)
usr12069  console            Sat May  8 11:00 - 11:04  (00:04)

Explanation

cat may.logins | grep console | grep ')' | awk '{ print $9 }' \
  | sed '1,$s/)/ /g' | sed '1,$s/(/ /g' \
  | sed '1,$s/:/ /g' | sed '1,$s/+/ /g' > temp1

Note the use of backslash \ to escape newlines, allowing a long single shell pipelined command to be continued over multiple lines. This command line extracted only lines containing the word "console" and a closing round bracket ')' . Column 9 was extracted (space-separated columns are the awk default). sed was then used to replace the round brackets, colons and plus signs with spaces. This extract, containing either 2 or 3 fields (2 for hours and minutes, or 3 for days, hours and minutes), was then written to the temp1 file.
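
As a worked trace (reconstructed here, not program output), the last-but-one record above passes through the pipeline roughly as follows:

usr12069  console   Sat May  8 11:05 - 09:01 (1+21:55)    # line selected by grep console and grep ')'
(1+21:55)                                                  # column 9 extracted by awk '{ print $9 }'
 1+21:55                                                   # '(' and ')' replaced with spaces by sed
 1 21 55                                                   # ':' and '+' replaced with spaces by sed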

total=0
numrecs=`cat temp1 | wc -l `
count=0

This initialised the shell variables total and count to zero, and counted the number of records in temp1. Note again the use of grave accent quotes ` ` to make the stdout of the pipeline cat temp1 | wc -l available for assignment to the shell variable numrecs.
while [ $count -lt $numrecs ]; do
  count=`expr $count + 1`
  record=`sed -n ${count}p temp1`

This main loop counts through the records in the extract, processing each record in turn. 1 is added to count at the top of the loop. This was an old Bourne shell script, so the external program expr was used to do arithmetic. If this script had been written for the Bash shell, the builtin arithmetic evaluator let would have been used. A newer Bash shell script using let for arithmetic is shown below.

The value of the shell variable $count is substituted so that on the first use of the loop row 1 is assigned to $record ( sed -n 1p ) and on the second use of the loop row 2 is assigned ( sed -n 2p ) etc. The {} brackets are used around count so the shell doesn't look for variable $countp , which doesn't exist.
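
For example (a sketch, assuming temp1 contains at least 3 lines):

count=3
sed -n ${count}p temp1      # prints only line 3 of temp1
# sed -n $countp temp1      would fail: the shell substitutes the (unset) variable countp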

  fields=`echo $record | wc -w`
  if [ $fields -eq 3 ]; then
    days=`echo $record | awk '{ print $1 }'`
    mdays=`expr $days \* 24 \* 60`
    hours=`echo $record | awk '{ print $2 }'`
    mhours=`expr $hours \* 60`
    mins=`echo $record | awk '{ print $3 }'`

This counts the words in record (i.e. the number of columns or fields). If there are 3 fields recording the logged-in time, these are the days, hours and minutes, and each is converted into minutes. The expr multiplication operator * has been escaped from the shell using a backslash, so it isn't interpreted as a file glob.
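
A quick sketch of expr arithmetic at the command line:

expr 3 + 4             # prints 7
expr 3 \* 4            # prints 12: the * must be escaped from the shell
mins=`expr 2 \* 60`
echo $mins             # prints 120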

  else
    mdays=0
    hours=`echo $record | awk '{ print $1 }'`
    mhours=`expr $hours \* 60`
    mins=`echo $record | awk '{ print $2 }'`
  fi
  total=`expr $total + $mdays + $mhours + $mins`

Otherwise, the login duration was less than a day, and the minutes from the hours and minutes logged in are calculated. Whether there were 2 or 3 fields, the number of minutes for this particular login record is added to the total.

done
loggedhours=`expr $total / 60`
echo total logged hours is $loggedhours

The loop is terminated with: done, and the total logged login hours for the month is calculated and output.

Doing arithmetic internally within the shell

Some of the earlier Unix shells didn't have builtin arithmetic operators, so they farmed this job out to external programs such as expr as we saw above. This can be done more quickly using the Bash shell let builtin, as in the following example script:

#!/bin/sh
# new bash shell arithmetic example
echo 'enter 2 numbers'
read first second

let "plus = $first + $second"
let "minus = $first - $second"
let "times = $first * $second"
let "divide = $first / $second"

echo plus $plus minus $minus times $times divide $divide
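
Bash and other modern POSIX-style shells also support $(( )) arithmetic expansion, which avoids both expr and let; a sketch of the same calculation:

#!/bin/sh
# the same arithmetic using $(( )) expansion
echo 'enter 2 numbers'
read first second
plus=$((first + second))
minus=$((first - second))
times=$((first * second))
divide=$((first / second))
echo plus $plus minus $minus times $times divide $divide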

Extreme scripting: Perl, Python, Tcl and Ruby

Shell script languages, when combined with what we covered last week in connection with regular expressions, pipes, filters and redirection, enable significant systems and network automation compared to other methods of operation (e.g. using interactive GUIs or console commands). However, system-dependent scripts can become unwieldy and difficult to maintain when users attempt to use them to construct applications or utilities requiring more than a couple of hundred lines of source code, and need to migrate these programs between multiple platforms.

Another difficulty concerns performance. Manipulation of text-based data structures involving tables, rows and columns is made easy by combining shell scripts with external filters such as awk, sed and grep, and by using external files to store intermediate results. However, a looped program which loads and executes external programs very many times in order to process large data sets, and which reads and writes external files many times to handle intermediate results, will perform much worse than a language which can handle the required operations internally. As a consequence, a shell script which automated the addition of large groups of student accounts to a network took up to an hour to run, compared to one minute for the Perl program which replaced it. Writing this application in 'C' would have taken a few weeks' programming work instead of a couple of days, which was not an option, even though this could have reduced execution time from a minute to a few seconds.

In order to obtain all of the advantages of shell scripts and to go beyond their disadvantages, fully portable languages were developed. These languages, including Perl, Tcl, Python and Ruby, can be used to handle simple scripting-type applications together with more complex applications, e.g. those requiring object-oriented active website development. These languages build upon the features available in Unix shell languages, using almost identical syntax for many purposes, e.g. handling regular expressions.

One of the main differences between interpreted 'scripting' languages and applications programming languages such as 'C', 'C++', Pascal, Java (and earlier languages such as Cobol and Fortran) is the tradeoff between programmer and machine efficiency. You would generally use an interpreted scripting language in order to maximise programmer performance (the ability to develop a given application with the least programming effort). However, to obtain the maximum machine performance you would need to use a fully compiled language such as 'C' or 'C++'.

Where you intend to combine the rapid application development of a scripting language with the rapid execution of 'C', some profiling may be required to investigate in which parts of the system most of the machine cycles are being used. It is usually possible to divide the task into the parts which are more complex and execute less frequently, which can benefit from a scripting language, and the time-consuming deeply nested loops, which can be handled using a 'C' program, possibly with more efficient data structures.

Perl example

The login analysis program described above was rewritten in Perl:

#!/usr/bin/perl
# login analysis
$total=0;
open(LOGINS,"may.logins") or die "cannot open may.logins: $!";
while(<LOGINS>){
  if(/console/) {
    @rec=split;
    $_ = $rec[8];
    s/\)//;
    s/\(//;
    s/\+/:/;
    @fields=split /:/;
    $numf=@fields;
    if ($numf == 3){
      $mdays=$fields[0] * 24 * 60;
      $mhours=$fields[1] * 60;
      $mins=$fields[2];
    } else {
      $mdays = 0;
      $mhours = $fields[0] * 60;
      $mins=$fields[1];
    }
    $total=$total + $mdays + $mhours + $mins;
    # print "$total\n";
  }
}
$loggedhours=$total/60;
print "total hours logged is $loggedhours\n";

This program runs much faster than the shell script, because everything is done inside the same process. There are many syntactic similarities, but Perl borrows array and loop notation from 'C' and some other notation from grep, awk and sed. As in Bash, $ is used to introduce single (scalar) variables and @ is used for arrays. $_ is the default scalar variable, so that a substitution operation such as s/\)//; , which strips a closing round bracket from a string, doesn't need to specify which string it operates upon.