10. Secure CGI Input and debugging


10.1 Extending the cgiutils module to handle inputs

To avoid confusion with the previous version I have renamed the extended module cgiutils2.py. We're going to extend the module we started work on last week so that it looks after some of the details previously handled in the main application. While doing this we can shorten some names: instead of writing cgiutils2.form.keys() it is easier and less error prone to write cgiutils2.keys(). We can do the same with the has_key() method of the form object returned by cgi.FieldStorage():

import cgi
form = cgi.FieldStorage()

def keys():
  """ shortcut to form.keys() method. Returns a list of keys for which
  form values were entered """
  return form.keys()

def has_key(key):
  """ shortcut to form.has_key() method. Returns 1 if named key parameter is
  present in form input and 0 otherwise. """
  return form.has_key(key)

def has_required(required):
  """ returns 1 if all keys in required list parameter are present in form
  or 0 if form user has forgotten to submit input required form fields"""
  submitted=keys()
  for field in required:
    if field not in submitted:
      return 0
  return 1

The has_required() function is passed a list of the required form input keys as a parameter. It can be called from a CGI application like this:

if not has_required(["name","email"]):
   html_end(error="one or more required inputs are missing.")
   return

The user can then be informed, and the CGI program terminated, before it attempts anything that cannot work without the required inputs.
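has_required() only reports that something is missing. A slightly more helpful variant can name the missing fields in the error message. The sketch below is a hypothetical helper, missing_fields(), which is not part of cgiutils2; it takes the required list and the submitted keys (e.g. as returned by cgiutils2.keys()) as plain lists, so it can be tried interactively without a CGI environment.

```python
def missing_fields(required, submitted):
    """Return the keys in required that are absent from submitted,
    in the order they appear in required (hypothetical helper)."""
    present = set(submitted)          # fast membership tests
    return [field for field in required if field not in present]

# Example: "email" was required but the form only sent name and comment
missing = missing_fields(["name", "email"], ["name", "comment"])
if missing:
    print("missing required inputs: " + ", ".join(missing))
```

In a CGI program the returned list could be joined into the error text passed to html_end().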

10.2 A PIN generating function

A minor bit of unfinished business was our session key or PIN generation code, which has also been added to cgiutils2.py.

def make_pin(min=1000,max=9999):
  """ generates a random PIN number between min and max inclusive,
  used for user authentication or session tracking. """
  import random
  # randint() can return max; randrange(min,max) never would
  return random.randint(min,max)

The make_pin() function was tested by coding a minor change to the tester() function:

pin=int(raw_input("enter (int) letsplay PIN or 0 for a random one: "))
if not pin: pin=make_pin()

and by rerunning test 5 (form HTML) on the interactive test menu.
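randint() and randrange() use Python's default pseudo-random generator, which is fine for testing but is not designed for security purposes, so it is a weak basis for PINs protecting real accounts or sessions. Later Python versions (2.4 onwards) provide random.SystemRandom, which draws on the operating system's random source. The sketch below shows make_pin() rewritten on that basis, assuming such a Python version is available; it keeps the same min/max parameters and includes max in the range.

```python
import random

# SystemRandom takes its randomness from the operating system
# (os.urandom) rather than from the default pseudo-random generator.
_sysrand = random.SystemRandom()

def make_pin(min=1000, max=9999):
    """Return a random PIN between min and max, both inclusive."""
    return _sysrand.randint(min, max)

print(make_pin())
```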

10.3 Handling multiple values for the same key

HTML form listboxes, and checkboxes sharing the same name, can result in more than one value being submitted for the same key. key=value pairs (separated by & ampersands) in a query string can also supply more than one value for the same key. In these cases the form.getvalue(key) method returns a list of strings instead of a single string, and form[key] is a list of field objects rather than a single object with a .value attribute. Unfortunately, we can't control how our CGI program will be accessed: someone might design their own form to access it, or supply a query string where we expected input from a form. To make the CGI robust enough to avoid crashing unexpectedly in these situations we need to check whether the data returned holds a single value or several. Python's built-in type() function can be used to check the type of data returned.
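To see how repeated keys turn into lists, the sketch below decodes a query string similar to the one used to test formfields3.cgi with the standard library's parse_qs() (urlparse.parse_qs in Python 2.6 and later, urllib.parse.parse_qs in Python 3). This function is an assumption here, not something cgiutils2 uses. Note that parse_qs() puts every value in a list, even single ones, whereas FieldStorage.getvalue() returns a plain string when a key occurs only once.

```python
try:
    from urllib.parse import parse_qs   # Python 3
except ImportError:
    from urlparse import parse_qs       # Python 2.6 and later

query = "float=2.345&int=42&this=that&list=one&list=two"
fields = parse_qs(query)

# Every value is a list; "list" occurred twice so it holds two strings
print(fields["list"])   # ['one', 'two']
print(fields["int"])    # ['42']
```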

Specific types defined in the types library module can be tested for directly. We can also do this indirectly by comparing the object returned by type(unknown) against the object returned by type([]) when we want to see whether the object named unknown is a list, type(1) when we want to test for an integer, type("") when we want to test for a string, and so on.
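The two styles of test can be compared side by side. kind_of() below uses the type() comparisons just described, as the cgiutils2 code does; kind_of_isinstance() is an alternative using isinstance() with the built-in type names, which works from Python 2.2 onwards and also accepts subclasses of list and str. Both function names are hypothetical.

```python
def kind_of(value):
    """Classify a form value using type() comparisons."""
    if type(value) == type([]):    # same type as an empty list?
        return "list"
    elif type(value) == type(""):  # same type as an empty string?
        return "string"
    return "other"

def kind_of_isinstance(value):
    """The same classification using isinstance()."""
    if isinstance(value, list):
        return "list"
    elif isinstance(value, str):
        return "string"
    return "other"

for sample in ["42", ["one", "two"], None]:
    print(kind_of(sample) + " " + kind_of_isinstance(sample))
```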

This is something best packaged neatly into our cgiutils2 module. Most of the time, when designing a CGI application, we'd prefer not to have to bother whether the data arrives as a list or a string. In Python 2.2 the cgi module's FieldStorage class gains a pair of methods, getfirst() and getlist(), which give us the data as a single string or as a list of strings respectively. getfirst() returns the single or first value as a string, or a default (None unless another is specified) if there is no such value. getlist() always returns a list: a single-valued list if there was only 1 value, and an empty list if there were none.

As we are not yet using Python 2.2 I have added equivalent functions, firstval() and listval(), to the cgiutils2.py module file.

def firstval(key,default=""):
  """ returns value from form as a single string, or default for
  empty/wrong type. Returns first of a multi-valued submission. """
  value=form.getvalue(key)
  if type(value) == type([]): # multi value submission
    if value == []: # empty list ??
      return default
    elif type(value[0]) == type(""): # first element is string
      return value[0]
    else: # list, but 1st element not a string
      return default
  elif type(value) == type(""):
    return value
  else: # dont know what type of data this is
    return default

def listval(key,default=None):
  """ returns value from form as a list of strings, or default
  (a new empty list unless specified) if there are no strings """
  value=form.getvalue(key)
  strings=[]
  if type(value) == type([]): # multi value submission
    for item in value:
      if type(item) == type(""):
        strings.append(item)
  elif type(value) == type(""):
    strings.append(value)
  if strings == []:
    if default is None: # avoid sharing one mutable [] between calls
      return []
    return default
  else:
    return strings

10.4 Handling float and int values

The problem with using the built-in float() and int() functions directly within a CGI application to convert strings into numeric values is that if the string contains invalid data an exception will be raised and the application will crash. This isn't such a problem in an interactive environment, where we can see the stack trace on the screen. In the CGI environment it is much harder to find out what kind of error occurred and where. When the user enters an integer or float into an HTML form, just as with raw_input(), the value reaches the program as a string and we have to convert it using the int() and float() built-ins. By wrapping these in higher-level functions, here called stoi() for string to int conversion and stof() for string to float conversion, we can return a default (typically false) object if the input wasn't in integer or floating point format.

def stof(key,default=None):
  """ string to float conversion or returns default for non float value"""
  string=firstval(key) # converts to single string or ""
  try:
    f=float(string)
  except ValueError: # only catch bad input; a bare except hides other bugs
    return default
  else:
    return f

def stoi(key,default=None):
  """ string to int conversion or returns default for non int value"""
  string=firstval(key) # converts to single string or ""
  try:
    i=int(string)
  except ValueError: # e.g. "", "abc" or "2.5"
    return default
  else:
    return i
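The conversion logic inside stof() and stoi() can be separated from the form so that it can be tried interactively. The sketch below uses a hypothetical to_number() helper which takes the string, the converting function (int or float) and a default. It catches only ValueError and TypeError, the exceptions int() and float() raise for bad input, so unrelated bugs still produce a stack trace.

```python
def to_number(string, convert, default=None):
    """Convert string with convert (int or float); return default
    instead of raising if the text is not a valid number."""
    try:
        return convert(string)
    except (ValueError, TypeError):  # bad text, or None instead of a string
        return default

print(to_number("42", int))        # 42
print(to_number("2.345", float))   # 2.345
print(to_number("2.345", int))     # None, int() rejects the decimal point
print(to_number("", float, 0.0))   # 0.0
```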

10.5 Testing the extended cgiutils2 module

Before we use it in a real application we must test the extended functionality of this module. The parts developed last week were tested using a function which ran when the module was executed as a standalone Python program, and by copying and pasting the various HTML outputs into a suitable HTML authoring tool with a preview facility and HTML syntax checker. (Quanta, running on Linux, is free software which does this.)

It is more difficult to give the form object created by cgi.FieldStorage() useful values outside the CGI environment. The following CGI application (formfields3.cgi) was used for this:

#!/usr/local/bin/python

# Python CGI Program to get form/URL fields
# and test parts of cgiutils2.py which require test CGI
# environment
import sys
sys.stderr=sys.stdout
import cgiutils2

cgiutils2.html_header(title="CGI test HTML",bgcolor='"#FFAAFF"')
if cgiutils2.has_required(["float","int","list"]):
  print "<p>all required input keys present</p>"
else:
  print "<p>one or more required input keys are missing</p>"
print "<ul>"
for key in cgiutils2.keys():
  print "<li>", ": "
  if key=="float":
    f=cgiutils2.stof(key)
    if f is not None: print "float: %.2f" % f # 0.0 is valid input too
  elif key=="int":
    i=cgiutils2.stoi(key)
    if i is not None: print "int: %d" % i
  elif key=="list":
    values=cgiutils2.listval(key) # "values" avoids hiding the list() builtin
    if values:
      print "<ul>"
      for item in values:
        print "<li> list:",item, "</li>"
      print "</ul>"
  else:
    print key,": ",cgiutils2.firstval(key)
  print "</li>"
print "</ul>"
print "</body></html>"

When tested using the following URL:

http://copsewood.net/pycgi/formfields3.cgi?float=2.345&int=42&this=that&list=one&list=two

The following web page was generated:

CGI test HTML

  • : float: 2.35
  • : int: 42
  • : this : that
  • : list:
    • one
    • two
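A live web server is not the only way to give the form useful values. For quick interactive tests, a stand-in object with the same getvalue() method as cgi.FieldStorage can be passed to versions of firstval() and listval() that take the form as a parameter. StubForm and these form-parameter variants are hypothetical, a sketch rather than part of cgiutils2, and anything tested this way still needs testing under a real server afterwards.

```python
class StubForm(object):
    """Stand-in for cgi.FieldStorage built from a dict: single values
    are strings and repeated values are lists of strings, matching
    what FieldStorage.getvalue() returns."""
    def __init__(self, fields):
        self.fields = fields

    def keys(self):
        return list(self.fields.keys())

    def getvalue(self, key, default=None):
        return self.fields.get(key, default)

def firstval(form, key, default=""):
    """Single string value for key, the first of several, or default."""
    value = form.getvalue(key)
    if isinstance(value, list):
        if value and isinstance(value[0], str):
            return value[0]
        return default
    if isinstance(value, str):
        return value
    return default

def listval(form, key, default=None):
    """List of string values for key, or default ([] if not given)."""
    value = form.getvalue(key)
    if isinstance(value, str):       # single value: wrap it in a list
        value = [value]
    if not isinstance(value, list):  # missing key or unexpected type
        value = []
    strings = [item for item in value if isinstance(item, str)]
    if strings:
        return strings
    return [] if default is None else default

form = StubForm({"int": "42", "list": ["one", "two"]})
print(firstval(form, "list"))   # one
print(listval(form, "int"))     # ['42']
```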

10.6 Debugging techniques for CGI programs

First check the obvious

Before getting this far you are expected to have run simple CGI programs which prove that the web server and CGI environment work, e.g. by printing the current time to the browser as HTML formatted output. You should also be able to run CGI programs on this server which print their input variables, submitted using web forms or URL query strings, to the browser. If you haven't got this far you shouldn't be surprised if you can't get something more complex to work.

Assuming you have got this far, the CGI programming environment presents further difficulties when debugging programs designed for it. What do you do when you try to run a CGI program and, instead of the neatly formatted HTML output the program was designed to generate, all you see in the web browser is an obscure and cryptic error message generated by the web server? This section looks at a number of techniques which can help you get the information you need about the status of your program. Of course you will still have to adopt a systematic approach to testing and correcting your CGI programs.

A good rule of thumb when asking for support is first to take reasonable steps to solve the problem yourself. Your support request is then more likely to interest the person you ask, because of what you have already tried and discovered for yourself. If the person you ask is a very experienced programmer, they will probably expect you to understand the language constructs you are attempting to use, and to have read the documentation for the modules you are importing and for the web server.

Visual inspection

Let's assume that you have written and debugged more than 100 lines of Python code, including programs longer than 20 lines or so. You should have at least this level of Python experience before attempting to write CGI programs; if you haven't, you will probably be out of your depth. With some experience you may be able to debug very simple CGI programs of only a few lines fully by visual inspection. Visual inspection of code is a useful exercise for programs of any size: the idea is to detect and correct as many common bugs as possible, more quickly than you are likely to achieve by other means. Common errors most easily corrected this way include:

  1. missing colons (:) in if, while, for, def and class statements or other incorrect punctuation
  2. inconsistent or mis-spelled object or function names
  3. unbalanced () {} [] parentheses or quoted strings.
  4. inconsistent indentation, e.g. mixed tabs and spaces
  5. missing comments (you do want to be able to maintain your code don't you ?)

Unfortunately, while this technique is undoubtedly useful, you are unlikely to be able to create correct programs of more than perhaps 10-20 lines of code without systematic testing and a logical approach to tracking down and correcting the bugs in your programs.

Running the program interactively

If there are compilation errors which prevent a program from running interactively, you can be certain it isn't going to run as a CGI either. Once it compiles, using the techniques you have already learned while developing interactive Python programs, you can test interactively whether your program behaves as expected without any form or query string input. To make this possible, your program should be designed at least not to crash, and to produce some output, when it is run as a CGI without any input, even if this is just a printed HTTP header followed by a suitable error message saying that the required input is missing.

This will only take you so far. If you are able to set environment variables on your system (e.g. as you did with the PYTHONPATH variable in week 4), you can set the QUERY_STRING environment variable in a similar way to supply test name=value pairs, as would be read from the part of the URL after the question mark (?), and still test the program interactively. Again, this technique can be useful, but you will still need to do fairly extensive testing in a live CGI environment to prove that the program meets its specification.
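Setting QUERY_STRING by hand for each test can be automated. The sketch below assumes Python 2.7 or later (for subprocess.check_output()) and runs a CGI program as a child process with QUERY_STRING and REQUEST_METHOD set as a web server would set them for a GET request, returning everything the program printed, including any stack trace written to standard error. run_cgi() is a hypothetical helper; a real server sets many more variables (SERVER_NAME and so on), so programs that rely on those need them added to env as well.

```python
import os
import subprocess
import sys

def run_cgi(command, query_string):
    """Run command (a list, e.g. [python, script]) with the CGI
    variables for a GET request set, and return its combined
    standard output and standard error as text."""
    env = dict(os.environ)
    env["QUERY_STRING"] = query_string
    env["REQUEST_METHOD"] = "GET"
    try:
        output = subprocess.check_output(command, env=env,
                                         stderr=subprocess.STDOUT)
    except subprocess.CalledProcessError as error:
        output = error.output   # a crashed program still returns its traceback
    return output.decode("utf-8", "replace")

# Hypothetical usage, with formfields3.cgi in the current directory:
# print(run_cgi([sys.executable, "formfields3.cgi"],
#               "float=2.345&int=42&this=that&list=one&list=two"))
```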

Investigating the web-server access and error logs

When a Python program raises an uncaught exception it prints a stack trace, stating the name of the exception and the source code line numbers in the functions and modules where the error was detected. This information is written to the standard error file. For an interactive program, standard error is sent by default to the same window or console as standard output, e.g. the normal output of print statements. This is not the case for CGI programs run by a web server such as Apache: the stack trace is appended to (printed onto the end of) the server's error log file. You will normally also find an access log in the same folder as the error log, which can give you some useful clues if you open it with a text viewer, e.g. more on Unix or Notepad on Windows.

Where do you find the web-server log files? This depends on where the web server was installed and how it was configured. Places worth looking on a Unix or Linux server include /var/log (on my system at home these are in /var/log/httpd), /var/spool, /var/www and /home/httpd. If you can find the httpd.conf file in the main Apache installation folder, it is likely to specify the location of the log files. On the systems installed at the TIC using the procedure suggested in the exercises for week 8, these logs are held within a subfolder of your H:\Apache folder. In some cases you will not have access to your web server's logs. If so, and you need to view the stack trace when something goes wrong with your CGI program running in its live environment, you have no option but to read on.
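Once you have found the error log, the most recent stack trace is at the end of it. On Unix the tail command shows this; the following Python sketch does the same, keeping only the last few lines in a collections.deque with a maxlen (Python 2.6 and later). last_lines() and tail_log() are hypothetical names.

```python
from collections import deque

def last_lines(lines, count=20):
    """Return the last count items of any iterable of lines, keeping
    at most count of them in memory at once."""
    return list(deque(lines, maxlen=count))

def tail_log(path, count=20):
    """Return the last count lines of the text file at path."""
    with open(path) as log:
        return last_lines(log, count)
```

For example, print("".join(tail_log("/var/log/httpd/error_log", 30))) would show the last 30 lines, assuming that is where your server keeps its error log; use whatever location your httpd.conf specifies.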

Redirecting standard error to go to standard output

The stderr object within the sys module refers to the standard error file, and it can be reassigned to refer to the standard output object instead. If the program is not creating the expected output, it is also useful to change the content type from text/html to text/plain, so that whitespace and HTML tags are displayed as printed. I place the following 4 lines at the top of my Python CGI scripts, and if the script doesn't create the required output I uncomment the relevant 3 lines. Error messages are then sent to the browser and the entire CGI output is shown as plain text, so I can see better what is going on.

# uncomment next 3 lines to debug script
# import sys
# sys.stderr=sys.stdout
# print "Content-type: text/plain\n"

Instead of commenting and uncommenting the above lines you might prefer to enclose them within an if block which tests a debug variable. You could also test this variable elsewhere within your script to print out the values of variables and to report that your program has reached various functions and milestones within the source code. You could then set the debug variable at the top of your script to 1 or 0, and start the script as follows:

debug=1
if debug:
  import sys
  sys.stderr=sys.stdout
  print "Content-type: text/plain\n"
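The debug variable can also drive a small helper for printing variable values and milestones. dbg() and debug_line() below are hypothetical names. Because the debugging header switches the output to text/plain, the messages need no HTML escaping; repr() is used so that strings show their quotes and any stray whitespace.

```python
import sys

debug = 1   # set to 0 to silence the messages

def debug_line(label, value):
    """Format a message showing a label and the repr() of a value."""
    return "DEBUG %s = %r" % (label, value)

def dbg(label, value):
    """Write a debugging message to standard output if debug is set."""
    if debug:
        sys.stdout.write(debug_line(label, value) + "\n")

# Example milestone and variable value
dbg("reached", "main()")
dbg("pin", 1234)
```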

You might be able to use other tactics to detect whether debugging was required, e.g. by reading a file or environment variable or query string. The disadvantage with these approaches is that more of your script (the parts which detect whether debugging is required or not) would have to be working before you can start debugging it any further. It is also possible to set a debug variable from a special wrapper script designed to call your CGI program, see the Ulf Göransson debug.py script, details below.

Catching an exception and printing the stack trace

Unfortunately, redirecting standard error to standard output doesn't guarantee that the stack trace reaches the browser: the web server may sever the connection when the program exits, before the trace has been delivered, and if the program fails before printing its Content-type header the server will report only a generic error. To capture the error output reliably before the program exits, it helps to wrap the program so that it catches the exception details itself. This is done by putting the entire CGI program (or everything after the debugging lines described above) into a function called main(), and catching any exception occurring within main() by calling it as follows:

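if __name__ == "__main__":
  try:
    main()
  except:
    import sys, traceback
    print "Content-type: text/plain\n"
    print "error detected in main()"
    # print_exc() writes to sys.stderr unless told otherwise, so send
    # the trace to standard output where the browser will see it
    traceback.print_exc(file=sys.stdout)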

The lines shown above first test whether this Python source code is running as a main program. The statement:

if __name__ == "__main__":

won't be true if the CGI is imported or reloaded as a module. When it is true, the main() function containing the rest of the program is called inside a try: except: block. If an exception is raised there, the program imports the traceback module and prints the stack trace using its print_exc() function before the program exits.
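A variant of the same wrapper captures the trace as a string with traceback.format_exc() (Python 2.4 and later) instead of printing it straight away, so the program can decide where it goes, for example to the browser, to a file, or both. run_guarded() is a hypothetical name. Catching Exception rather than using a bare except: means that, in Python 2.5 and later, sys.exit() and KeyboardInterrupt are not swallowed.

```python
import traceback

def run_guarded(func):
    """Call func(). Return None if it succeeds, or the formatted stack
    trace as a string if it raises an exception."""
    try:
        func()
    except Exception:
        return traceback.format_exc()
    return None
```

In a CGI program it might be used as report = run_guarded(main), followed by printing a text/plain Content-type header and the report whenever report is not None.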

Using the CGI Traceback module (new in Python 2.2).

This module is new in Python 2.2 and won't be available in earlier versions. As I am currently [April 2002] using versions 2.0 and 2.1 on various computers I can't claim to have verified it, but it should work as documented. If the module is available to you, it gives your CGI program a detailed error report, including the values of your variables, provided the program can be started at all. To use it, place the following 2 lines at the top of your program:

import cgitb
cgitb.enable()

If and when this approach works for you some of the other suggestions in this section will probably become outdated.

Other useful sources of Python CGI debugging information

Recommended reading includes the Python library documentation for the cgi and cgitb modules. www.python.org also carries the full library documentation for each version, if you are not using the latest release.

Ulf Göransson (UG) has written a useful Python CGI debug.py debugging script which you can use to set a debug variable within your own script and call it. debug.py will pass your script the various inputs and it will handle the stack trace. For further information see UG's HTML & Python CGI pages: CGI scripts