This programming problem really is nasty. Programming is conceptually one of the most difficult jobs human beings do. Something is needed to make this difficulty manageable. Functions and modules are at the heart of strategies and tactics which achieve this.
The human brain is capable of holding between 5 and 10 facts, concepts or ideas in short-term memory at the same time. Without tools to help us design programs in a modular manner, one component at a time, the complexity of this task will very soon overwhelm even the best programmer. When programs are designed using smaller components the complexity is contained within neat functions and packages. Many of the objects which belong inside a function or package don't need to be visible outside.
This means that a programmer can write a program unit which makes use of other components without needing to know or think very much about them. Did you know, or need to know or care, how print or raw_input worked internally in the Python programs you have written?
At the simplest level a function just does a job and quits. This allows a program to be split up to the limited extent of which parts execute when.
def croak():
    print "groan, ",
    print "winge, ",

print "This,",
croak()
print "that."
A function is an indented block of code following a def statement. The word following def is the function name. The function is called by stating its name followed by () brackets, including parameters if any. This program outputs:
This, groan, winge, that.
The order of statement execution is clear from the output.
If function identification and definition were a trivial task, designing programs would be very easy. It isn't, so programming experience counts. The only way to get experience is to read, write and run a lot of source code.
The approaches adopted by experienced programmers include:
a. Split the program actions into a number of well defined tasks.
Wherever you are tempted to copy and paste code from one place in a program to another, could the copied code usefully go into its own function? If you are only copying a call to an existing program unit, e.g. print, the answer is probably not. If you can put a simple name and definition to what the copied code does, the answer is probably yes.
b. Define functions to minimise communication between the function and the rest of the program, and to maximise containment of data and actions within the function. Designers of electrical and mechanical components also decide interfaces like this.
c. Where possible try to limit communication between functions and their environment to well-defined interfaces, and reduce possible side effects. Side effects can occur through the use of global variables and by performing input and output within the function. Clarifications of this rule are in rules d. and e.
d. Avoid using global data, unless avoiding it would force you to have many functions with similar parameters and return values instead.
E.G. if your program handles a database which is accessed in most or all functions (which mainly exist to access it) then you may as well make the database accessible through a global variable in preference to having more parameters.
e. If a function sensibly does some input or output work, then make it only access one file (or possibly a set of similar files in a similar manner).
E.G. i. If a function calculates a square root this function becomes more flexible and self-contained if it returns the square root to the calling program unit than if it prints it on the console.
E.G. ii. In line with objective a. it makes sense to have separate functions for reading and writing external files especially if data is parsed on input or output.
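Rule e. and its first example can be sketched in a few lines. The names square_root and report_root are illustrative only and not part of any library, and print is written with brackets so that the sketch also runs on newer Python versions where print is a function.

```python
import math

def square_root(x):
    """ returns the square root of x, does no input or output """
    return math.sqrt(x)

def report_root(x):
    """ hypothetical helper: console output is kept in this one place """
    line = "root of %s is %s" % (x, square_root(x))
    print(line)
    return line

# the returning version can be reused inside further calculations ...
hypotenuse = square_root(3*3 + 4*4)
print(hypotenuse)

# ... while output only happens where the caller asks for it
report_root(16)
```

Because square_root() hands its result back instead of printing it, the same function serves both the calculation and the report.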
When a function is defined it can be given any types of data object through named parameters.
def square(x): # x is a named parameter
    print x*x
This function definition prints the square of the object locally named x. If called with a parameter of 3, the local reference x will refer to the object 3 at the time the function is called and run:
>>> def square(x): # definition
...     print x*x # function body
...
>>> square(3) # function call
9
We can just as easily call square() with a floating point parameter:
>>> square(2.5)
6.25
If we specify more than 1 parameter in a comma separated positional parameter list the position of parameters will be the same in both definition and call:
>>> def divide(x,y): # definition
...     print x/y
...
>>> divide(27,9) # call
3
Parameters can also be supplied through name=value pairs. These allow complex function parameters to be specified by the programmer in any order. The programmer defining a function can also give name=value parameters default values, so that a programmer using the function who doesn't need to know about a parameter can ignore it and still use the function, but in a less configured manner. Consider the following program:
import sys
from Tkinter import *
widget = Button(None,text="push me!",command=sys.exit)
widget.pack()
widget.mainloop()
When run this creates a window with conventional controls to minimise, resize and exit, and a button with the legend: push me! . Clicking on the button causes the application to exit. Removing the text="push me!" parameter from the Button() call:
widget = Button(None,command=sys.exit)
This program still runs, but without any button text.
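The same behaviour can be seen without a graphical toolkit. The following is a minimal sketch in which make_label and its parameters text, width and fill are invented for illustration; it shows default values being used, overridden, and supplied in a different order from the definition.

```python
def make_label(text="", width=10, fill="."):
    """ hypothetical function: pads text out to width using the fill character """
    return text + fill * (width - len(text))

a = make_label("name")                  # defaults used for width and fill
b = make_label(fill="-", text="name")   # name=value pairs given in any order
c = make_label()                        # no parameters at all still works

print(a)
print(b)
print(c)
```

As with the Button() call, leaving a parameter out does not stop the function working; the caller simply gets the default the function's author chose.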
Any object can be returned using a return statement. A function will stop executing when it executes a return statement.
>>> def total(list):
...     tot=0
...     for item in list:
...         tot=tot+item
...     return tot
...
>>> total([2,4,3])
9
The returned value can be used directly as above, or assigned to a reference for use later on:
>>> sum=total((3,4))
>>> print sum
7
We have seen integer and floating point parameters interchanged if the operations performed on them by the function make sense. The same applies to sequence parameters such as lists and tuples.
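A short sketch of this interchangeability, reusing the idea of the total() function above. It is written to run on newer Python versions as well, so print uses brackets. The last call is an assumption-free counter-example: adding a string to the number 0 makes no sense, so Python raises a TypeError.

```python
def total(sequence):
    """ returns total of items in any sequence of numbers """
    tot = 0
    for item in sequence:
        tot = tot + item
    return tot

from_list = total([2, 4, 3])      # a list of integers
from_tuple = total((3, 4))        # a tuple works the same way
from_floats = total([0.5, 1.5])   # so do floating point items

try:
    total("abc")                  # 0 + "a" is not a sensible operation
    failed = False
except TypeError:
    failed = True

print(from_list)
print(from_tuple)
print(from_floats)
print(failed)
```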
References which already exist at the module level (i.e. not inside a def or class indented block) of the enclosing source file are considered global so they can be read directly by functions:
>>> def readonly():
...     print a # read access
...
>>> a=5 # a is global
>>> readonly()
5
Writing (assigning) to a reference within a function is another matter:
>>> def writelocal():
...     b=3 # write access to b so this b is local to writelocal()
...     print b
...
>>> b=5
>>> writelocal()
3
>>> print b
5
This assigns to a local variable called b, not the global variable of the same name which retains its original value. However, if you assign to a global list member within a function you change the global list:
>>> c=[2,4,6]
>>> def listmember():
...     c[1]=5
...
>>> print c # gets original value, have not called listmember yet
[2, 4, 6]
>>> listmember()
>>> print c
[2, 5, 6]
Assigning an entirely different list to reference c within a function would change what the local reference c refers to, but the original list would still be available globally.
>>> print c
[2, 5, 6]
>>> def newlist():
...     c=[1,2,3] # defines a new list for local reference c
...
>>> newlist()
>>> c
[2, 5, 6]
If we want to override a global name within a local scope we can apply the global keyword:
>>> c
[2, 5, 6]
>>> def globalref():
...     global c # going to mess with name c at module level
...     c=[1,2,3]
...
>>> globalref()
>>> c
[1, 2, 3]
Python's built in names
We've already encountered a few, such as print, len() and range(). To use these we didn't have to import any modules containing them; they are part of the Python core language. Computer scientists describe one of the significant features of object oriented programming as "polymorphism", or something taking many forms.
However, if you don't know why you want to override built in names like len, or built in operators like +, then don't. To get a list of names to leave well alone (or mess around with) call the dir() function on the built-in module named __builtins__ . The dir() function can get you all the names from any module.
>>> dir(__builtins__)
['ArithmeticError', 'AssertionError', 'AttributeError', 'EOFError', 'Ellipsis', 'EnvironmentError', 'Exception', 'FloatingPointError', 'IOError', 'ImportError', 'IndentationError', 'IndexError', 'KeyError', 'KeyboardInterrupt', 'LookupError', 'MemoryError', 'NameError', 'None', 'NotImplementedError', 'OSError', 'OverflowError', 'RuntimeError', 'StandardError', 'SyntaxError', 'SystemError', 'SystemExit', 'TabError', 'TypeError', 'UnboundLocalError', 'UnicodeError', 'ValueError', 'ZeroDivisionError', '_', '__debug__', '__doc__', '__import__', '__name__', 'abs', 'apply', 'buffer', 'callable', 'chr', 'cmp', 'coerce', 'compile', 'complex', 'copyright', 'credits', 'delattr', 'dir', 'divmod', 'eval', 'execfile', 'exit', 'filter', 'float', 'getattr', 'globals', 'hasattr', 'hash', 'hex', 'id', 'input', 'int', 'intern', 'isinstance', 'issubclass', 'len', 'license', 'list', 'locals', 'long', 'map', 'max', 'min', 'oct', 'open', 'ord', 'pow', 'quit', 'range', 'raw_input', 'reduce', 'reload', 'repr', 'round', 'setattr', 'slice', 'str', 'tuple', 'type', 'unichr', 'unicode', 'vars', 'xrange', 'zip']
If this output looks suspiciously like a list your suspicions are well founded.
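Since dir() works on any module, the result can be handled like any other list. The sketch below uses the standard math module; the variable names are arbitrary, and print is written with brackets so the code also runs on newer Python versions.

```python
import math

names = dir(math)             # a list of strings, one per name in the module

print("pi" in names)          # membership test, as for any list
print("sqrt" in names)

# keep only the names which do not start with an underscore
public = [name for name in names if not name.startswith("_")]
print(len(public) < len(names))
```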
At a certain stage we learned that boxed games are more usable and interesting if we don't empty them all on the floor at the same time. Children discover this sooner with games which have many similar parts such as jigsaws. Sometimes we want to access parts from more than one game at a time. E.G. we might want to use a dice from one game to play another. When we do this it's a good idea to remember which box it came from though.
If we want all names from a module imported into ours we can empty it on the floor:
from Tkinter import *
But if you empty names from too many modules at once into yours you might find it more difficult to figure out which name does what or where it really belongs. You can say instead:
import math
print math.pi # you remembered which box pi came in
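The kind of confusion unqualified imports can cause is easy to reproduce. In this sketch the name pi is first used for our own rough value and is then silently replaced by an import; the variable names are illustrative, and a single named import stands in for the "empty it on the floor" form.

```python
import math

pi = 3                        # our own rough value
rough_area = pi * 2 * 2

from math import pi           # silently replaces our pi with the module's
better_area = pi * 2 * 2

print(rough_area)
print(better_area == math.pi * 4)
```

With the qualified form math.pi there is never any doubt which box the value came from.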
Having functions allows the same code to be accessed from different parts of the same program without needing local copies of the code. This solves many problems. However, when programmers work on multiple programs (as they do) this naturally leads to another kind of mess. Instead of filing generally useful related functions and data into modules which can be shared between programs, inexperienced programmers tend to cut and paste functions and data between programs. This approach may work with throw-away code, e.g. intended to solve one-off data conversions etc. Unfortunately, in other situations this inevitably attracts a horrible exploding bug swarm, where cut and pasted code means cut and pasted bugs.
In mechanical engineering terms this would be like a car designer needing new starter motor and battery designs for every car, instead of using a standard range of components across an entire model range.
Enough of the theory. The following module has been saved as stats.py :
# stats.py a simple statistics module
def total(sequence):
    """ returns total of items in sequence parameter """
    total=0
    for item in sequence:
        total+=item # add item to total
    return total

def average(sequence):
    """ returns average of items in sequence parameter """
    n_items=len(sequence)
    if n_items: # false if empty, avoids /0
        tot=total(sequence) # calls the total() function in same module
        return tot/n_items
    else:
        return None # The Python NULL object

# and just for fun, a reference to some data
the_meaning_of_life_the_universe_and_everything=42
These objects can then be imported and used in other programs, or tested on the interpreter command line:
>>> import stats
>>> a=[2,5,8]
>>> stats.total(a) # note use of qualified object name
15
>>> stats.average(a)
5
The data reference is also used by qualifying it with the module it comes from:
>>> print stats.the_meaning_of_life_the_universe_and_everything
42
Python bytecode compilation
After the stats module had been imported, another file called stats.pyc appeared in the same directory as stats.py . Python creates a compiled bytecode version of a module's source code when the module is imported, to save this part of the compile-interpret-run cycle being repeated needlessly. Python also detects module source updates and recompiles the .pyc file if needed.
We can check the names available within a module using the dir() built in:
>>> dir(stats)
['__builtins__', '__doc__', '__file__', '__name__', 'average', 'the_meaning_of_life_the_universe_and_everything', 'total']
The names in the list e.g. __doc__ which start and end with two underscores are used internally by the Python system.
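Two of these double underscore names are easy to inspect. The sketch below defines a small function with a docstring and also looks at the standard math module; it is written with bracketed print calls so that it runs on newer Python versions too.

```python
import math

def total(sequence):
    """ returns total of items in sequence parameter """
    tot = 0
    for item in sequence:
        tot += item
    return tot

print(total.__name__)   # the name the function was defined with
print(total.__doc__)    # the docstring written under the def line
print(math.__name__)    # modules carry a __name__ as well
```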
Using what is already available in free-software libraries is where the software reuse engineering philosophy starts paying off. What can now be achieved with a few lines of our own code didn't happen by accident, or just because some engineers were altruistic. For a programmer who doesn't sell proprietary software packages (i.e. about 95% of us), nothing is lost by sharing code you develop yourself. Everything is gained when other engineers who use your code send you improvements.
We had to import the math and file library modules to use them, just as we had to import our own stats module. Some library modules are written in 'C' or 'C++' and are compiled into the Python interpreter. If you carry out a search of the Python modules, you won't find a math.pyc or a math.py, because math is one of these compiled modules: Python is not as fast as 'C' for this kind of job.
The availability of Python source code for much of the library helps us dig deeper and learn more. For example, here is part of the bisect.py module from the Python library, which uses a binary search to find the position at which to insert an item into an already sorted list:
def insort(a, x, lo=0, hi=None):
    """Insert item x in list a, and keep it sorted assuming a is sorted."""
    if hi is None:
        hi = len(a)
    while lo < hi:
        mid = (lo+hi)/2
        if x < a[mid]: hi = mid
        else: lo = mid+1
    a.insert(lo, x)
The last statement was of interest. It shows Python lists have a method which inserts an item into the middle of the list, which I previously handled using slice assignments. Let's check this out:
>>> a=[2,4,6]
>>> dir(a) # does a list type have an insert() method ?
['append', 'count', 'extend', 'index', 'insert', 'pop', 'remove', 'reverse', 'sort']
>>> a.insert(1,9) # It does, so try it
>>> a # insert worked as intended
[2, 9, 4, 6]
Looking at the names supported by a list object shows there is an insert name, which will probably do what we saw in bisect.py . So we try it, passing the index to insert before and the object to be inserted as parameters: a.insert(1,9) succeeds in inserting the object 9 at index 1, moving the items from index 1 upwards along by one.
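The pieces discussed here can be tried side by side: the library's own bisect.insort(), the list insert() method, and the slice assignment it replaces. The list contents are arbitrary example values, and print is written with brackets so the sketch also runs on newer Python versions.

```python
import bisect

scores = [2, 4, 6]
bisect.insort(scores, 5)      # finds the right place and keeps the list sorted
print(scores)

by_method = [2, 4, 6]
by_method.insert(1, 9)        # insert 9 before index 1

by_slice = [2, 4, 6]
by_slice[1:1] = [9]           # the slice assignment equivalent

print(by_method)
print(by_slice)
```

Both insertion styles give the same list; insert() just states the intention more directly.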
Having learned how to use other people's modules and create our own we now want to be able to store them in places on the system where they are accessible from any Python project directory or program. We don't want to have to copy modules.
The solution to this requirement comes in the form of an environment variable.
Linux/Unix
On Linux/Unix the value of an environment variable can be echoed, e.g.:
$ echo $PYTHONPATH
/usr/lib/python2.0:/usr/local/lib/python
This shows the current PYTHONPATH value has 2 directories, /usr/lib/python2.0 and /usr/local/lib/python . These 2 folders are separated by the colon (:).
Setting this environment variable on Linux is achieved by adding the following 2 lines to .bash_profile :
PYTHONPATH=/usr/lib/python2.0:/usr/local/lib/python
export PYTHONPATH
If you use the old Bourne Shell you could add these lines to .profile instead. If you use the 'C' shell on Unix, put the following command into .login in the user's home directory:
setenv PYTHONPATH /usr/lib/python2.0:/usr/local/lib/python
Microsoft Windows
On older versions of Windows (95/98) this can be set by adding the following MS-DOS shell command to c:\autoexec.bat :
set PYTHONPATH=C:\Python21;C:\python
Note the use of semicolon (;) to delimit the 2 folders on the path.
On Windows 2000 you can set environment variables using a dialog accessed from the control panel.
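From inside Python the effect of this setting can be checked with the standard os and sys modules. The sketch below splits the Linux/Unix example value used earlier, then does the same for whatever PYTHONPATH the current machine has (possibly none). os.pathsep holds the delimiter for the platform in use, and sys.path is the list of folders Python searches when importing; the variable names are illustrative.

```python
import os
import sys

# the Linux/Unix example value, split on its colon delimiter
example = "/usr/lib/python2.0:/usr/local/lib/python"
example_folders = example.split(":")
print(example_folders)

# the same idea for this machine, using the platform's own delimiter
current = os.environ.get("PYTHONPATH", "")
current_folders = [folder for folder in current.split(os.pathsep) if folder]
print(repr(os.pathsep))
print(current_folders)

# the folders Python will actually search when importing modules
for folder in sys.path:
    print(folder)
```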