14. Modules

14.1. making a module of volume_sphere(): writing the function to go into a module

Let's say your function is one or both of

  • useful enough that you (or others) will want to use it, and you are going to put it in some publicly accessible place (on your local network, or posted to the internet).
  • part of a project and you're finished with it and don't want to think about it anymore and would like to put it in a place where python can pick it up at run time.

You make a module out of your code. From there, you can import the function into your other python code.

In its most basic form, a python module is just a file containing a function; making a module is only a matter of copying the function into a file in a directory in PYTHONPATH (which includes your current directory). However, you can do a much nicer job than that.

Before you unleash this code on the unsuspecting world: is it safe? Once people start copying it to their own machine, you've lost control of it and you can't fix it if it's broken. At that stage, you'll have to endure e-mails from irate users telling you that your function is crashing their machine, or their rockets are blowing up.

  • Consider all possible inputs: what do you want the function to do if the radius is -ve? You can't return 0; 0 is a valid volume for a sphere of radius 0. You could return an error condition, but we haven't done error handling in python yet. We could return a +ve volume, but that will cause problems for the calling routine, if for some mathematical reason we don't know about, a -ve radius is valid.

    Is it reasonable to return a -ve volume? The formula is valid for -ve radii: who are we to say that a -ve volume is invalid? If we take this point of view, what might be the consequences to the calling code of passing a -ve radius to volume_sphere.py? How would the calling routine get a -ve radius in the first place? A distance can be -ve, but the radius is the distance from the center to the surface: it's always +ve. We could take the point of view that if a -ve radius is an error, then the calling routine will trap such conditions and not call volume_sphere.py.

    These things need to be thought about and made explicit in the documentation. I think the cleanest thing to do in the case of a -ve radius is to let the calling routine handle it. If the calling routine finds that the radius is -ve, and that this is an error condition, then it shouldn't ask for the volume of a sphere with -ve radius. As the function writer, we can't second guess the validity of a -ve radius; it may be valid.

  • Your function doesn't have to do everything. What it does, you must document; what it doesn't do, you must document. After that, it's up to the user not to get into trouble with it.
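The policy argued for above (the function applies the formula as-is, and the calling routine decides whether a -ve radius is an error) can be sketched as follows. This is only an illustration: report_volume() is a hypothetical calling routine that is not part of the course files, and the code is written so that it runs under both the Python 2 used here and Python 3.

```python
import math

def volume_sphere(r):
    """Return the volume of a sphere of radius r (string or number).

    A -ve radius is NOT rejected: the formula is applied as-is, so the
    result is -ve. The caller decides whether that is an error.
    """
    r = float(r)
    return (4.0 / 3.0) * math.pi * r * r * r

def report_volume(radius):
    # hypothetical calling routine: it, not the function, rejects -ve radii
    if float(radius) < 0:
        return "radius %s rejected by caller" % radius
    return "volume is %f" % volume_sphere(radius)

print(report_volume("6"))    # volume is 904.778684
print(report_volume(-1))     # radius -1 rejected by caller
```

The function stays simple and documents what it does; the decision about validity lives in the one place that knows what a -ve radius means.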

Before we let the code go, we want to exercise all the features of volume_sphere.py (it can accept strings, reals, and +ve/-ve values). Add these lines at the bottom of the code and rerun it to check your file.

#make a few calls to volume_sphere() to test it
print "the volume of a sphere of radius  1 is " + repr(volume_sphere(1))
print "the volume of a sphere of radius -1 is " + repr(volume_sphere("-1"))
print "the volume of a sphere of radius  4 is " + repr(volume_sphere("4"))

For the moment, we'll put the module in your class_files directory (so we don't have to change PYTHONPATH). Let's first make sure it can be called as a module. Make a copy of volume_sphere.py as volume.py. volume.py will be your module file (it will eventually hold all your functions that calculate volumes of solids; for the moment it only has the function volume_sphere). Then at the command line, do this:

# python -i volume.py
What is the radius of your sphere? 6
the volume of your sphere is 904.778684
the volume of a sphere of radius  1 is 4.1887902047863905
the volume of a sphere of radius -1 is -4.1887902047863905
the volume of a sphere of radius  4 is 268.08257310632899
>>> volume_sphere(0)	#run your own commands from the prompt
0.0

You told python to execute volume.py, which it did, asking you for a radius and returning a volume. Then python ran through the added tests and stopped (python had reached the end of volume.py) but python didn't exit; it stayed in immediate mode (the -i option; to see the difference run the same command without the -i option). Having loaded your file, python now knows about the new function volume_sphere() and you can run any extra tests e.g. volume_sphere(0).

Create and run this file (call_volume.py). It mimics a piece of code that calls your module.

#! /usr/bin/python
from volume import volume_sphere

print
print "Testing code from call_volume.py"
print "the volume of a sphere of radius  2 is " + repr(volume_sphere(2))
print "the volume of a sphere of radius -2 is " + repr(volume_sphere("-2"))
print "the volume of a sphere of radius  0 is " + repr(volume_sphere("0"))

The new piece of code is the line

from volume import volume_sphere

which says "from the module file volume.py (or volume.pyc) import the function volume_sphere()".

Note

The first time python imports your module file, it will make a bytecode version with a pyc extension, which it will subsequently use. Look for a new file volume.pyc in your class_files directory.

bytecode: a platform independent binary version of your .py file. It loads slightly faster, but runs at the same speed. You hand around the pyc file, if you want to make it difficult for the user to figure out the source code. (This is antithetical to the idea of GNU computing and the idea of the internet being a place to freely exchange information. A pyc file enables someone to make money by depriving you of information.)

How would you check that the py file wasn't being used [84]? How would you check that the pyc file was being used (in the absence of a py file) rather than the py file [85]?

The output is

# ./call_volume.py
What is the radius of your sphere? 7
the volume of your sphere is 1436.755040
the volume of a sphere of radius  1 is 4.1887902047863905
the volume of a sphere of radius -1 is -4.1887902047863905
the volume of a sphere of radius  4 is 268.08257310632899

Testing code from call_volume.py
the volume of a sphere of radius  2 is 33.510321638291124
the volume of a sphere of radius -2 is -33.510321638291124
the volume of a sphere of radius  0 is 0.0

You asked python to load the function volume_sphere and then execute the code in the calling routine (print statements which call the function volume_sphere). You see the requested output in the 2nd block above (starting "Testing code ...").

The output starting "What is the..." is from code in the module file's global namespace. You did not ask python to execute this code, but it did anyway. Every module needs built-in testing code, but you don't want it executed every time you load the module. Handling this is the next step in building a module.
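The effect described above can be reproduced in isolation. The sketch below writes a throwaway module to a temporary directory and imports it; the module name demo_unguarded and its contents are made up for illustration, and the code is written to run under both Python 2 and Python 3.

```python
import importlib
import os
import sys
import tempfile

# write a throwaway module (hypothetical name: demo_unguarded) to a temp directory
module_dir = tempfile.mkdtemp()
source = (
    "def function(n):\n"
    "    return 'output from function(%d)' % n\n"
    "\n"
    "# global namespace: this runs on every import\n"
    "side_effect = function(1)\n"
    "print(side_effect)\n"
)
with open(os.path.join(module_dir, "demo_unguarded.py"), "w") as handle:
    handle.write(source)

# make the temp directory visible to import, then import the module
sys.path.insert(0, module_dir)
demo = importlib.import_module("demo_unguarded")   # the module's print runs here

# this is the only call the importing code actually asked for
print(demo.function(2))
```

The line printed during the import is the module's own global namespace code, which nobody requested.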

14.2. making a module of volume_sphere(): handling the global namespace code

In a normal language, code starts and ends execution in main(). Python is different: it starts executing with the first code it finds in global namespace. You would like your program to start execution in the same place, no matter what order you load your files, but python will start executing in a different place if you reorder your files. This is not what you want.

You could solve this problem by commenting out the global namespace code in your module, but then you'd have to write a separate file to test your module. The chances of keeping your module file and your module testing files together for decades is small. Your module has to be able to test itself and not rely on external code for testing.

The Python 2.5 Documentation (http://docs.python.org/download.html) in section 6.1 (More on Modules) says that the global namespace code is executed once on loading to allow initialisation of the function(s).

Note
compiled languages have a linker/loader to handle much of this
Note

What/why you'd do initialisation at load time (from a posting to the TriLUG mailing list - the poster didn't want credit, he said it was all common knowledge):

  • The most common case is a module will load more modules when loaded. That is, you import a module and the first lines in the module you load will be import statements which load yet more modules (and in turn, those modules may import still more modules).

    The module might selectively load certain other modules in a plug-in architecture based on things the module can introspect from the environment into which it was loaded.

  • You might want to cache some data from somewhere.
  • If you are doing metaprogramming in the module, you might want to create some new objects of the metatype you just loaded.
  • If you've defined some factory methods in the module, you might want to fire out some objects.
  • I think there's no end of uses for executing global namespace code when loading a module. I can think of no case where a module would be useful without executing code in its global namespace.
  • Basically, all the statements in a Python module are executed in order when loaded, whether as a script or as an import. The execution of most statements in a module, though, just define code objects (i.e., function objects, class objects, etc.) which will be used later (in the script, or by another script or module importing the module). So you might think of that as function and class objects getting "initialized" with their byte code.

There is a fix for code you don't want run from global namespace (it makes python execute in the same way that other languages execute their code).

First here's a normal module with its test code. You'll execute ./module.py.

#!/usr/bin/python
#this file is called module.py
def function(n):
	#code for the function (here it just reports how it was called)
	print "output from function(%d)" % n

#global namespace
#main() for this executable
#module initialisation code (if you need it)
#	eg if your function generated random numbers, 
#	you would seed (initialise) the random number generator (with say the date)
#	so that the random number generator would give different random numbers 
#	each time you ran the module

function(1)	#this is your testing code. You only want this to run when you execute the module by itself.

Python loads the function and then coming to the global namespace starts executing. What there is to execute is the line of test code function(1). You'll see the output from calling function(1) (whatever that may be).

output from function(1)

Next here's a bit of code that calls the function.

#!/usr/bin/python
#this file is called call_module.py
from module import function

#global namespace
#main() for this executable
function(2)

When you run the command call_module.py, function() is imported (loaded into memory, but not executed). As well (the bit you don't want) python starts executing in the first global namespace it finds, which is the line function(1) in the module. Next python will start executing in the global namespace for call_module.py, which is the call function(2). What you'll see will be

output from function(1) 	#you didn't ask for this (it's your testing code)
output from function(2)		#this is what you wanted

However the global namespace for module.py is not main() for call_module.py (the piece of code that's making all the calls). Once the code is in memory and just before it starts to run, here's what it looks like.

#module
def function(n):
	#code

#global namespace
#module initialisation code (if you need it)
#this is NOT main() anymore
function(1)
.
.
.

from module import function

#global namespace
#main() for this executable
function(2)

Since python runs global namespace code (whether it's in main() or not) wherever it finds it, you'll see the output from calling function(1) then function(2).

You apply the fix, a conditional, to the module file. It says "if this isn't main(), don't run it". Here's the fix.

#module
def function(n):
	#code

#global namespace
#module initialisation code (if you need it)
if __name__ == '__main__':
	#skipped here: when imported, this file is NOT main()
	function(1)
.
.
.

from module import function

#global namespace
#main() for this executable
function(2)

Here's the output, with python now behaving like other languages.

#module initialisation code (if you need it)
output from function(2)		#this is what you wanted

The fix is a conditional statement

if __name__ == '__main__':

which says "If code execution starts here, then execute these indented statements". For the moment, you can regard this syntax as magic (I do). The testing code is now governed by the conditional statement. If you execute the module by itself, the first global namespace code python sees will be in your module; it will also be main() for this executable, python will name this code "__main__", and the indented statements under the conditional will be executed. If code execution starts in global namespace code in some other file (e.g. some routine calling volume_sphere()), then the code in the module will not be named "__main__" and the testing code will not be run.
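To see the two cases side by side, the sketch below loads the same throwaway file twice: once by importing it, and once by running it as the starting file with the standard library's runpy.run_path(), which does what "python file.py" does. The module name demo_guarded and the variables who_am_i and ran_self_tests are made up for illustration; the code runs under both Python 2.7 and Python 3.

```python
import importlib
import os
import runpy
import sys
import tempfile

# write a throwaway module (hypothetical name: demo_guarded) with a guarded block
module_dir = tempfile.mkdtemp()
path = os.path.join(module_dir, "demo_guarded.py")
source = (
    "ran_self_tests = False\n"
    "who_am_i = __name__\n"
    "\n"
    "if __name__ == '__main__':\n"
    "    ran_self_tests = True\n"
)
with open(path, "w") as handle:
    handle.write(source)

# case 1: the file is imported by other code
sys.path.insert(0, module_dir)
imported = importlib.import_module("demo_guarded")
print("%s %s" % (imported.who_am_i, imported.ran_self_tests))      # demo_guarded False

# case 2: the file is where execution starts
executed = runpy.run_path(path, run_name="__main__")
print("%s %s" % (executed["who_am_i"], executed["ran_self_tests"]))  # __main__ True
```

The only difference between the two runs is the value python gives to __name__, and that is all the conditional looks at.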

Here's the fixed version of volume.py containing the function volume_sphere().

#---------------
#! /usr/bin/python
#functions
def volume_sphere(r):
	"""name      - volume_sphere(r)
	   version   - 1.0 Feb 2008
	   method    - volume =(4.0/3.0)*pi*radius^3
	   parameter - radius as string, integer or real, valid all +/-ve numbers
	   returns   - volume of sphere, float
	   Author    - Homer Simpson, Homer@simpson.com (C) 2008
	   License   - GPL v3.
	"""

	import math     #import the whole math module (including lots of stuff you don't need)
	r=float(r)      #convert string input to float.
			#this allows the function to accept either a string or a number

			#pi in module math is called math.pi
	result=(4.0/3.0)*math.pi*r*r*r
	return result
#volume_sphere

#---------------
#main()

print "initialising volume.py"
print

if __name__ == '__main__':
	print "self tests for volume.py"
	print

	print "self tests for function volume_sphere()"

	##turn off the interactive test(s)
	##get input
	#resp = raw_input ("What is the radius of your sphere? ")
	##call the function, passing a string
	#volume=volume_sphere(resp)
	##formatted output
	#print "the volume of your sphere is %f" % volume

	#make a few calls to volume_sphere() to test it
	print "the volume of a sphere of radius  1 is " + repr(volume_sphere(1))
	print "the volume of a sphere of radius -1 is " + repr(volume_sphere("-1"))
	print "the volume of a sphere of radius  4 is " + repr(volume_sphere("4"))

# volume.py ---------------------

I've added code in the place where you would initialise the module on loading (the line printing "initialising volume.py").

Here's the module being run in self testing mode (note a string indicating initialisation, a string indicating that the file volume.py is being tested, and a string indicating that tests are being run for the function volume_sphere()).

# ./volume.py
initialising volume.py

self tests for volume.py

self tests for function volume_sphere()
What is the radius of your sphere? 7
the volume of your sphere is 1436.755040
the volume of a sphere of radius  1 is 4.1887902047863905
the volume of a sphere of radius -1 is -4.1887902047863905
the volume of a sphere of radius  4 is 268.08257310632899

Here's the module being run in interactive mode (the -i option). You can run your own tests at the end (here we output the volume for a sphere of radius "0").

# python -i ./volume.py
initialising volume.py

self tests for volume.py

self tests for function volume_sphere()
What is the radius of your sphere? 7
the volume of your sphere is 1436.755040
the volume of a sphere of radius  1 is 4.1887902047863905
the volume of a sphere of radius -1 is -4.1887902047863905
the volume of a sphere of radius  4 is 268.08257310632899
>>> volume_sphere("0")
0.0

Here's the module being called from another piece of code. Note that the initialisation code (before the "if" statement) is still run, but that now the self testing code is not run.

# ./call_volume.py 
initialising volume.py


Testing code from call_volume.py
the volume of a sphere of radius  2 is 33.510321638291124
the volume of a sphere of radius -2 is -33.510321638291124
the volume of a sphere of radius  0 is 0.0

Summary: make the module self testing. That way a developer, unsure of a problem they're having, can quickly re-run self tests on suspect modules. (People do accidentally change the wrong files, or "upgrade" them.)

14.3. Code Maintenance

By now I hope you realise

  • Error messages are somewhere between wrong and right, and you can't tell which by looking at them.
  • It's hard to find your own bugs.

    Brian Kernighan: "Debugging is twice as hard as writing the code in the first place. Therefore, if you write the code as cleverly as possible, you are, by definition, not smart enough to debug it."

  • It's harder to read code than to write it (from Things you should never do, Part 1 http://www.joelonsoftware.com/articles/fog0000000069.html). This makes software reuse a difficult goal.
  • It's hard to debug other people's code. The code I wrote for volume_sphere.py had intentional mistakes. Being new to python, you couldn't tell that the code was wrong. But even if you've been coding for years, any code that looks about right, is assumed to be right and you won't know there's a problem till you run it.

    Everyone's code is different. Educators wondering how they're going to detect if students have colluded in writing their homework assignments, are surprised to find that every piece of code is different. Why? Well there's only one way to code up this problem, right?

Much code is short programs that are used a few times and then thrown away. People don't document them well (or at all) and usually don't get into too much trouble for the lack of documentation.

Other code will be upgraded, or used unmodified for decades. The original author may be long gone, while the code is run hundreds of times a day, or once a year.

It turns out that the major cost in writing software, is not the initial cost of producing the first working version or even the production version, but maintaining the code (over decades). Someone who's never seen the code, doesn't know what it does or how it works, will be assigned the job of fixing/updating your code. It may take months before they can make the first changes. This may be longer than it took the original author to write the code. The prime aim of the coder should be to write code that a human can read.

Because code is unmaintainable, it's often simpler to write a new version from scratch. This is a terrible waste of resources. Managers, who often know nothing about writing code, are willing collaborators in this tragedy. They will be sucked in by the arrival of a new language and say "this code was written in (some old language), it's time for us to be running (this year's fancy new language)". The effort to re-write the code from scratch in a new language is seen as a development cost. It's not, it's maintenance and is unnecessary maintenance at that. The old code was already debugged, an enormous investment; you can't throw away tested and debugged code. The computer doesn't care about the language the code was written in. Adding two numbers is the same to the computer, whether it was written in Fortran or Java. The new version will often be functionally worse than the original (and no more maintainable), but management will not realise this. To justify the money and time spent on the rewrite, management will tout the code rewritten in this year's new language, as an advance.

Your code must be documented, and easy to read. Because reading other people's code is hard, anyone can write compact, fiendishly smart and cryptic code, that no-one else can read and that even the author won't understand 6 months later. Don't be too smart: make your code simple. Your first objective is to write maintainable code. Your prime objective in writing code, is not that the code works (which it must), or that it works correctly (which it must) or that it's fast (nice, but not required), but that it's maintainable.

Most GPL packages now have testing code which is run before installation.

Once your managers realise you have working code, they can claim their money from the customer, and they'll want you to move onto something else (which will make them money). You'll get little support for writing documentation and testing routines. When someone says to you "it works? then we don't need testing routines" you're talking to someone who doesn't know how to code and doesn't understand the enormous cost of maintaining bad code. You may have to do what they say, but you don't have to agree with them.

In the module volume, one of the functions does an import math to get the value of PI. If you have multiple functions each import'ing math, you could move this instruction to the module global namespace initialisation area, so math only has to be imported once. Your module would work fine, but you shouldn't do this. Why [86] ?

Note
End Lesson 12

14.4. Functions: recap

Functions are an essential part of a program. Functions look much the same in all languages and are invoked the same way (by passing parameters/arguments and accepting a returned variable). Most action in a program will be in functions. Any self contained logical block in a program is a candidate for being made into a function. Functions can be called from anywhere (i.e. from main() or from other functions).

The calling routine passes parameters; the called routine is passed parameters. The name of the parameters in the calling routine is usually different to those used in the called routine. The use of different names is for readability. The name of the parameter passed from the calling routine is usually specific to the calling routine. The name used for a parameter in the function is generic, a name that will work no matter what it's called by. If the same name is used for a parameter in the calling function and in the called function, then there are two variables, each in their own scope, with the same name: changes to the variable in the called function do not affect the variable of the same name in the calling function. The list of parameters is often called the interface - it's the exchange of information that must occur for a function to work.

#! /usr/bin/python

#my_program.py

#functions

def my_function(param1, param2..paramn):
	#calculate result
	return result

#main()
	#get data
	#do calculations
	#sample function call
	my_function(variable1, variable2..variablen)
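The point about two variables with the same name, each in its own scope, can be checked with a few lines. This is a minimal sketch; double_it() is a made-up function, and the code runs under both Python 2 and Python 3.

```python
def double_it(value):
    # 'value' here is local to double_it(); giving it a new value
    # does not touch the caller's variable of the same name
    value = value * 2
    return value

#main()
value = 21                      # same name, but a separate variable in the caller's scope
result = double_it(value)
print("caller's value is still %d, result is %d" % (value, result))
# caller's value is still 21, result is 42
```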
	#output results

main() should be short, with as much work as possible handed off to functions. Functions can be called at any step in main(). Often the control of the flow of the program is complicated, in which case main() will be longer.

Code is turned into a function for

  • modularity - so a piece of code can be replaced by another piece, that behaves the same to the calling routine. Improvements in the program are made one function at a time. After the program is working, rarely is the interface (the parameter list) between a calling function and the called function changed.
  • readability - functions are small (5-50 lines approx). Their purpose and the implementation can be understood quickly and easily.
  • scope - variables declared inside a function cannot be seen by the calling function (or by most functions). Except for the parameters passed, and global variables, a function cannot access any variables in the program.
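The modularity point in the list above can be sketched with two interchangeable versions of the same function. The names volume_sphere_v1(), volume_sphere_v2() and describe() are hypothetical; the only thing the calling code relies on is the interface (one radius in, one float out).

```python
import math

def volume_sphere_v1(r):
    r = float(r)
    return (4.0 / 3.0) * math.pi * r * r * r

def volume_sphere_v2(r):
    # same interface, different internals
    return (4.0 / 3.0) * math.pi * float(r) ** 3

def describe(volume_function, radius):
    # the calling code sees only the interface, so either version will do
    return "volume is %.6f" % volume_function(radius)

print(describe(volume_sphere_v1, 2))   # volume is 33.510322
print(describe(volume_sphere_v2, 2))   # volume is 33.510322
```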

Functions must have documentation, enough for someone reading the docs to be able to recreate your code without having seen the code (for an example see documentation).
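As a reminder of where that documentation lives, the sketch below puts it in the function's docstring, in the same layout used for volume_sphere(), and reads it back through the function's __doc__ attribute. The function volume_cube() is a made-up example, not one of the course files.

```python
def volume_cube(side):
    """name      - volume_cube(side)
       method    - volume = side^3
       parameter - side as string, integer or real
       returns   - volume of cube, float
    """
    side = float(side)
    return side * side * side

# the docstring travels with the function and can be read without opening the file
print(volume_cube.__doc__)
```

From the interactive prompt, help(volume_cube) shows the same text.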

14.5. Function example: volume_hexagonal_prism()

Continuing our exploration of solid objects, write a module to calculate the volume of a hexagonal prism.

A prism (see Prism (Geometry), http://en.wikipedia.org/wiki/Prism_%28geometry%29) is an n-sided polygon translated through space. A cube (and a rectangular block) is a prism, because the cross section stays the same as at least one of its faces as you slice off pieces parallel to that face. A cylinder is not a prism because the face is a circle, not a polygon. An ice crystal is a prism: it has a regular hexagon base and extends at right angles to the base. Many crystals are prisms.

Hexagonal ice prisms in thin layers of cirrus clouds (in NC seen mainly in winter) are responsible for spectacular ice rainbows and halos e.g. Frequent Halos (http://www.atoptics.co.uk/halo/common.htm), The Cloud Appreciation Society (http://cloudappreciationsociety.org/version2/wp-content/uploads/2008/02/08-feb-high1.jpg) and sundogs seen in a circle of radius 22° from the sun (see 22° halo http://en.wikipedia.org/wiki/22%C2%B0_halo).

The formula for the area of a regular hexagon (in algebra) from Areas and Perimeters of Regular Polygons (http://www.algebralab.org/lessons/lesson.aspx?file=Geometry_AreaPerimeterRegularPolygons.xml) and Area of a Regular Polygon (http://www.mathwords.com/a/area_regular_polygon.htm) is

Area = (3*sqrt(3)/8)*d^2	#d=vertex to vertex (diameter of circumcircle)
Area = (3*sqrt(3)/2)*r^2	#r=center to vertex (radius of circumcircle)
Area = 2*sqrt(3)*apothem^2	#apothem=center to face (radius of inscribed circle)
Area = (2*sqrt(3)/4)*f^2	#f=face to face (diameter of inscribed circle)
Note
In python the square root can be calculated a couple of ways
>>> number=2
>>> number**(1.0/2.0)
1.4142135623730951

>>> number=9
>>> from math import sqrt
>>> sqrt(number)
3.0
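Before relying on any one of the four area formulas above, it is cheap to confirm that they agree with each other. The sketch below does this for a hexagon with vertex-to-vertex diameter 2; the variable names (d, r, apothem, f) are chosen here for illustration, and the relation apothem = r*sqrt(3)/2 is the standard geometry of a regular hexagon.

```python
from math import sqrt

d = 2.0                         # vertex to vertex (diameter of circumcircle)
r = d / 2.0                     # center to vertex
apothem = r * sqrt(3) / 2.0     # center to face
f = 2.0 * apothem               # face to face

areas = [
    (3 * sqrt(3) / 8) * d ** 2,
    (3 * sqrt(3) / 2) * r ** 2,
    2 * sqrt(3) * apothem ** 2,
    (2 * sqrt(3) / 4) * f ** 2,
]

# all four lines should show the same number
for area in areas:
    print("%.6f" % area)
```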

Write a self contained file volume_hexagonal_prism.py (later you will turn the file into a module and add it, along with self tests, to volume.py). Here are the specifications:

Note
The simplest thing to do would be to copy volume_sphere.py to volume_hexagonal_prism.py and start hacking on that code. You never write code from scratch if you have something similar, that's debugged and tested.
  • The main() code
    • asks the user for the vertex-to-vertex diameter of hexagonal base
    • asks for the height (or length, you choose what you call it) of the hexagonal prism
    • passes these two parameters to the function volume_hexagonal_prism(d,h)
    • prints the results using unformatted output in the following format
      The volume of a hexagonal prism of diameter xxxx and height xxxx is xxxxx
      
  • The function volume_hexagonal_prism(d,h) does the following
    • has comments and documentation (do that after you get it to work)
    • does any conversions on the input parameters necessary for the function to use them
    • calculates the area of the base using the information above
    • calculates the volume using Volume=Area of base * height
    • returns the volume to the calling code

Check your output using a few test inputs. Make the test code appropriate for the volume_hexagonal_prism() case. Finish by commenting out the tests requiring user input and check that the non-interactive tests run OK. Check that the output is equivalent to running the volume_sphere() code in volume.py.

Remember the formula I gave you for the volume of a sphere that contained "(4/3)" which the computer evaluated as an integer, and the formula gave the wrong result? Always check your answer by hand. You might think that you could plug the numbers into a 4 function calculator and get the right answer and in this case you likely will. However one day you'll make a mistake without knowing it. The people who built the Hubble Space telescope, did a complicated test on the mirror surface and got the wrong answer. An amateur telescope maker using a knife edge and a flashlight (torch) bulb (the Foucault Test, http://en.wikipedia.org/wiki/Amateur_telescope_making#Foucault_test) would have detected the problem in a matter of minutes. The simplest test wins.
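The "(4/3)" mistake recalled above is easy to reproduce. In the sketch below the integer division is written with the // operator, which behaves the same way under Python 2 and Python 3 (under Python 2, plain 4/3 between integers gives the same truncated result); the variable names wrong and right are chosen for illustration.

```python
import math

r = 1.0
wrong = (4 // 3) * math.pi * r ** 3      # integer division: 4 // 3 is 1
right = (4.0 / 3.0) * math.pi * r ** 3   # real division

print("%.4f %.4f" % (wrong, right))      # 3.1416 4.1888
```

A hand check (the volume of a unit sphere is a bit more than 4) catches the wrong value at once.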

For a complicated shape like a hexagon, you have to think a bit to get a simple calculation that gives a good enough answer. First check the area of a hexagon. Here are two checks:

  • Here is an ascii art illustration of a regular hexagon sitting inside a square (whose area you can calculate easily) (as depicted, the side of the hexagon at the top would not quite touch the box, but we just want an approximate area here).

     ___
    |/ \|
    |\_/|
    

    What should be the area of a regular hexagon of vertex-to-vertex diameter=1 (left-to-right distance) compared to the square of sides=1? Less than 1? Less than 1/2? More than 1 [87] ? Is the area more than 1/2? The object with half the area of the original square would be a square tilted at 45° to the original square and whose vertices are at the midpoints of each side of the original square (the two squares would look something like this).

     __
    |/\|
    |\/|
    

    The hexagon joins the square at the midpoints of the vertical sides of the square, but the hexagon's sides join the horizontal sides of the square (the top and bottom) outside the center of the horizontal sides. Is the hexagon bigger or smaller than the square of area 0.5 [88] ?

    You should expect the area of a hexagon to be less than the area of the outside (enclosing) square, but more than the area of the enclosed square (which has an area of half) i.e. 0.5<area hexagon<1.0.

  • The area of the circumcircle (the circle which touches all the vertices and with the same center as the hexagon) is what [89]? What can you say about the area of the hexagon touching the circumcircle [90]?

You now know that the area of a hexagon of vertex-to-vertex diameter=1 is between 0.5 and 0.78.
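These bounds can be turned into a check that runs alongside your other tests. The sketch below uses the vertex-to-vertex formula for the hexagon and compares it with the two easy shapes; the variable names are chosen here for illustration.

```python
import math

d = 1.0                                          # vertex-to-vertex diameter
area_hexagon = (3 * math.sqrt(3) / 8) * d ** 2
area_tilted_square = 0.5 * d ** 2                # the tilted square of area one half
area_circumcircle = math.pi * (d / 2) ** 2       # circle through all the vertices

print("%.2f < %.2f < %.2f" % (area_tilted_square, area_hexagon, area_circumcircle))

# the hexagon's area must fall between the two simple estimates
bounds_ok = area_tilted_square < area_hexagon < area_circumcircle
print(bounds_ok)
```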

Check the volume of a hexagonal prism with height=1. Then try combinations of height/diameter of 0,1. Check the change in volume if you double the height, double the diameter (by what ratio should the volume change for a doubling of the base diameter, a doubling in the height?).

Here's my code for volume_hexagonal_prism.py together with the output from three pairs of checks run from the command line. [91]

The print statement in main() does not have repr() for diameter and height, but does have it for volume. Why [92] ?

Note
End Lesson 13

14.6. making a module of volume_hexagonal_prism()

You're likely to write lots of functions. You put them in modules with similar functions. Here we're going to put the function volume_hexagonal_prism() into the module volume. We've already put the function volume_sphere() into the module volume. Let's look at what happens when we add a 2nd function to the module.

The simplest scheme is to copy/paste volume_hexagonal_prism.py into volume.py this way.

Note
This code is called pseudo code - it's half English, half computer code; it's sufficiently generic that it could be implemented in any computer language.
shebang (if needed)
#scheme 1, for small modules

	"""
	collection of functions that calculate the volume of solids
	Authors attributed in each function.
	"""

volume_sphere()
	#code

volume_hexagonal_prism()
	#code

#----------
#main()

global namespace initialisation code

if __name__ == '__main__': 

	tests on volume_sphere		#multiple lines
	.
	.

	tests on volume_hexagonal_prism	#multiple lines
	.
	.

#--module volume

If you do something once, you can do it almost any way you want. As soon as you do something twice, (like having two functions in a module) you have to consider what will happen if you do it 100 times. This is the problem of scaling mentioned in order of algorithm.

Inserting the functions is easy - you just stack them in order from the top of the file. It's what to do with the tests that's the problem. If you have 10 functions, each with 10 lines of tests (and comments) which you move into the module's global namespace (at the bottom of the file), you will have 100 lines of tests. This is too long a block of code to read and now the tests are a long way from the code they're testing. You'll have to do some thinking if you ever want to modify the file (like to turn off some of the tests).

On looking at your code with two functions, programmers will first ask "but does it scale?". Most often there aren't elegant solutions; mostly there are solutions that people will accept (or put up with). More often than you would like, the best solution anyone can think of is plain downright ugly (that's life).

You have some latitude on what to do here. Your main idea is for someone else to be glad to read your code.

One of the principles of code writing is that when a function (or main()) becomes too big, you split out a self contained logical block into a function. To do that here, put the tests for each function into their own test_function(), and call the test_function()s.

shebang (if needed)
#scheme 2, for medium sized modules

	"""
	collection of functions that calculate the volume of solids
	Authors attributed in each function.
	"""

volume_sphere()
	#code

test_volume_sphere()
	#tests

volume_hexagonal_prism()
	#code

test_volume_hexagonal_prism()
	#tests

#----------
#main()

global namespace initialisation code

if __name__ == '__main__': 

	test_volume_sphere()		#one line
	test_volume_hexagonal_prism()	#one line

This will work for 100 functions - you'd have 100 lines of test_function_name() at the bottom of the file. People would realise that this was a big module file and that the calls at the bottom are just a list of calls to test functions. Once the number of lines of calls in main() becomes unmanageable, you collect them into a function.

shebang (if needed)
#scheme 3, for huge modules

	"""
	collection of functions that calculate the volume of solids
	Authors attributed in each function.
	"""
volume_sphere()
	#code

test_volume_sphere()
	#tests

volume_hexagonal_prism()
	#code

test_volume_hexagonal_prism()
	#tests

run_all_tests():
	test_volume_sphere()		#one line
	test_volume_hexagonal_prism()	#one line
	.
	.

#----------
#main()

global namespace initialisation code

if __name__ == '__main__': 

	run_all_tests()

If you've got 100 functions, this 3rd way would be the best.
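As a rough illustration, scheme 3 might look like this in real python. It's only a sketch: the function bodies are the standard volume formulas, the hexagonal prism formula assumes the diameter is measured corner to corner (an assumption, inferred from the sample output later in this section), and the tests just print. print() is written with parentheses so the file runs under python 3 as well.

```python
#!/usr/bin/env python
"""
collection of functions that calculate the volume of solids (sketch of scheme 3)
"""
from math import pi, sqrt

def volume_sphere(r):
    """volume of a sphere of radius r"""
    return (4.0 / 3.0) * pi * r ** 3

def test_volume_sphere():
    print("the volume of a sphere of radius 1 is " + repr(volume_sphere(1)))

def volume_hexagonal_prism(d, h):
    """volume of a hexagonal prism; d is assumed to be the corner-to-corner diameter"""
    return (3.0 * sqrt(3.0) / 8.0) * d ** 2 * h

def test_volume_hexagonal_prism():
    print("the volume of a hexagonal prism of diameter 1.0 and height 1.0 is "
          + repr(volume_hexagonal_prism(1.0, 1.0)))

def run_all_tests():
    test_volume_sphere()             # one line per test function
    test_volume_hexagonal_prism()

#----------
#main()

if __name__ == '__main__':
    run_all_tests()                  # only when run as a standalone program
```

Adding a 3rd function means adding the function, its test function, and one line in run_all_tests(); main() doesn't change.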

How do you know whether you have a small, medium or huge module? Ask whether you (or the users) can understand the code in main(); if they can't, you need to go to the next scheme.

For the moment, let's make the module using Scheme 2.

Here are the specifications

  • copy the function volume_hexagonal_prism() into volume.py, putting it below the function volume_sphere().
  • Make functions test_volume_sphere(), test_volume_hexagonal_prism() and move the testing code from the two main()s into the appropriate test function.
  • Call the two test functions from code in main()
  • Set up the conditional "if __name__ == '__main__':" to only call the tests if volume.py is run as a standalone program (rather than loaded as a module by another file).

I've commented out the interactive tests, so the tests will run without keyboard input. Here's my version [93] and here's the output

# ./volume.py
initialising volume.py

self tests for volume.py

self tests for function volume_sphere()
the volume of a sphere of radius  1 is 4.1887902047863905
the volume of a sphere of radius -1 is -4.1887902047863905
the volume of a sphere of radius  4 is 268.08257310632899

self tests for function volume_hexagonal_prism()
the volume of your hexagonal_prism of diameter 1.0 and height 1.0 is 0.649519052838329
the volume of your hexagonal_prism of diameter 2.0 and height 1.0 is 2.598076211353316
the volume of your hexagonal_prism of diameter 1.0 and height 2.0 is 1.299038105676658
Note
End Lesson 14

14.7. Do the tests give the right answers?

A person installing your module and running the above tests has no idea if the output is correct. We have to compare the answer found with the expected answer, then output a pass/fail message. The user doesn't have to see the actual numbers (although they can).

Comparing reals is a problem (as we'll see in don't equate reals). The number on the screen (or in your code) may differ by a couple of low order bits from the number in memory. The number "0.1" will actually be represented by "0.10000000000000001". You have to compare the ratio or find the difference. The ratio (division) test will fail if the divisor is "0.0", so it's probably best to test the difference between the expected and found numbers. Reals in python are 64-bit, giving a precision of 2^-52. The numbers output by the test will either be as correct as the computer can make them, or will be obviously wrong.

Let's say to be safe that we'd like the difference between the numbers to be less than 2^-50 (giving us 2 bits margin of error in detecting whether there is an error). What's 2^-50 in decimal [94] ?
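To see these numbers on your own machine, something like the following can be run. It's a small sketch using only the standard library; sys.float_info.epsilon is python's name for the gap between 1.0 and the next larger 64-bit real.

```python
import sys

# the gap between 1.0 and the next representable 64-bit real is 2^-52
print(sys.float_info.epsilon == 2.0 ** -52)

# 2^-50 is four times larger; repr() shows it in decimal
print(repr(2.0 ** -50))

# "0.1" can't be stored exactly; asking for 17 decimal places shows the error
print("%.17f" % 0.1)
```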

Note
You all know what 2^32 is [95] . Here are some more useful numbers
2^10  = (approx) 10^3   = 1k
2^20  = (approx) 10^6   = 1M
2^30  = (approx) 10^9   = 1G
2^40  = (approx) 10^12  = 1T
2^50  = (approx) 10^15  = 1P

2^-10 = (approx) 10^-3  = 1m
2^-20 = (approx) 10^-6  = 1u
2^-30 = (approx) 10^-9  = 1n
2^-40 = (approx) 10^-12 = 1p
2^-50 = (approx) 10^-15 = 1f
Hard disk manufacturers have been successfully sued for labelling disks as having (for example) 1TByte of storage, which people expect to mean 1099511627776 (1.1*10^12) bytes, when they have only 10^12 bytes (10% less). Attempts to disambiguate 1024 and 1000 as a kilobyte have led to an alternate nomenclature, which unfortunately is clumsy (Mebibyte http://en.wikipedia.org/wiki/Mebibyte).
Note
For the real way to compare numbers see Lahey Floating Point Arithmetic (see the section on "Safe Comparisons") (http://www.lahey.com/float.htm).

The test requires the numbers to be the same at an accuracy of 10^-15, which in turn requires the output from your code to display 16 significant figures after the decimal point. If your output has only 12 significant figures (the default for the python print statement, which uses str(), which in turn prints 12 significant digits; see Floating Point Arithmetic, http://docs.python.org/tut/node16.html), the test will erroneously return fail. We will explore this more in Real Numbers. For the moment note

>>> j=0.12345678901234567890	#j has 20 digits after the decimal point
>>> print j
0.123456789012			#standard python printing has 12 digits
>>> j
0.12345678901234568		#64 bit real has about 16 significant digits
>>> print "%5.20f" %j		
0.12345678901234567737		#64-bit real printing 20 digits and showing garbage after 16 digits
>>> 

Check that your tests in volume.py give output with 16 figures after the decimal point before proceeding.
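One way to be sure of getting enough digits, whichever version of python you have, is to ask for them explicitly. A minimal sketch (repr() and the % format operator are standard python; the variable name is just for illustration):

```python
j = 0.12345678901234567890

# repr() gives enough digits to reproduce the stored 64-bit value
print(repr(j))

# an explicit format gives exactly the number of decimal places asked for
print("%.16f" % j)
```

This is why the test output earlier in this section builds its strings with repr() rather than relying on the default print format.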

Start a file test_compare_results.py with a function compare_results() which has these specifications

  • takes two parameters n_expected, n_found
  • compares the difference between these two numbers with the allowed error for a 64-bit real

    Note

    If your test is a < (less than) inequality test, and if the difference between the two numbers is +ve, you'll get a valid test. What will go wrong if the difference is -ve [96] ?

    Assume in one case that the two numbers are (1.0, 1.0000000000000001), and in another case are in the opposite order i.e. (1.0000000000000001, 1.0). How do you handle the case when the difference is -ve?

  • returns a string "pass" or "fail" for the comparison
  • test your function from main() with these calls
    print compare_results(1.0, 1.0000000000000001)
    print compare_results(1.0000000000000001, 1.0)
    print compare_results(1.0, 1.1)
    print compare_results(1.1, 1.0)
    

Here's my code [97] and here's my output [98] . Did one of the last pair pass? If so look at the note in the 2nd bullet above.

Why did I test the pairs of numbers, twice, the 2nd time in the reverse order [99] ? Why did the first pair pass, while the second pair failed [100] ?

The parameters passed to compare_results() are reals. How would you change the function to handle parameters that were string representations of numbers [101] ? (You don't need to handle this here. The numbers you're comparing are generated by the computer and will be numbers and not strings.)
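For reference, one possible shape for compare_results() is sketched below. It is not the code from the footnotes: the constant name ALLOWED_ERROR is made up for this example, and the tolerance of 2^-50 is the value suggested above. The call to abs() is what takes care of a -ve difference.

```python
ALLOWED_ERROR = 2.0 ** -50      # hypothetical name; about 10^-15

def compare_results(n_expected, n_found):
    """return 'pass' if the two reals differ by less than ALLOWED_ERROR, else 'fail'"""
    # abs() makes the test independent of the order of the arguments
    if abs(n_expected - n_found) < ALLOWED_ERROR:
        return "pass"
    return "fail"

#main()
print(compare_results(1.0, 1.0000000000000001))
print(compare_results(1.0000000000000001, 1.0))
print(compare_results(1.0, 1.1))
print(compare_results(1.1, 1.0))
```

An absolute tolerance this small only makes sense for results that are around 1 in size; the Lahey article referenced above covers the general case.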

Note
End Lesson 15

Shortly we will put compare_results() into volume.py, but first we can test it interactively (before we risk messing up volume.py by changing the code). Here's the interactive code

# python -i test_compare_results.py
pass
pass
fail
fail
>>> from volume import volume_sphere
initialising volume.py

>>> print "the volume of a sphere of radius  1 is " + repr(volume_sphere(1))
the volume of a sphere of radius  1 is 4.1887902047863905

In the interactive code we

  • ran test_compare_results.py:

    The python interpreter always runs the code in global namespace, which in this case has tests of the function compare_results(). Since test_compare_results.py is being run/executed, the python interpreter treats the global namespace as being main(). There are no tests in the global namespace to differentiate main() from the other code in global namespace, so all the code in global namespace is run.

  • loaded volume_sphere():

    Again, the python interpreter always runs the code in global namespace. What is the result of running this code [102] ? The global namespace in volume.py has code which runs tests on volume_sphere(), but this code is inside the conditional statement

    	if __name__ == '__main__':
    

    and is not run (why not [103] ?)

  • ran one of the tests built into test_volume_sphere()
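The effect of the conditional can be imitated without creating any files. In this sketch the string SOURCE stands in for a tiny volume.py, and exec() runs it twice with __name__ set by hand: once as python sets it when a file is run as a program, and once as it is set when a file is imported. The names SOURCE, run_as and events are made up for the illustration.

```python
# a stand-in for the contents of a small module file
SOURCE = '''
events.append("initialising")        # global namespace code: always runs
if __name__ == '__main__':
    events.append("self tests")      # only when run as a program
'''

def run_as(name):
    """execute SOURCE with __name__ set to name; return what it did"""
    events = []
    exec(SOURCE, {"__name__": name, "events": events})
    return events

print(run_as("__main__"))    # as if run with: python volume.py
print(run_as("volume"))      # as if loaded with: import volume
```

The first call records both the initialisation and the self tests; the second records only the initialisation, which matches what you saw when importing volume_sphere.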

We want to modify this test to compare the output with the expected output. At the end of this last line, add a call to compare_results() to compare the output with the expected output, so as to print a pass/fail [104] .

Notice how the symbol "1" and "4.1887902047863905" are used multiple times and/or they are the arguments to expressions? Constants (numbers that aren't ever going to be changed by the program) should be assigned to variables and the variables used in expressions. The reasons for this are

  • Constants determine the behaviour of the code. They should be at the top, assigned to variables, with comments, so that someone doing maintenance knows what they're about and knows under what circumstances they can and cannot be changed.
  • A person reviewing the code has no idea whether a "1" here is the same number as a "1" somewhere else. Is one of them my_variable or both? Is the "2" my_variable+1, or is it my_variable*2, or is it something else altogether?
  • You can't put comments in an expression.

Assign the constants to variables (e.g. radius, expected_result) and use the new variable, rather than a number, in the expressions. Here's my code [105] .

You now have two calls to the same function (e.g. volume_sphere(radius)) in this print line. You should never calculate anything twice: instead assign the result to a variable and use the value of the variable twice. Assign the value of the volume to volume_found and rerun the test. Here's my code [106] .

Here's the final version, which you developed using python interactively

# python -i test_compare_results.py
pass
pass
fail
fail
>>> from volume import volume_sphere
initialising volume.py

>>> radius=1
>>> expected_result=4.1887902047863905
>>> volume_found=volume_sphere(radius)
>>> print "the volume of a sphere of radius " + repr(radius) + " is " + repr(volume_found) + " " + compare_results(expected_result, volume_found)
the volume of a sphere of radius 1 is 4.1887902047863905 pass
>>> 

volume.py consists of the functions we're really interested in, volume_sphere() and volume_hexagonal_prism(), plus a matching pair of functions, which test the functions with various test inputs. However as yet we don't have any way to test if the output is correct. You now are going to merge the code from test_compare_results.py and the interactive code above, into volume.py, so that you can compare the output of the tests with the known answer.

  • test_compare_results.py consists of a function compare_results() and some code in global namespace to test compare_results. What code from test_compare_results.py will be merged with volume.py and where will it go [107] ?
  • The interactive code above is a mock up of the changes that we're going to make to what part of volume.py [108] ?

Now do the merge

  • copy volume.py to volume_2.py
  • add/copy the function compare_results() from test_compare_results.py to volume_2.py
  • rewrite your tests, using the above template, to show pass/fail.
    Note
    Start by modifying test_volume_sphere(). You have 3 tests in each of the test functions: modify one of the tests and get it working first, then tackle the other 5 tests.

Here's my module volume_2.py [109] and here's the output [110] .
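For comparison, one test from test_volume_sphere() might end up looking something like this after the merge. It's a sketch, not the code in the footnote: volume_sphere() and compare_results() are re-declared here in simple form only so that the fragment runs by itself, and the 2^-50 tolerance is the one chosen earlier.

```python
from math import pi

def volume_sphere(r):                       # simplified stand-in
    return (4.0 / 3.0) * pi * r ** 3

def compare_results(n_expected, n_found):   # simplified stand-in
    if abs(n_expected - n_found) < 2.0 ** -50:
        return "pass"
    return "fail"

def test_volume_sphere():
    radius = 1                              # test input
    expected_result = 4.1887902047863905    # known answer, from the earlier output
    volume_found = volume_sphere(radius)    # calculated once, used twice
    result = compare_results(expected_result, volume_found)
    print("the volume of a sphere of radius " + repr(radius) +
          " is " + repr(volume_found) + " " + result)
    return result

test_volume_sphere()
```

Returning the result as well as printing it is a choice made for this sketch; it lets a caller count the passes and fails if that's ever wanted.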

You are now safe to let modules out into the world. Congratulations.

Note
End Lesson 16

14.8. Code clean up

Note

I didn't use this material in class, as I couldn't see any teaching value in it. It's left here to show that sometimes there isn't a way to write good code.

The same line of code

print "the volume of a sphere of radius " + repr(radius) + \
	" is " + repr(volume_found) + " " + compare_results(expected_result,volume_found)

is used in multiple places in the tests, no matter what values are passed to the test. It's not too difficult for a reader to guess that the lines all might be the same (the text lines up from line to line), however it would take a bit of checking to be sure. If you wanted to change the line, you would have to do it multiple times (this is not good from the point of view of maintenance: you might miss one line). It would be better to have only a single copy of the line of code. You could call it in a function, but a function with only one line of code, whose only purpose is to output a string for the user, is a bit pointless.

Instead I rewrote the code to run the print statement inside a loop, using the loop to feed in the parameters. After looking at the resulting code, I couldn't see that functionally it was any different to calling a function. Since the normal idiom is to call a function, anyone trying to maintain the code would have to puzzle as to why it was written in a loop.

I decided that for the number of times the tests would be run, that it really didn't matter if it was left the original way. Sometimes bad code is OK enough, or at least not worth the trouble of fixing.

We have several parameters (radius, expected_result) to feed to the line. However loops feed parameters one at a time e.g.

for item in items:
	#do something to item

Let's group the parameters into a list, which as far as a loop is concerned is a single object. Each iteration of the loop will feed a list to the loop code. Since we're doing multiple tests, we'll need to feed many lists, so let's have a list of lists.

Before doing anything with lists, look at this info about lists and list of lists.

Here parameter_list[] is a list of reals. loop_list[] is a list of parameter_list[]s (i.e. a list of lists). The code for using lists to feed parameters to our tests is

#construct lists from input data
loop_list = []	#initialise list

#first test
parameter_list = [ radius_1, expected_number_1, volume_1 ]
loop_list.append(parameter_list)		#add parameter_list to the tail of loop_list 

#second test
parameter_list = [ radius_2, expected_number_2, volume_2 ]
loop_list.append(parameter_list)		#add parameter_list to the tail of loop_list 

#we're reusing the variable parameter_list here. It's reinitialised in this statement
for parameter_list in loop_list:		#recover parameter_list one at a time
	volume = parameter_list.pop() 	#pop() removes from the tail, so last in is first out
	expected_number = parameter_list.pop() 
	radius = parameter_list.pop()
	print "the volume of a sphere of radius " + repr(radius) + " is " + repr(volume) + " " + compare_results(expected_number,volume)

Modify your code to run your tests in a loop (copy volume_2.py to volume_3.py). Here's my code [111] .

To add (or modify) a test, it's just a matter of adding (or modifying) a paragraph like this

        diameter = 1.0
        height = 2.0
        expected_result=1.299038105676658
        volume=volume_hexagonal_prism(diameter, height)
        parameter_list = [diameter, height, expected_result, volume ]
        loop_list.append(parameter_list)
Note
I wouldn't say that this version of the tests with a for loop is particularly more readable or modifiable than the previous version. However it's an alternate way of handling the scalability problem.
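A variation on the same idea, offered only as a sketch: python can unpack each inner list straight into named variables in the for statement, which avoids pop() (and leaves the lists intact afterwards). The stand-ins for volume_sphere() and compare_results() below are simplified versions, included so that the fragment runs on its own; the results list is an extra added for this example.

```python
from math import pi

def volume_sphere(r):                       # simplified stand-in
    return (4.0 / 3.0) * pi * r ** 3

def compare_results(n_expected, n_found):   # simplified stand-in
    if abs(n_expected - n_found) < 2.0 ** -50:
        return "pass"
    return "fail"

# each inner list is one test: [radius, expected_result]
loop_list = [
    [1, 4.1887902047863905],
    [-1, -4.1887902047863905],
]

results = []
for radius, expected_result in loop_list:   # unpack one inner list per iteration
    volume_found = volume_sphere(radius)
    result = compare_results(expected_result, volume_found)
    results.append(result)
    print("the volume of a sphere of radius " + repr(radius) +
          " is " + repr(volume_found) + " " + result)
```

Adding a test is then one more line in loop_list.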

14.9. Train Wreck code

The code calling compare_results() looked a little like the following. Notice any difference in readability between these two functionally identical pieces of code?

expected_result=1.0
volume=call_to_function(param1,param2...paramn)
compare_results(expected_result, volume)

expected_result=1.0
compare_results(expected_result,call_to_function(param1,param2...paramn))

The second line of the last piece of code is called a train wreck. It's not too big a train wreck as far as train wrecks go, but it's still a train wreck.

It's very easy, once you've set up all your functions, to have lines of code like this, that have calls to functions to any depth you can imagine (the params themselves could be calls to functions). The code is compact and demonstrates recursive parsing of code.

Note
Recursive parsing is the ability to recognise an expression, which will evaluate to a number, as being equivalent to a number. Early languages (e.g. Fortran) could not do recursive parsing. If instead of a number, an expression which evaluated to a number was found, the compiler/interpreter would crash. This was really stone age computing.

Compact code makes a section of code look neater (there are fewer lines of code). This is all very nice, except that code like this is hard to read and no-one (including you in 6 months) will have a clue what it's doing. The next person won't look forward to fixing it if it stops working.