Thursday, January 19, 2012

Nested Dictionaries

There are times when you do several indirections in a dictionary. That is, the key is itself looked up in another dictionary, whose key is looked up in yet another dictionary, like this:

>>> a, b, c = {}, {}, {}
>>> c['x'] = 'y'
>>> b['y'] = 'z'
>>> a['z'] = [1, 2, 3]
>>> a[b[c['x']]]
[1, 2, 3]

However, you don't want it throwing an exception when the innermost key does not exist. In the example above, the innermost key is 'x' in c['x'], and instead of an exception you want an empty list back. This is what you do using the 'get' method of dict:

>>> r = a.get(b.get(c.get('x','nothing here'),'nothing here'),[])
>>> r
[1, 2, 3]
>>> r = a.get(b.get(c.get('xx','nothing here'),'nothing here'),[])
>>> r
[]

For this to work, 'nothing here' must not match any key. Starting at the innermost dict, the key 'xx' does not exist in c, so the lookup returns 'nothing here'. That key does not exist in b either, so 'nothing here' comes back again, and finally, since 'nothing here' is not a key in a, the default empty list is returned.

A similar scenario might happen for a dict of a dict of a dict like this:
>>> u = {}
>>> u['xxx'] = {}
>>> u['xxx']['yyy'] = {}
>>> u['xxx']['yyy']['zzz'] = [2, 3, 4]

Instead of throwing an exception when any of the keys does not exist in this dict, you just want it to return an empty list. Here is the way to do it.

>>> u.get('xxx',{}).get('yyy',{}).get('zzz',[])
[2, 3, 4]
>>> u.get('xxx',{}).get('yyy',{}).get('ooo',[])
[]
>>> u.get('xxx',{}).get('ooo',{}).get('zzz',[])
[]
>>> u.get('ooo',{}).get('yyy',{}).get('zzz',[])
[]
>>> u.get('ooo',{}).get('ooo',{}).get('zzz',[])
[]
>>> u.get('ooo',{}).get('ooo',{}).get('ooo',[])
[]
Ultimately, the default value at the last 'get' is the one returned when any of the 3 keys does not exist in the dict. The earlier 'get' calls must default to an empty dict, because the next 'get' will be applied against whatever they return. Try using an empty list as the default on the first 'get' to see what I mean.
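If you find yourself chaining 'get' calls a lot, a small helper can hide the defaults. Below is a minimal sketch; the function name deep_get is my own, not a standard one:

```python
def deep_get(d, keys, default=None):
    # walk down the nested dicts, returning default at the first missing key
    for k in keys:
        if not isinstance(d, dict) or k not in d:
            return default
        d = d[k]
    return d

u = {'xxx': {'yyy': {'zzz': [2, 3, 4]}}}
result = deep_get(u, ['xxx', 'yyy', 'zzz'], [])   # [2, 3, 4]
missing = deep_get(u, ['xxx', 'ooo', 'zzz'], [])  # []
```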

Thursday, May 19, 2011

Creating Combinations

You should have Python 2.6 or later installed for this to work.

>>> import itertools
>>> lx = ['none','P1','P2','P3']
>>> lx
['none', 'P1', 'P2', 'P3']
>>> l2 = itertools.combinations(lx, 2) # combinations of 2 objects
>>> p1 = []
>>> for x in l2:
...     if 'P1' in x:
...             p1.append(list(x))
...
>>> p1
[['none', 'P1'], ['P1', 'P2'], ['P1', 'P3']]

The above example built a list of the 2-combinations that have 'P1' in them.
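The same filtering can be written more compactly with a list comprehension; a sketch:

```python
import itertools

lx = ['none', 'P1', 'P2', 'P3']
# keep only the 2-combinations that contain 'P1'
p1 = [list(x) for x in itertools.combinations(lx, 2) if 'P1' in x]
# p1 is [['none', 'P1'], ['P1', 'P2'], ['P1', 'P3']]
```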

Sunday, February 13, 2011

Regular Expression

A regular expression matches patterns in text and, optionally, extracts data from the matches. It is like the 'grep' or 'findstr' command, only better.

For example, a station name might be an 'A', followed by numbers, followed by 'S'. Below are examples of station name.

A50S
A234S
A3S
A5425S


You might want to know whether a line of text contains a station name. Optionally, you might want to pull, say, '50' out of 'A50S'. Regular expressions (regex) are your tool. Below are the most common regexes I use. For a more complete discussion, read this.

Use the symbol below    To match
\d                      any decimal digit
\D                      any non-digit
\s                      any white space character like space and tab
\S                      any non-white space character
[a-zA-Z]                any alphabet letter
[abc]                   either 'a' or 'b' or 'c'
.                       any character except newline

The patterns above each match only one character. For example, \d would only match 1 digit. If you want to match 2 digits, you need to write \d\d. If there could be 2 or 3 or 4 or more digits, and you don't know in advance how many, this approach will not work. The way to handle that is to use the following 'special characters':

Use this special character    To match, with respect to the previous symbol (see above)
*                             zero or more instances of the same thing
+                             one or more instances of the same thing

So, if you say '\d*', it would match even if there is no digit. '\d+' would match if there is at least one digit.

Now, the regex formed above would match anywhere in the line. There are times when you want a match only if it is at the beginning or end. To accomplish this, use the following 'anchors' at the beginning (use ^) or end (use $) of your regex

^regex  - if you want a match only at the very beginning of the line
regex$  - If you want it at the end only

The example below should make it clear.

You want to find a match for a station name, which could be any of:


A50S
A234S
A3S
A5425S

The pattern that you could see for a station name is: 'A', followed by a number, followed by 'S'. The regex is therefore

'A\d+S'

You would recall that '\d' is a decimal digit, and so one or more of that (the '+' symbol), meaning one or more decimal digits, would catch any digits following the first. Had you used '\d*' instead, it would match even when there is no digit at all, like 'AS', because zero digits are allowed. Having formed the regex, you could use it as follows.

>>> import re
>>> pattern = re.compile('A\d+S')
>>> s = 'A50S'
>>> if pattern.search(s):
...     print 'got a match'
... else:
...     print 'no match'
...
got a match

First you import the regex module (re), then compile the regex into a pattern, then use the 'search' method of the pattern with the string to be searched as argument. This 'search' method returns a match object if it found a match and None otherwise.

>>> s = 'I went inside A698S and shouted'
>>> if pattern.search(s):
...     print 'got a match'
... else:
...     print 'no match'
...
got a match

It still got a match even though the station name is in the middle of the string.

>>> s = 'AB453S'
>>> if pattern.search(s):
...     print 'got a match'
... else:
...     print 'no match'
...
no match


It did not match this time because after 'A', you got 'B' which is not a digit.
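Related to this, the '+' in the pattern really does require at least one digit; with '\d*' even 'AS' would match, because zero digits are allowed. A quick sketch of the difference:

```python
import re

plus = re.compile(r'A\d+S')   # one or more digits required
star = re.compile(r'A\d*S')   # zero or more digits allowed

# 'AS' has no digits between the A and the S
assert plus.search('AS') is None
assert star.search('AS') is not None
# both match when digits are present
assert plus.search('A42S') is not None
assert star.search('A42S') is not None
```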


So, if you got a file 'test.txt' which contains the following:


A432S is the first station - a station
Next to that is A53L - not a station
Third is A432222S which is large! - a station
Fourth is A5422CS - not a station


And you want to print the lines which contain station names. The script reg.py below could do the trick.


import re

pattern = re.compile('A\d+S')
fin = open('test.txt', 'r')
for line in fin:
    if pattern.search(line):
        print line
fin.close()


Now, let me run this script...

[postgres@postgres-desktop:~]$ python reg.py                      (02-07 00:09)
A432S is the first station - a station

Third is A432222S which is large! - a station

If you want a match only if station name is at the beginning of the line, your regex would be:

'^A\d+S'

You need to 'compile' this again and assign it to a pattern variable before doing a 'search'. Here is another example of a regex.

To match:

xa aDfd455ddD
xa XdG4442sDt
xa cCC123Xt

The pattern here is 'xa' followed by a space, followed by a group of letters, followed by digits, followed by letters. The regex is

'xa [a-zA-Z]+\d+[a-zA-Z]+'

Again, keep in mind that the '+' sign means 'one or more instance of the same thing' and not concatenation of regular expression.

----------------------------------------------------------------------
EXTRACTING DATA USING REGEX

From our station example at the beginning, suppose, say for a station name of A42S, you want to extract the numerical part in the middle ('42' in this station). How do you do it? Well, all you need to do is enclose the part of the regex that you want to extract in a pair of parentheses and use the 'group' method.

The regex for station name is: 'A\d+S'

You want the digit portion which is '\d+'. So, enclose this by parentheses and assign it to a new pattern:

>>> pattern1 = re.compile('A(\d+)S')

Then, do a search followed by 'group'

>>> s = 'I am going to A4326S today'
>>> m = pattern1.search(s)
>>> m.group(1)
'4326'

Now, what if parentheses are part of the string to be searched? All you need to do is 'escape' them using '\'.

For example, you might be trying to match 'origin(23 45)'. Your regex would be:

'origin\(\d+ \d+\)'

If '23' and '45' are the x and y coordinates, respectively, and you want to extract them, it goes as follows. Enclose the digits that you want to extract in parentheses, do a search, and follow with group:

>>> pattern2 = re.compile('origin\((\d+) (\d+)\)')
>>> s = 'display picture origin(23 45)'
>>> m = pattern2.search(s)
>>> x, y = m.group(1,2)
>>> x
'23'
>>> y
'45'
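If a line can contain several station names and you want all the numbers at once, the 'findall' method returns every captured group in a list; a small sketch:

```python
import re

pattern = re.compile(r'A(\d+)S')
s = 'passed A50S then A234S on the way'
numbers = pattern.findall(s)  # every captured group, in order
# numbers is ['50', '234']
```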

Wednesday, January 19, 2011

Functions

Functions must be defined (or coded earlier in the script) before they can be called as shown below:

[postgres@postgres-desktop:~]$ python myfunction.py               (01-18 00:06)
Traceback (most recent call last):
  File "myfunction.py", line 1, in <module>
    a = square(10)
NameError: name 'square' is not defined




Below is a listing of myfunction.py
 

a = square(10)
print a

def square(x):
    return x*x

You use the keyword "def" to define a function. Just like in C, the function arguments go inside the parentheses and are assigned in the same order that they are passed. You then terminate the "def" statement with a colon and indent the contents of the function. A function may or may not return a value, pretty much like in C.

The reason an exception was thrown when the code was run is that the function was defined after it was called, so the interpreter was not yet aware of its existence when the call was made. Remember, python reads the py file in sequence, starting at the beginning, interpreting each line and then executing it.

To correct this, place the function definition ahead of its call like this:

def square(x):
    return x*x

a = square(10)
print a

When you run it, the result is as expected:

[postgres@postgres-desktop:~]$ python myfunction.py               (01-18 00:18)
100

Python knows the function definition has ended when the indentation ends. One thing with python is that you could mix function definitions with the main body and it still works fine. In C, you can't do this. However, it could easily get confusing, so avoid it. Below is an example:

# my function.py

y = 10

def square(x):
    return x*x

a = square(y)
print a

Below is the result when you run it:

[postgres@postgres-desktop:~]$ python myfunction.py               (01-18 00:23)
100

As you could see, it still runs without error.

You could define an empty function definition like this:

def square(x):
    pass

which is roughly equivalent to this C function

void square(float x)
{
}


DEFAULT VALUES

It is very easy to set default values in function definition. To illustrate:

>>> def whoAmI(name, age, sex='male'):
...     print 'I am ' + name + ', ' + str(age) + ' yrs old, ' + sex
...
>>> whoAmI('John', 33)
I am John, 33 yrs old, male
>>> whoAmI('Ann', 24, 'female')
I am Ann, 24 yrs old, female


In the first function call, only the name and age were passed. Since nothing was supplied for sex (i.e., only 2 arguments were supplied), it used the default value of sex, which is "male".


In the second call, all 3 arguments were supplied and the supplied value for sex was used.


Remember, the order in which the arguments are passed is the order in which they will be assigned to the variables in the receiving function.

Another thing to remember about default values is that they must come at the end and must not be followed by a non-default argument, as shown below:

>>> def whoAmI(name, sex='male', age):
...     print 'I am ' + name + ', ' + str(age) + ' yrs old, ' + sex
...
  File "<stdin>", line 1
SyntaxError: non-default argument follows default argument




NAMED ARGUMENTS

Use of named arguments is illustrated below:

>>> def whoAmI(name, age, sex):
...     print 'I am ' + name + ', ' + str(age) + ' yrs old, ' + sex
...
>>> whoAmI(age=20, sex='female', name='Mary')
I am Mary, 20 yrs old, female

As you could see, using named arguments when you make a function call enables you to pass the arguments in any order you want and still get the desired result. The added advantages are: 1. it makes your code clearer because it reminds you which value goes to which variable when you make the call, and 2. it is easier to handle default values as illustrated below:

>>> def whoAmI(name, sex='male', age=25):
...     print 'I am ' + name + ', ' + str(age) + ' yrs old, ' + sex
...
>>> whoAmI('Kim', age=44)
I am Kim, 44 yrs old, male
>>> whoAmI(sex='female', name='Mary')
I am Mary, 25 yrs old, female

VARIABLE NUMBER OF ARGUMENTS

Remember the printf function in C? You pass any number of arguments and it works fine. Here is how you do it in python:

>>> def f(*v):
...     for u in v:
...         print u
...
>>> f(10, 20, 30)
10
20
30
>>> f(1234)
1234
>>> f('are','you','human',12.5)
are
you
human
12.5

Notice the asterisk (*) before the variable 'v' in the function argument. It tells python to collect all the arguments that were passed into a tuple. You access the function arguments by iterating over that tuple.
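The asterisk also works in the other direction: put it in front of a list or tuple at the call site and the elements are spread out as individual arguments. A sketch (add3 is my own example function):

```python
def add3(a, b, c):
    return a + b + c

nums = [10, 20, 30]
total = add3(*nums)  # same as calling add3(10, 20, 30)
# total is 60
```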

You could actually combine the positional arguments with variable arguments like this:

>>> def g(positional, *variable):
...     print 'positional argument is: ' + str(positional)
...     print 'variable arguments are: ' + str(variable)
...
>>> g(10, 20, 30, 40)
positional argument is: 10
variable arguments are: (20, 30, 40)

Here is another example:

>>> def h(a, b, c, *d):
...     print 'a=' + str(a)
...     print 'b=' + str(b)
...     print 'c=' + str(c)
...     print 'variable argument d=' + str(d)
...
>>> h(11, 22, 33, 44, 55, 66, 77, 88)
a=11
b=22
c=33
variable argument d=(44, 55, 66, 77, 88)


Positional arguments should come first before variable arguments or there will be an exception.

KEYWORD ARGUMENTS

Here is how you handle a variable number of keyword arguments: place a double asterisk before the variable name in the function definition, and the keyword arguments are collected into a dictionary. The keyword names become the (string) keys of that dictionary.

>>> def f(**k):
...     print k
...
>>> f(a='apple', b='banana')
{'a': 'apple', 'b': 'banana'}
>>> f(x=1, y='yarn',z=2)
{'y': 'yarn', 'x': 1, 'z': 2}
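The double asterisk also works at the call site: put it in front of a dictionary and each key/value pair is passed as a keyword argument. A sketch:

```python
def whoAmI(name, age, sex):
    return 'I am %s, %s yrs old, %s' % (name, age, sex)

info = {'name': 'Mary', 'age': 20, 'sex': 'female'}
s = whoAmI(**info)  # same as whoAmI(name='Mary', age=20, sex='female')
# s is 'I am Mary, 20 yrs old, female'
```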



GENERAL FORMAT OF FUNCTION ARGUMENTS



The general format is that you have the positional arguments first, followed by variable arguments, then finally, the variable keyword arguments. Here is an example:



>>> def g(a,b, *args, **kwargs):
...     print 'positional arguments are: a=' + str(a) + ', b=' + str(b)
...     print 'variable arguments args=' + str(args)
...     print 'keyword arguments kwargs=' + str(kwargs)
...
>>> g(10, 20, 30, 40, 50, 60, m='model', c='cat', x=100)
positional arguments are: a=10, b=20
variable arguments args=(30, 40, 50, 60)
keyword arguments kwargs={'x': 100, 'c': 'cat', 'm': 'model'}



FUNCTIONS AS FIRST CLASS OBJECTS IN PYTHON

In python, functions are treated just like strings or integers in the sense that you can pass them around as if they were data. The examples below illustrate the important points:

>>> def add(x,y):
...     print 'inside add function'
...     return x+y
...
>>> def sub(x,y):
...     print 'inside sub function'
...     return x-y
...
>>> def multiply(x,y):
...     print 'inside multiply function'
...     return x*y
...
>>> z = add  # you could assign the function to a variable
>>> z(3,4)   # and then invoke it through that variable
inside add function
7

>>> def doThis(func,x,y): # invokes the function passed to this function
...     print 'inside doThis function, about to call another function'
...     return func(x,y)
...
>>> doThis(sub,7,5)
inside doThis function, about to call another function
inside sub function
2

>>> doThis(multiply,4,5)
inside doThis function, about to call another function
inside multiply function
20



You could also create a dictionary of functions and invoke them.


>>> switch = {'add':add, 'sub':sub, 'mul':multiply}
>>> switch['add'](4,3)
inside add function
7

>>> x = 'mul'
>>> switch[x](3,5)
inside multiply function
15


So, if you miss C's or VB's 'switch' statement, the above example demonstrates how to handle it elegantly in python.
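One thing C's switch has that a plain dictionary lookup lacks is a 'default' case. The dict's 'get' method gives you that too; a sketch (the fallback function name is my own):

```python
def add(x, y):
    return x + y

def fallback(x, y):
    # plays the role of C's 'default' case
    return None

switch = {'add': add}
assert switch.get('add', fallback)(3, 4) == 7
assert switch.get('mul', fallback)(3, 4) is None  # 'mul' is not registered
```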

Monday, January 10, 2011

Using Python to Access OSI Soft's PI Data

If you are not using PI historian (or don't know what it is), move on. This post is not for you.

First of all, for this to work, you will need the PI SDK installed. You probably already have it if you can access PI data using Excel. You will then call it via COM. Code snippet below:

import time
from datetime import datetime, timedelta
from win32com.client import Dispatch
tdMax = timedelta(minutes=15)
pisdk = Dispatch('PISDK.PISDK')
myServer = pisdk.Servers('PI_ServerName')
con =Dispatch('PISDKDlg.Connections')
con.Login(myServer,'username','password',1,0)
piTimeStart = Dispatch('PITimeServer.PITimeFormat')
piTimeEnd = Dispatch('PITimeServer.PITimeFormat')
piTimeStart.InputString = '08-25-2009 00:00:00'
piTimeEnd.InputString = '08-25-2009 23:59:59'
sampleAsynchStatus = Dispatch('PISDKCommon.PIAsynchStatus')
samplePoint = myServer.PIPoints['tagname']
sampleValues = samplePoint.Data.Summaries2(piTimeStart,piTimeEnd,'1h',5,0,sampleAsynchStatus)
t0 = datetime.now()
while True:
    try:
        valsum = sampleValues("Average").Value # retrieve the value
        break  # it will get out of infinite loop when there is no error
    except:  # it failed because server needs more time
        td = datetime.now() - t0
        if td > tdMax:
            print "It is taking so long to get PI data ! Exiting now ...."
            exit(1)
        time.sleep(3)
i = 1
while i < valsum.Count+1:
    print valsum(i).Value, valsum(i).ValueAttributes('Percentgood').Value
    i += 1

61.4011819388 100.0
61.3148685875 100.0
61.4948477524 100.0
61.2000007629 100.0
61.2000007629 100.0
61.2000007629 100.0
63.8442935978 100.0
64.2453963685 100.0
66.205138361 100.0
71.4747024728 100.0
63.8786754349 100.0
64.3719732268 100.0
64.6296242787 100.0
61.8953031561 100.0
57.0552419993 100.0
57.2999992371 100.0
58.2445955483 100.0
57.5679091366 100.0
59.6997264523 100.0
60.9096459478 100.0
59.7756373596 100.0
55.1742569927 100.0
59.7151307987 100.0

For my officemates: that is just too much work, right? Actually, if you write the application in VB, your code will look just like that. For this reason, I wrote a convenience class to hide all those warts and simplify everything as follows:

import sys
sys.path.append(r'M:\EMS\pyhomebrew')

from osipi import osipi
# with time stamp
b = osipi('username','password') # username, then, password, and optionally, PI server
for x, y in b.getAverage('pi-tag here','04-01-2010 00:00','04-01-2010 08:00','1h',True):
    print x, y    # x is time, y is value
# without time stamp

b = osipi('username','password') # username, then, password, and optionally, PI server
for x in b.getAverage('pi-tag here','04-01-2010 00:00','04-01-2010 08:00','1h'):
    print x
# to get the compressed values
for x, y in b.getCompressed('pi-tag here','04-01-2010 00:00','04-01-2010 00:01',True):
    print x, y    # x is time, y is value

Why not use Excel? You use Excel when it is appropriate, but there are instances when it is not. For one, Excel has limitations on the number of rows. If you go back too far in time, Excel could choke. I was told that the latest version of Excel has virtually no limit on the number of rows - not true! I dumped the OAG database via rio and it could not read the whole thing!

Thursday, January 6, 2011

File Handling

Here are common problems and how they are handled in python. If I have a custom-made module that makes it easier, I will label it as "Using my custom module". This custom module is not available except for my office mates.

DELETING ALL FILES AND SUBFOLDERS IN A DIRECTORY named directoryName

import shutil
shutil.rmtree('directoryName')

DELETING A SINGLE FILE

import os
if os.path.exists('nameOfFileToDeleteIncludingCompletePath'):
    os.remove('nameOfFileToDeleteIncludingCompletePath')

You need to check first if the file exists before deleting it, or it may throw an exception. Alternatively, you could do it as below and accomplish the same thing.

import os
try:
    os.remove('nameOfFileToDeleteIncludingCompletePath')
except:
    pass

COPYING FILES

Though this could also be done indirectly through a system call (see last topic of this post), python has a module called shutil that handles this.

import shutil
shutil.copy('sourcefileName','destination')

'destination' could either be another filename or a directory.

GETTING THE FILES WHOSE NAMES FOLLOW A PATTERN

You can use the same pattern matching rules as when you use the command line for a directory listing. As an example, when you want the files which start with "abc", followed by anything plus a txt extension, you would do "dir abc*.txt". To get the list of these files in python:

import glob
fileNameList = glob.glob('abc*.txt')

That is, you supply "glob" with same patterns as you would supply the "dir" or "ls" command. If the files are not in the current working directory, you do

import glob
fileNameList = glob.glob('\\path\\to\\file\\abc*.txt')

GETTING THE LATEST FILE IN A DIRECTORY

You most probably would want to do this in conjunction with the file matching capability above. First of all, some background. You can get the time of most recent modification of a file by doing this:

import os
fileAge = os.stat('filename')[8]

os.stat actually gives you the file attributes in a tuple, and the 8th (relative to 0) element gives you the time of most recent modification in seconds since the epoch. The larger this number, the more recent it is.

However, oftentimes you have a group of files and you want the latest one. For example, your files may look like alarm_235.txt, alarm_111.txt, alarm_241.txt, and so on. I would do it this way:

import os
import glob

candidateFiles = glob.glob('\\path\\to\\alarm_*.txt')
fileModificationDate = [os.stat(f)[8] for f in candidateFiles]
latestFile = candidateFiles[fileModificationDate.index(max(fileModificationDate))]

"fileModificationDate" is the list containing the "modification date" of the files and in same order as the corresponding files in candidateFiles - i.e. - the first element in fileModificationDate the modification date of the first element in candidateFiles. Therefore, if the largest number (a.k.a. most recent) in fileModificationDate is the 3rd element, then, the corresponding file is the 3rd element in candidateFiles.
Now, max(fileModificationDate) would give you the largest number in fileModificationDate list. fileModificationDate.index(max(fileModificationDate)) will give you the index of this largest number in fileModificationDate. This index, therefore, when used in candidateFiles would give you the filename corresponding to the latest file.
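The index bookkeeping above can also be folded into one call to max with a key function; a sketch (latest_file is my own helper name, and os.path.getmtime gives the same number as os.stat(f)[8], just spelled more readably):

```python
import glob
import os

def latest_file(pattern):
    # return the most recently modified file matching pattern, or None
    files = glob.glob(pattern)
    if not files:
        return None
    return max(files, key=os.path.getmtime)
```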

For my officemates: I use this quite a lot and so, I decided to create a module for this. It could actually give you a list with the 1st element being the latest, 2nd element the next latest, and so on. Here is how you use it:

import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import getLatestFiles
latestFile = getLatestFiles('filePattern')[0]

filePattern is the same argument that you supply to glob.

If you want the top 3 latest files:


import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import getLatestFiles
latestFiles = getLatestFiles('filePattern', 3)

The latest file would be latestFiles[0], followed by latestFiles[1], followed by latestFiles[2]. Notice that getLatestFiles may have 1 or 2 arguments. If you supply only 1 argument, it will assume that you need the top 2. In other words, these 2 are identical:

getLatestFiles('filePattern')
getLatestFiles('filePattern', 2)

CHECKING IF A FILE IS UPDATED

Oftentimes you run a script which uses a data file and you forget to update the data file first. So the script used the old data, and all along you thought you were done. It would be nice if the script could check the modification time, compare it to the current time, and tell you when the data file has not been updated. For example, you might have a new database release every 2 weeks. If the time difference between now and when the data file was last modified is longer than 10 hours, it is possible that the data file is indeed old: I would probably have finished with my data file in the morning and be running my script in the afternoon, an interval certainly less than 10 hours.

This is how you do it:

import os
import time

ageInHours = 10 # let us use 10 hrs for now

t1 = os.path.getmtime(fileName)
t2 = time.time()
ageInSeconds = 3600 * ageInHours
if t2-t1 > ageInSeconds:  # file is older than 10 hrs
    print 'data file is not updated!'
else:
    print 'data file is updated!'

For my officemates: I have a module that does this - the function isUpdated.


import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import isUpdated

if not isUpdated('dataFile'):
    print 'Your data file is not updated. Exiting now...'
    exit(1)
# continue with your script


isUpdated may take 2 arguments like this:

isUpdated('filename',48)

where 48 hours is used as basis for saying that the file is not updated; 10 hours is the default if you don't supply the 2nd argument.

RUNNING SYSTEM COMMANDS

If you want to run system command from python:

import os
os.system('notepad test.txt')  # runs notepad and open test.txt

So, whatever you would type in the command line, you simply place inside the parentheses. The only downside is that the output is not echoed back to your script; only the return status comes back (0 if no errors were encountered, nonzero otherwise). For example, "dir *.*" will not return the names of the files in the current working directory. It will just tell you, via a return value of zero, that the command executed and terminated normally.

How do you pass environment variable to the system command? It is like this:

import os
os.environ['PYTHONPATH'] = r'D:\path\to\other\modules'
os.system('python yourScript.py')  # this invokes another python script; PYTHONPATH is passed

Take note that changing the environment variable in this way is not permanent. It is unset again after your script terminates.
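If you also need the command's output back in your script, and not just the return status, the subprocess module can capture it. A sketch (it uses sys.executable so the example runs whatever python you are on):

```python
import subprocess
import sys

# run a command and capture its standard output;
# os.system would only give us the exit status
p = subprocess.Popen([sys.executable, '-c', 'print(1 + 1)'],
                     stdout=subprocess.PIPE)
out, err = p.communicate()
# out holds the command's output: '2' plus a newline
```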

Tuesday, January 4, 2011

Exception Handling

If something happens and python cannot go on, it will throw an exception, for instance when you try to open a non-existent file. You could either do extra processing when the exception happens or ignore it. I won't get into too much detail here and will discuss only what I normally do. Below is how you handle an exception: you place the code inside a try-except block.

try:
    # statements that could potentially throw an exception
except:
    # Now, you have an exception. What do you want to do?

There are instances where you don't want the script to quit when it encounters an exception and just move on. In this case, you just place a 'pass' statement in the except clause like this:

try:
    # statements that could potentially throw an exception
except:
    pass

If there are specific exceptions that you wish to catch, name them in the except statement as shown below. Without a qualifier, the except clause catches everything.

>>> s = 'abc'
>>> b = float(s)
Traceback (most recent call last):
  File "<interactive input>", line 1, in <module>
ValueError: invalid literal for float(): abc

In the example above, we want to convert a string to a floating point number. Since string s contains no number, it throws a ValueError exception. You can handle this exception as shown below. As you could see, the ValueError was caught and the code inside the except clause was executed. Since the exception was caught in your code, the script ran to completion and no 'Traceback' was shown.



>>> try:
...     b = float(s)
... except ValueError:
...     print 's is not a floating point string'
...   
s is not a floating point string


If I have a script that runs via scheduler, I would place statements in the except clause to send me e-mail telling me what went wrong. In that way, I am notified in real-time and need not look on lengthy logs. Below is how I did it.

import smtplib
def emailError(errMsg):
    host='mailServerName'
    s=smtplib.SMTP(host)
    recepients = ['emailAddress1','emailAddress2','emailAddres3']
    msg = "Subject: Application1 Had  some Problems !!!\n\n" + errMsg
    s.sendmail('dummyReplyEmailAddress',recepients,msg)
    exit(1)
try:
    fin = file("D:\\somewhere\\sysgenid","r")
except:
    emailError("Can't open sysgenid file!\nWas it deleted?\n")


Another area where I find exception handling useful is when you have a backup and would want to shift to that when the primary resource is down. For example, say, you have an Oracle server Ora1 which is your primary source of data for your script and then there is backup server Ora2, and a third backup Ora3, this is how I use exception handling to take advantage of this redundancy:

import cx_Oracle
try:
    connection = cx_Oracle.connect('uid/pwd@Ora1')
except:
    try:
        connection = cx_Oracle.connect('uid/pwd@Ora2')
    except:
        try:
            connection = cx_Oracle.connect('uid/pwd@Ora3')
        except:
            print 'All Oracle servers are down!!! Exiting now ...'
            exit(1)
cur = connection.cursor()
cur.execute("Select * from userTable")

Now, if there is at least one Oracle server up, the script may continue without any problems. Better yet, use email to notify yourself when any of the Oracle servers is down, using the previous example.