Quick Fix with Least Effort: Manipulating Excel Sheets using Python

There are 2 ways to handle excel sheets using python:

1. Using the xlrd and xlwt
2. Using windows COM object

The first method is very fast and would not need excel application to be installed at the target machine. However, the drawback is that you could only either read an existing spreadsheet or write to NEW spreadsheet. You can't modify an existing spreadsheet. If you really want to modify, you open it for reading, then write your modifications to a temporary file, then overwrite the original file with the temporary file. I don't intend to cover this but here is a tutorial on how to use this module for those who are interested.

The second method is slower and you need to have excel application installed. But then, you have so much more flexibility and modification of existing spreadsheet is possible. I will discuss only this method.

Below is a code snippet on how you do it.

# import the module that you need for handling COM objects

from win32com.client import Dispatch

# Open a workbook for a given excel file

wb = Dispatch('Excel.Application').Workbooks.Open(filename)

# Open a worksheet from that workbook

ws = self.wb.Sheets(sheetName)

# read a cell value

cellValue = ws.Cells(row,col).Text

# write to a cell

ws.Cells(row,col).Value = newValue

The succeeding discussions are for my officemates only. However, you are welcome to read it to get some idea and perhaps roll out your own convenience class.

As you could see, syntax is identical to that in VB. Actually all spreadsheet manipulation that you could do in VB, you could also do in python and the syntax is exactly the same! To me, it looks ugly. So, I decided to create my own convenience class that hides the ugliness and it works like this:

# import the convenience class. It should be in PYTHONPATH
import sys
sys.path.append(r'M:\EMS\pyhomebrew')
from dxls import dxls

# open a workbook
wb = dxls(filename)

# open a worksheet from that workbook
ws = wb.ws(sheetName)

# read a cell value
cellValue = ws.read(row,col)

# write to a cell
ws.write(row,col,newValue)

# close the workbook and save changes
wb.close()

I wrote this class even before I learned enough to know that there is a convention to capitalize the first letter of the class name. Now, I have used this a lot in my other scripts and correcting it would break them.

I also added some convenience feature like instead of:

cellValue = ws.read(row,10)

you could just say:

cellValue = ws. read(row,'j') # cell column is 'J' which is 10th col

Or if you have a table and table heading for this 10th column is 'Address'

cellValue = ws. read(row,'Address')

If you have row heading, you could also use that as well. All of these are automatically done for you with no extra coding on your part. The only requirement is that the table on your worksheet should be well behaved which means:

1. Table has only 2 lines of title.
2. After table column headings, values immediately follow - i.e., no blank line
3. The next 4 lines after the table column headings have no blanks

It also detects the table boundaries: ws.minRow, ws.maxRow, ws.minCol and ws.maxCol

To iterate through the entire table,

for row in xrange(ws.minRow,ws.maxRow):
    for col in xrange(ws.minCol, ws.maxCol):
# do something like multiply the cell contents by 2
        # and then write the result on the same row but 100
        # columns away from current cell position
        ws.write(row,col+100,ws.read(row,col)*2)

As you could see, there is no need to continually check for an empty cell which marks the end of the table - the convenience class has already done it for you.

I mentioned 2 new concepts here: module and PYTHONPATH. That will be the next topic.

Quick Fix with Least Effort

Sunday, January 2, 2011

Manipulating Excel Sheets using Python

No comments:

Post a Comment