Here are some common problems and how to handle them in Python. If I have a custom-made module that makes a task easier, I will label it as "Using my custom module". This custom module is available only to my officemates.
DELETING ALL FILES AND SUBFOLDERS IN A DIRECTORY named directoryName
import shutil
shutil.rmtree('directoryName') # also removes directoryName itself, not just its contents
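If you want to empty the directory but keep the directory itself, one possible approach is sketched below. The helper name clear_directory is made up for this example and is not part of shutil.

```python
import os
import shutil

def clear_directory(directory_name):
    """Delete everything inside directory_name but keep the directory itself."""
    for entry in os.listdir(directory_name):
        full_path = os.path.join(directory_name, entry)
        if os.path.isdir(full_path) and not os.path.islink(full_path):
            shutil.rmtree(full_path)  # subfolder: remove it recursively
        else:
            os.remove(full_path)      # regular file or symbolic link
```

Calling clear_directory('directoryName') should leave an empty directoryName behind.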
DELETING A SINGLE FILE
import os
if os.path.exists('nameOfFileToDeleteIncludingCompletePath'):
    os.remove('nameOfFileToDeleteIncludingCompletePath')
Check that the file exists before deleting it; otherwise os.remove raises an exception. Alternatively, you could catch the exception, as below, and accomplish the same thing.
import os
try:
    os.remove('nameOfFileToDeleteIncludingCompletePath')
except OSError: # raised when the file does not exist or cannot be removed
    pass
COPYING FILES
Though this could also be done indirectly through a system call (see the last topic of this post), Python has a module called shutil that handles this.
import shutil
shutil.copy('sourcefileName','destination')
'destination' could either be another filename or a directory.
GETTING THE FILES WHOSE NAMES FOLLOW A PATTERN
You can use the same pattern-matching rules that you use on the command line for a directory listing. For example, to list the files whose names start with "abc", followed by anything, and end with a txt extension, you would type "dir abc*.txt". To get the list of these files in Python:
import glob
fileNameList = glob.glob('abc*.txt')
That is, you supply "glob" with the same patterns as you would supply to the "dir" or "ls" command. If the files are not in the current working directory, include the path in the pattern:
import glob
fileNameList = glob.glob('\\path\\to\\file\\abc*.txt')
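The doubled backslashes are needed because a single backslash starts an escape sequence in an ordinary string literal. As a sketch, the pattern can also be assembled with os.path.join so that the directory and the file pattern stay separate. The helper name files_matching is invented for this example.

```python
import glob
import os

def files_matching(directory, pattern):
    """Return the paths in directory whose names match the glob pattern."""
    return glob.glob(os.path.join(directory, pattern))
```

For example, files_matching(r'\path\to\file', 'abc*.txt') uses a raw string (the r prefix) so the backslashes do not have to be doubled.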
GETTING THE LATEST FILE IN A DIRECTORY
You will most probably want to do this in conjunction with the file-matching capability above. First, some background. You can get the time of the most recent modification of a file by doing this:
import os
fileAge = os.stat('filename')[8]
os.stat actually gives you the file attributes as a tuple-like object, and element 8 (counting from 0) is the time of most recent modification, in seconds since the epoch. The larger this number, the more recent the modification.
However, you often have a group of files and want to use the latest one. For example, your files may look like alarm_235.txt, alarm_111.txt, alarm_241.txt, and so on. I would do it this way:
import os
import glob
candidateFiles = glob.glob('\\path\\to\\alarm_*.txt')
fileModificationDate = [os.stat(f)[8] for f in candidateFiles]
latestFile = candidateFiles[fileModificationDate.index(max(fileModificationDate))]
"fileModificationDate" is the list containing the modification times of the files, in the same order as the corresponding files in candidateFiles; that is, the first element in fileModificationDate is the modification time of the first element in candidateFiles. Therefore, if the largest number (the most recent time) in fileModificationDate is the 3rd element, then the corresponding file is the 3rd element in candidateFiles.
Now, max(fileModificationDate) would give you the largest number in fileModificationDate list. fileModificationDate.index(max(fileModificationDate)) will give you the index of this largest number in fileModificationDate. This index, therefore, when used in candidateFiles would give you the filename corresponding to the latest file.
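The same lookup can be written more compactly by letting max compare the files by modification time directly. This is only a sketch: the function name latest_file is made up for illustration, and returning None when nothing matches is an assumption about what you would want in that case.

```python
import glob
import os

def latest_file(pattern):
    """Return the most recently modified file matching pattern, or None."""
    candidate_files = glob.glob(pattern)
    if not candidate_files:
        return None  # nothing matched the pattern
    # os.path.getmtime returns the same value as os.stat(f)[8], without truncation
    return max(candidate_files, key=os.path.getmtime)
```

For example, latest_file('\\path\\to\\alarm_*.txt') would pick the newest of the alarm files.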
For my officemates: I use this quite a lot and so, I decided to create a module for this. It could actually give you a list with the 1st element being the latest, 2nd element the next latest, and so on. Here is how you use it:
import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import getLatestFiles
latestFile = getLatestFiles('filePattern')[0]
filePattern is the same argument that you supply to glob.
If you want the top 3 latest files:
import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import getLatestFiles
latestFiles = getLatestFiles('filePattern', 3)
The latest file would be latestFiles[0], followed by latestFiles[1], followed by latestFiles[2]. Notice that getLatestFiles may have 1 or 2 arguments. If you supply only 1 argument, it will assume that you need the top 2. In other words, these 2 are identical:
getLatestFiles('filePattern')
getLatestFiles('filePattern', 2)
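For readers without access to the module, a rough stand-in can be sketched with glob and a sort on modification time. This is not the code in qFileUtils2; the name latest_files is hypothetical, and the default of 2 simply mirrors the behavior described above.

```python
import glob
import os

def latest_files(pattern, count=2):
    """Return up to count files matching pattern, most recently modified first."""
    matches = glob.glob(pattern)
    # Sort by modification time, newest first
    matches.sort(key=os.path.getmtime, reverse=True)
    return matches[:count]
```

If fewer than count files match, this sketch returns just the ones that exist.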
CHECKING IF A FILE IS UPDATED
Often, you run a script that uses a data file but forget to update the data file first. The script uses the old data file, and all along you think you are done. It would be nice if the script could check the modification time, compare it to the current time, and tell you when the data file has not been updated. For example, you might have a new database release every 2 weeks. I would typically finish my data file in the morning and run my script in the afternoon, an interval that is certainly less than 10 hours. Therefore, if the time between now and when the data file was last modified is longer than 10 hours, the data file is probably old.
This is how you do it:
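import os
import time
fileName = 'dataFile' # placeholder: the data file to check
ageInHours = 10 # let us use 10 hrs for now
t1 = os.path.getmtime(fileName) # time of last modification, in seconds
t2 = time.time() # current time, in seconds
ageInSeconds = 3600 * ageInHours
if t2-t1 > ageInSeconds: # file is older than 10 hrs
    print('data file is not updated!')
else:
    print('data file is updated!')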
For my officemates: I have a module that does this - the function isUpdated.
import sys
sys.path.append('M:\\EMS\\pyhomebrew')
from qFileUtils2 import isUpdated
if not isUpdated('dataFile'):
    print('Your data file is not updated. Exiting now...')
    sys.exit(1)
# continue with your script
isUpdated may take 2 arguments like this:
isUpdated('filename',48)
where 48 hours is used as basis for saying that the file is not updated; 10 hours is the default if you don't supply the 2nd argument.
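Without the module, the same check can be wrapped in a small function. The sketch below is not the code of isUpdated; the name is_recent and its parameter names are assumptions, with the 10-hour default taken from the description above.

```python
import os
import time

def is_recent(file_name, max_age_hours=10):
    """Return True if file_name was modified within the last max_age_hours."""
    age_in_seconds = time.time() - os.path.getmtime(file_name)
    return age_in_seconds <= max_age_hours * 3600
```

A script could then test "if not is_recent('dataFile', 48):" and exit early, in the same way as the module version.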
RUNNING SYSTEM COMMANDS
If you want to run a system command from Python:
import os
os.system('notepad test.txt') # runs notepad and opens test.txt
So, whatever you would type on the command line, you simply place inside the parentheses. The only downside is that the output of the command is not returned to your script; you only get the return status (0 if no errors are encountered, nonzero otherwise). For example, "dir *.*" will not give your script the names of the files in the current working directory. The return value of zero just tells you that the command executed and terminated normally.
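When the script does need the output, the standard subprocess module is one option. A minimal sketch follows; the wrapper name run_and_capture is invented for this example, and subprocess.check_output raises CalledProcessError if the command returns a nonzero status.

```python
import subprocess
import sys

def run_and_capture(command, use_shell=False):
    """Run command and return whatever it wrote to standard output, as text."""
    # universal_newlines=True makes check_output return text instead of bytes
    return subprocess.check_output(command, shell=use_shell,
                                   universal_newlines=True)

# Example: run another Python interpreter and read back what it printed
output = run_and_capture([sys.executable, '-c', 'print("hello")'])
```

Commands that are built into the Windows shell, such as dir, need the shell, for example run_and_capture('dir *.*', use_shell=True).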
How do you pass an environment variable to the system command? Like this:
import os
os.environ['PYTHONPATH'] = r'D:\path\to\other\modules'
os.system('python yourScript.py') # this invokes another python script; PYTHONPATH is passed
Take note that changing the environment variable in this way is not permanent. The change applies only to your running script and the commands it launches; it is gone after your script terminates.
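With the subprocess module, a variable can also be handed to the launched command alone, leaving os.environ of the calling script untouched. This is a sketch under that assumption; the helper name run_with_env is hypothetical.

```python
import os
import subprocess

def run_with_env(command, extra_env):
    """Run command with extra_env added to a copy of the current environment.

    Returns the exit status of the command.
    """
    child_env = os.environ.copy()   # start from the current environment
    child_env.update(extra_env)     # add or override variables for the child only
    return subprocess.call(command, env=child_env)
```

For example, run_with_env(['python', 'yourScript.py'], {'PYTHONPATH': r'D:\path\to\other\modules'}) would pass PYTHONPATH to the other script without changing it in the calling script.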