Posted on 2009-09-10 07:56

Python Development
From my personal notes (migrated from a wiki)
Contents

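1 Directory handling
1.1 Getting the current directory
1.2 Walking the current directory
1.3 Python directory and file handling (cheat sheet)
2 File handling
2.1 Processing a file line by line
2.2 Saving Python objects to a file
3 String handling
3.1 Converting between list and string
3.2 Looking up regular expression groups
3.3 ConfigParser module tips
3.4 Alternation in regular expressions
3.5 Escaping strings in MySQL SQL statements
4 Hard-won lessons
4.1 Is the exception-handling logic itself safe?
4.2 Large-scale data processing capability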

Directory handling
Getting the current directory
os.getcwd() returns the current working directory.
Return a string representing the current working directory. Availability: Unix, Windows.
Walking the current directory
os.walk(top[, topdown=True[, onerror=None[, followlinks=False]]])
   Generate the file names in a directory tree by walking the tree either top-down or bottom-up. For
each directory in the tree rooted at directory top (including top itself), it yields a 3-tuple
(dirpath, dirnames, filenames).
   dirpath is a string, the path to the directory. dirnames is a list of the names of the
subdirectories in dirpath (excluding '.' and '..'). filenames is a list of the names of the
non-directory files in dirpath. Note that the names in the lists contain no path components. To get a
full path (which begins with top) to a file or directory in dirpath, do os.path.join(dirpath, name).
   If optional argument topdown is True or not specified, the triple for a directory is generated
before the triples for any of its subdirectories (directories are generated top-down). If topdown is
False, the triple for a directory is generated after the triples for all of its subdirectories
(directories are generated bottom-up).
   When topdown is True, the caller can modify the dirnames list in-place (perhaps using del or slice
assignment), and walk() will only recurse into the subdirectories whose names remain in dirnames;
this can be used to prune the search, impose a specific order of visiting, or even to inform walk()
about directories the caller creates or renames before it resumes walk() again. Modifying dirnames
when topdown is False is ineffective, because in bottom-up mode the directories in dirnames are
generated before dirpath itself is generated.
   By default errors from the listdir() call are ignored. If optional argument onerror is specified,
it should be a function; it will be called with one argument, an OSError instance. It can report the
error to continue with the walk, or raise the exception to abort the walk. Note that the filename is
available as the filename attribute of the exception object.
   By default, walk() will not walk down into symbolic links that resolve to directories. Set
followlinks to True to visit directories pointed to by symlinks, on systems that support them.
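The pruning behaviour described above can be sketched as follows. This is a minimal illustration written for Python 3, not part of the original notes: the helper name list_files, its skip_dirs parameter, and the throwaway directory tree are all assumptions made for the example.

```python
import os
import tempfile


def list_files(top, skip_dirs=()):
    """Return sorted paths, relative to top, of every file under top."""
    found = []
    for dirpath, dirnames, filenames in os.walk(top):
        # Prune in place (slice assignment) so walk() never descends
        # into the directories named in skip_dirs.
        dirnames[:] = [d for d in dirnames if d not in skip_dirs]
        for name in filenames:
            full = os.path.join(dirpath, name)
            found.append(os.path.relpath(full, top))
    return sorted(found)


# Build a small throwaway tree (hypothetical names) to walk.
root = tempfile.mkdtemp()
os.makedirs(os.path.join(root, "src"))
os.makedirs(os.path.join(root, ".git"))
for rel in ("a.txt", os.path.join("src", "b.py"), os.path.join(".git", "config")):
    with open(os.path.join(root, rel), "w") as fh:
        fh.write("x")

all_files = list_files(root)
kept = list_files(root, skip_dirs=(".git",))
print(all_files)
print(kept)
```

Pruning only works because topdown defaults to True; with topdown=False the subdirectories have already been visited by the time dirnames is seen.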
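Python directory and file handling (cheat sheet)
The os and os.path modules
os.listdir(dirname): list the directories and files under dirname
os.getcwd(): get the current working directory
os.curdir: the current directory ('.')
os.chdir(dirname): change the working directory to dirname
os.path.isdir(name): test whether name is a directory; returns False if it is not
os.path.isfile(name): test whether name is a file; also returns False if name does not exist
os.path.exists(name): test whether a file or directory called name exists
os.path.getsize(name): get the file size; for a directory it returns 0L in the Windows session below
os.path.abspath(name): get the absolute path
os.path.normpath(path): normalize the path string
os.path.split(name): split into directory and file name (if you pass only directories, it still splits off the last directory as the file name; it does not check whether the file or directory exists)
os.path.splitext(name): split the file name from its extension
os.path.join(path, name): join a directory with a file or directory name
os.path.basename(path): return the file name
os.path.dirname(path): return the directory part of the path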
>>> import os
>>> os.getcwd()
'C:\\Python25'
>>> os.chdir(r'C:\temp')
>>> os.getcwd()
'C:\\temp'
>>> os.listdir('.')
['temp.txt', 'test.py', 'testdir', 'tt']
>>> os.listdir(os.curdir)
['temp.txt', 'test.py', 'testdir', 'tt']
>>> os.path.getsize('test.py')
38L
>>> os.path.isdir('tt')
True
>>> os.path.getsize('tt')
0L
>>> os.path.abspath('tt')
'c:\\temp\\tt'
>>> os.path.abspath('test.py')
'c:\\temp\\test.py'
>>> os.path.abspath('.')
'c:\\temp'
>>>
>>> os.path.split(r'.\tt')
('.', 'tt')
>>> os.path.split(r'c:\temp\test.py')
('c:\\temp', 'test.py')
>>> os.path.split(r'c:\temp\test.dpy')
('c:\\temp', 'test.dpy')
>>> os.path.splitext(r'c:\temp\test.py')
('c:\\temp\\test', '.py')
>>> os.path.splitext(r'c:\temp\tst.py')
('c:\\temp\\tst', '.py')
>>>
>>> os.path.basename(r'c:\temp\tst.py')
'tst.py'
>>> os.path.dirname(r'c:\temp\tst.py')
'c:\\temp'
>>>
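File handling
Processing a file line by line
import os
fp = open(os.path.join(os.getcwd(), 'test.cf'))
for line in fp:
    # each line still carries its trailing newline
    print line
fp.close()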
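The same loop can be written with a with block, which closes the file even if the loop body raises. The sketch below targets Python 3; the file contents and the temporary location are assumptions made so that the example is self-contained.

```python
import os
import tempfile

# Create a small sample file (hypothetical contents) to read back.
path = os.path.join(tempfile.mkdtemp(), "test.cf")
with open(path, "w") as fp:
    fp.write("first\nsecond\n")

lines = []
with open(path) as fp:
    for line in fp:
        # Strip the trailing newline before using the line.
        lines.append(line.rstrip("\n"))

print(lines)
```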
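Saving Python objects to a file
The excellent cPickle module handles most cases.
>>> import cPickle
>>> fp = open(fileName, 'wb')   # binary mode: protocol 1 is a binary format
>>> cPickle.dump(pythonObject, fp, 1)
>>> fp.close()
>>> f = open(fileName, 'rb')
>>> obj = cPickle.load(f)
Run that last statement repeatedly to retrieve, one by one, each object that was saved to the file.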
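Reading several objects back one at a time can be sketched like this. It is an illustration for Python 3, where cPickle has been folded into the pickle module; the sample records and the temporary file name are assumptions.

```python
import os
import pickle
import tempfile

# Sample objects (hypothetical) to store one after another.
records = [{"id": 1}, ["a", "b"], "plain string"]
path = os.path.join(tempfile.mkdtemp(), "objects.pkl")

with open(path, "wb") as fp:
    for rec in records:
        pickle.dump(rec, fp, pickle.HIGHEST_PROTOCOL)

loaded = []
with open(path, "rb") as fp:
    while True:
        try:
            # Each call returns the next pickled object in the file.
            loaded.append(pickle.load(fp))
        except EOFError:
            # Raised once no pickled data is left.
            break

print(loaded)
```

Catching EOFError is one simple way to detect the end of the stream; writing a count first, or pickling a single list, would work as well.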
String handling
Converting between list and string
Going from a string to a list is straightforward:
>>> a = "klfddsf"
>>> b = list(a)
>>> print b
['k', 'l', 'f', 'd', 'd', 's', 'f']
Going from a list back to a string takes one more step. Here it uses the string module (the string method ''.join(c) gives the same result):
>>> import string
>>> c = ['a', 'd', 'g', 'f', 'i', 'k', 'j', 'l', 'o', 's', 'r', 'u']
>>> d = string.join(c, '')
>>> print d
adgfikjlosru
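For reference, the round trip can also be done without the string module, using the join method of the separator string. The sketch below assumes Python 3, where string.join no longer exists; the variable names are illustrative.

```python
text = "klfddsf"

chars = list(text)         # string -> list of single characters
rejoined = "".join(chars)  # list -> string, empty separator
dashed = "-".join(chars)   # the separator goes between the items

print(chars)
print(rejoined)
print(dashed)
```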
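Looking up regular expression groups

References: Python regular expressions; the Python regular expression manual.

Elaborate REs may use many groups, both to capture substrings of interest and to group and structure the RE itself. In complex REs it becomes difficult to keep track of the group numbers. Two features help with this problem.
Perl's approach:
Perl developers chose (?...) as the extension syntax. A "?" immediately after a parenthesis used to be a syntax error, because the "?" would have nothing to repeat, so this introduced no compatibility problems. The characters immediately after the "?" indicate which extension is being used, so (?=foo), for example, is one extension (a positive lookahead assertion).
Python's approach:
Python adds an extension syntax on top of Perl's. If the first character after the question mark is "P", you know it is a Python-specific extension. There are currently two such extensions: (?P<name>...) defines a named group, and (?P=name) is a backreference to a named group. If a future version of Perl 5 adds the same features with a different syntax, the re module will be changed to support the new syntax while keeping the Python-specific syntax for compatibility.
At the time of writing, Java's regular expressions do not support looking up groups this way.
For example:
>>> import re
>>> priceRe = re.compile(r"Sale\sprice.*(?P<price>\$\d+\.\d+)")
>>> test = 'Sale price: $39.99'
>>> m = priceRe.search(test)
>>> m.group('price')
'$39.99'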
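The second extension, the (?P=name) backreference, can be sketched with a pattern that finds a doubled word. This is an illustrative example for Python 3; the pattern name DOUBLED and the sample sentence are assumptions, not from the original notes.

```python
import re

# (?P<word>...) names the group; (?P=word) must match the same text again.
DOUBLED = re.compile(r"\b(?P<word>\w+)\s+(?P=word)\b")

m = DOUBLED.search("Paris in the the spring")
doubled_text = m.group()   # the whole match
word = m.group("word")     # look the group up by name
groups = m.groupdict()     # all named groups as a dict

print(doubled_text)
print(word)
print(groups)
```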
ConfigParser module tips
The module's manual says:
RawConfigParser.get(section, option)
Get an option value for the named section.
RawConfigParser.getint(section, option)
A convenience method which coerces the option in the specified section to an integer.
So for an option whose value is a string, do not wrap the string in "" quotes. For example:
[pattern]
image = "imagefile"
After ConfigParser reads this, the string we get is "\"imagefile\"", not the "imagefile" I wanted.
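The behaviour is easy to reproduce. The sketch below assumes Python 3, where the module is named configparser and a parser can read from a string; the options quoted, plain and count are made up for the illustration.

```python
import configparser

# Hypothetical configuration text: one quoted value, one bare value, one integer.
SAMPLE = """
[pattern]
quoted = "imagefile"
plain = imagefile
count = 3
"""

parser = configparser.RawConfigParser()
parser.read_string(SAMPLE)

quoted = parser.get("pattern", "quoted")   # the quotes are kept as part of the value
plain = parser.get("pattern", "plain")
count = parser.getint("pattern", "count")  # coerced to int

# If the file cannot be changed, the quotes can be stripped after reading.
stripped = quoted.strip('"')

print(repr(quoted), repr(plain), count, repr(stripped))
```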
Alternation in regular expressions
An example: I wanted to match two kinds of URL, such as
http://www.textbooks.com/Cat.php?CSID=Q2C0CDU0QTC0MOUCMDOKOT2DQ&SBC=T7
and
http://www.textbooks.com/ebook-sitemap.php?CSID=Q2C0CDU0QTC0MOUCMDOKOT2DQ&FVVALUE=AE&EBC=Architecture,+Engineering+%26+Transportation%2F%2FEngineering
My first attempt was
>>> import re
>>> aRe = re.compile(r"http://www.textbooks.com/.*?ebook\-sitemap\.php|Cat\.php")
>>> a = "http://www.textbooks.com/Cat.php?CSID=Q2C0CDU0QTC0MOUCMDOKOT2DQ&SBC=T7"
>>> m = aRe.search(a)
>>> m.group()
'Cat.php'
This shows how | works: it has the lowest precedence, so it splits the whole pattern into two alternatives, and here only the second one ("Cat\.php") matched. To apply "or" to just part of the pattern, wrap that part in parentheses:
>>> bRe = re.compile(r"http://www.textbooks.com/.*?(ebook\-sitemap\.php|Cat\.php)")
>>> m2 = bRe.search(a)
>>> m2.group()
'http://www.textbooks.com/Cat.php'
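When the parentheses are needed only for grouping, a non-capturing group (?:...) keeps the alternation local without creating a numbered group. The following is a sketch for Python 3 under a few assumptions: the dots in the host name are escaped, the second URL is shortened, and other_url is a made-up address used to show a non-match.

```python
import re

# Non-capturing group: the alternation applies only to the script name.
URL_RE = re.compile(r"http://www\.textbooks\.com/(?:ebook-sitemap|Cat)\.php")

cat_url = "http://www.textbooks.com/Cat.php?CSID=Q2C0CDU0QTC0MOUCMDOKOT2DQ&SBC=T7"
ebook_url = "http://www.textbooks.com/ebook-sitemap.php?CSID=Q2C0CDU0QTC0MOUCMDOKOT2DQ&FVVALUE=AE"
other_url = "http://www.textbooks.com/About.php"  # hypothetical, should not match

cat_match = URL_RE.match(cat_url).group()
ebook_match = URL_RE.match(ebook_url).group()
other_match = URL_RE.match(other_url)

print(cat_match)
print(ebook_match)
print(other_match)
```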
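Escaping strings in MySQL SQL statements
I struggled with escaping strings for MySQL for a long time and always relied on a simple function I had written myself.
Only yesterday did I discover that MySQLdb ships its own escape_string function, which works very well.
>>> import MySQLdb
>>> s = """flskdjaflksdj''':KFjdls"""
>>> print MySQLdb.escape_string(s)
flskdjaflksdj\'\'\':KFjdls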
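A related option is to let the database driver do the quoting by passing values as query parameters instead of building the SQL string by hand. The sketch below uses the standard-library sqlite3 module purely as a stand-in so that it runs without a MySQL server; that substitution, the table name notes and the column name body are assumptions. MySQLdb follows the same DB-API pattern, but with %s as the placeholder instead of ?.

```python
import sqlite3

# In-memory database used only as a stand-in for a MySQL connection.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE notes (body TEXT)")

tricky = """flskdjaflksdj''':KFjdls"""

# The value is passed separately from the SQL text; the driver handles quoting.
conn.execute("INSERT INTO notes (body) VALUES (?)", (tricky,))
stored = conn.execute("SELECT body FROM notes").fetchone()[0]
conn.close()

print(stored)
```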
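Hard-won lessons
Is the exception-handling logic itself safe?
Exception handling is a common way to contain the impact of errors. The catch is that the handling code can itself contain a bug and raise an exception. Developers tend to overlook this because their attention is usually on the normal code path, so take particular care here. An example: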
for attr in attrList:
    try:
        productAttrDict[attr] = m.group(attr)
        print attr + ':' + m.group(attr)
    except:
        productAttrDict[attr] = None
        print attr + ':' + m.group(attr)
        log.error(attr + ': index error !')
        error.traceback()
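Is there a problem with the code above? The except clause exists precisely to catch the case where m.group(attr) raises, yet the handler calls m.group(attr) again in order to record the context. That raises a second exception, and since nothing is left to catch it, the program crashes.
Lessons:
1. Keep exception-handling code as simple as possible. Do not mix in complex modules or logic, so that the handler cannot raise again.
2. Name the exception the handler is meant for as precisely as possible, for example except AssertionError. Where possible, add a finally clause so that cleanup runs in every case.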
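Applying both lessons to the loop above might look like the following sketch. It is written for Python 3 and is not the author's code: the function name collect_attrs, the logger name and the sample pattern are assumptions. The handler catches only IndexError, which is what the re module raises for an unknown group name, and it never calls m.group again.

```python
import logging
import re

log = logging.getLogger("attrs")  # hypothetical logger name


def collect_attrs(match, attr_list):
    """Copy the named groups in attr_list out of match; missing ones become None."""
    result = {}
    for attr in attr_list:
        try:
            result[attr] = match.group(attr)
        except IndexError:
            # Keep the handler trivial: no call that could raise again.
            result[attr] = None
            log.error("%s: no such group", attr)
    return result


m = re.search(r"(?P<price>\$\d+\.\d+)", "Sale price: $39.99")
attrs = collect_attrs(m, ["price", "title"])
print(attrs)
```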

This article comes from a ChinaUnix blog. The original is at: http://blog.chinaunix.net/u2/88420/showart_2049433.html
