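Last edited by yakczh_cu on 2018-01-07 11:13
filter uses a specific value to pick matching elements out of a list. What about the reverse: using a list to filter a specific value?
For example:
I scraped a table with many rows, some of which contain data I don't want, and I'd like to filter them out by keyword.
Sample code that extracts the content nodes containing none of ['Regular security ','Off-cycle security','Off-cycle stability']: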
- from pyquery import PyQuery as pq
- html='''
- <table>
- <tr><td>
- <i>Official version 0.3 release.</i><sup id="cite_ref-5" class="reference"><a href="#cite_note-5">[4]</a></sup>
- </td></tr>
- <tr><td><i>Off-cycle security and stability update.</i>
- </td></tr>
- <tr><td><i>Regular security and stability update.</td>
- </tr>
- <tr><td><i>Off-cycle stability update.</i></td>
- </tr>
- </table>
- '''
-
- doc = pq(html)
- for tr in doc("tr").items():
-     innerHTML = tr('td').eq(0).html()
-     # skip rows whose first cell contains any of the unwanted phrases
-     if -1 < innerHTML.find('Regular security') or -1 < innerHTML.find('Off-cycle security') or -1 < innerHTML.find('Off-cycle stability'):
-         continue
-
-     print(innerHTML)
This runs, but once many keywords are configured the condition becomes long and ugly.
If I change it to this:
- exclude = ['Regular security ', 'Off-cycle security', 'Off-cycle stability']
- for tr in doc("tr").items():
-     innerHTML = tr('td').eq(0).html()
-     for keyWord in exclude:
-         if -1 < innerHTML.find(keyWord):
-             continue  # only moves on to the next keyWord, not the next tr
-     print(innerHTML)
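This adds an extra loop level, and the continue only applies to the inner loop, so it never skips to the next iteration of the outer loop and every row still gets printed.
Is there a more concise way to write this?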
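One possible approach, offered as a minimal sketch rather than a definitive answer, is to collapse the inner loop into a single test with the built-in any(). The helper names contains_any and keep_rows are made up for illustration, and plain strings stand in for the innerHTML values so that the snippet runs without pyquery.

```python
def contains_any(text, keywords):
    """Return True if at least one keyword occurs in text."""
    return any(keyword in text for keyword in keywords)


def keep_rows(rows, exclude):
    """Return the rows that contain none of the excluded keywords."""
    return [row for row in rows if not contains_any(row, exclude)]


exclude = ['Regular security ', 'Off-cycle security', 'Off-cycle stability']

# Assumption: plain strings standing in for the innerHTML of each row's first <td>.
rows = [
    '<i>Official version 0.3 release.</i>',
    '<i>Off-cycle security and stability update.</i>',
    '<i>Regular security and stability update.',
    '<i>Off-cycle stability update.</i>',
]

kept = keep_rows(rows, exclude)
for row in kept:
    print(row)
```

In the original loop the same idea would read `if any(keyWord in innerHTML for keyWord in exclude): continue`, which keeps the continue at the level of the outer loop no matter how many keywords are configured.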