yakczh_cu posted on 2018-01-07 10:38

How do I write a lambda that filters one specific value against the values in a list?

This post was last edited by yakczh_cu on 2018-01-07 11:13

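filter uses one specific condition to pick matching elements out of a list. What about the reverse: using a list to test one specific value?
For example:
I scraped a table with many rows, some of which hold data I don't want, and I want to drop them by keyword.
Sample code: get the content nodes that do not contain any of ['Regular security ','Off-cycle security','Off-cycle stability']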
from pyquery import PyQuery as pq

html='''
<table>
<tr><td>
<i>Official version 0.3 release.</i><sup id="cite_ref-5" class="reference"><a href="#cite_note-5"></a></sup>
</td></tr>
<tr><td><i>Off-cycle security and stability update.</i>
</td></tr>
<tr><td><i>Regular security and stability update.</td>
</tr>
<tr><td><i>Off-cycle stability update.</i></td>
</tr>
</table>
'''

      
doc = pq(html)
for tr in doc("tr").items():
    # Inner HTML of the first cell in this row
    innerHTML = tr('td').eq(0).html()
    if -1 < innerHTML.find('Regular security') or -1 < innerHTML.find('Off-cycle security') or -1 < innerHTML.find('Off-cycle stability'):
        continue

    print(innerHTML)

This runs, but with many configured keywords the condition becomes very ugly.
If I change it to this:
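exclude = ['Regular security ', 'Off-cycle security', 'Off-cycle stability']
for tr in doc("tr").items():
    innerHTML = tr('td').eq(0).html()
    for keyWord in exclude:
        if -1 < innerHTML.find(keyWord):
            continue  # only affects the inner loop
    print(innerHTML)
there is one more loop level, and continue no longer skips to the next iteration of the outer loop.
Is there a more concise way to write this?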
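
One possible way to collapse the inner loop is any() with a generator expression, which turns the keyword scan into a single condition. The sketch below is only an illustration under stated assumptions: rows is a hypothetical list of strings standing in for the inner HTML of each first cell (so it runs without pyquery), and is_excluded is a helper name made up for this example.

```python
# Hypothetical stand-in for the inner HTML of the first <td> in each row.
rows = [
    '<i>Official version 0.3 release.</i>',
    '<i>Off-cycle security and stability update.</i>',
    '<i>Regular security and stability update.',
    '<i>Off-cycle stability update.</i>',
]
exclude = ['Regular security ', 'Off-cycle security', 'Off-cycle stability']


def is_excluded(text, keywords):
    """Return True if any keyword occurs as a substring of text."""
    return any(keyword in text for keyword in keywords)


# Loop form: a single condition, so continue applies to the row loop.
kept = []
for inner_html in rows:
    if is_excluded(inner_html, exclude):
        continue
    kept.append(inner_html)

# Lambda form with filter(), closer to what the thread title asks for.
kept_by_filter = list(filter(lambda text: not is_excluded(text, exclude), rows))

print(kept)
```

In the original loop the same test would read `if any(k in innerHTML for k in exclude): continue`, so adding a keyword only means editing the exclude list.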


jason680 posted on 2018-01-08 11:17

Reply to #1 yakczh_cu

Would you like to use the 'set' data type?

>>> exclude={'Regular security ','Off-cycle security','Off-cycle stability'}
>>> 'a' in exclude
False
>>> not 'a' in exclude
True
>>> not 'Regular security ' in exclude
False
>>> 'Regular security ' in exclude
True
>>> type(exclude)
<class 'set'>
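
One caveat worth checking against the session above: the in operator on a set compares whole elements, so it does not by itself detect a keyword inside a longer string such as a table cell. The short sketch below shows the difference; the variable cell is a hypothetical sample taken from the table in post #1.

```python
exclude = {'Regular security ', 'Off-cycle security', 'Off-cycle stability'}
cell = 'Regular security and stability update.'

# Set membership compares whole elements, so the longer string is not "in" the set.
exact_match = cell in exclude

# Substring matching needs an explicit scan over the keywords.
substring_match = any(keyword in cell for keyword in exclude)

print(exact_match, substring_match)
```

A set is therefore a good fit when whole values are compared, while keyword filtering of cell text still needs a per-keyword substring test.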
