Chinaunix

Title: Question about a Python web scraper

Author: nanshanjin    Time: 2020-02-17 11:07
I know nothing about web scraping in Python. I have an HTML file and am using BeautifulSoup. How do I extract these four values: /kegg-bin/show_pathway?158190805124419/hsa01100.args, hsa01100, Metabolic pathways - Homo sapiens (human), and 53? Thanks.
<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a> Metabolic pathways - Homo sapiens (human) (<a href="javascript:display('hsa01100')">53</a>)




Author: nulcearbear    Time: 2020-02-19 16:27
from bs4 import BeautifulSoup

# Sample <li> fragment from the question
a = '<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a> Metabolic pathways - Homo sapiens (human) (<a href="javascript:display(\'hsa01100\')">53</a>)'
soup = BeautifulSoup(a, 'lxml')
items = soup.li.contents          # [<a>, text node, <a>, ')']
print(items[0]['href'])           # link target
print(items[0].text)              # pathway ID
print(items[1].strip(' ('))       # description, without the leading space and trailing " ("
print(items[2].text)              # count
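If the real file contains many such <li> entries, the same idea can be applied in a loop. The following is only a minimal sketch, assuming every entry has the same layout as the sample (a first <a> with the link and ID, a text node with the description, then a second <a> with the count). The helper name parse_entries and the choice of the built-in 'html.parser' (instead of 'lxml') are illustrative assumptions, not part of the reply above.

```python
from bs4 import BeautifulSoup, NavigableString


def parse_entries(html):
    """Hypothetical helper: return (href, pathway_id, description, count) per <li>."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.find_all("li"):
        links = li.find_all("a")
        if len(links) < 2:
            continue  # skip items that do not match the assumed layout
        first, second = links[0], links[1]
        # The description is assumed to be the text node right after the first link.
        sibling = first.next_sibling
        description = sibling.strip(" (") if isinstance(sibling, NavigableString) else ""
        rows.append((
            first.get("href"),
            first.get_text(strip=True),
            description,
            second.get_text(strip=True),
        ))
    return rows


# The sample fragment from the question
sample = (
    '<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a>'
    ' Metabolic pathways - Homo sapiens (human) '
    '(<a href="javascript:display(\'hsa01100\')">53</a>)'
)

for row in parse_entries(sample):
    print(row)
```

To run this on a saved page, read the file into a string first and pass that string to parse_entries.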

Author: nanshanjin    Time: 2020-02-20 10:55
Reply to #2 nulcearbear

Thank you very much.





Welcome to Chinaunix (http://bbs.chinaunix.net/) Powered by Discuz! X3.2