Chinaunix
Title:
Question about a Python web scraper
Author:
nanshanjin
Time:
2020-02-17 11:07
I know nothing about web scraping in Python. I have an HTML page saved as a file and I am using BeautifulSoup. How can I extract these four values from the line below: /kegg-bin/show_pathway?158190805124419/hsa01100.args, hsa01100, Metabolic pathways - Homo sapiens (human), and 53? Thanks.
<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a> Metabolic pathways - Homo sapiens (human) (<a href="javascript:display('hsa01100')">53</a>)
Author:
nulcearbear
Time:
2020-02-19 16:27
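from bs4 import BeautifulSoup

# The <li> fragment from the question, as a string
a = '<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a> Metabolic pathways - Homo sapiens (human) (<a href="javascript:display(\'hsa01100\')">53</a>)'
soup = BeautifulSoup(a, 'lxml')  # needs the lxml package; 'html.parser' also works here

# soup.li.contents is [<a>, text, <a>, ')']
print(soup.li.contents[0]['href'])  # /kegg-bin/show_pathway?158190805124419/hsa01100.args
print(soup.li.contents[0].text)     # hsa01100
# The text node keeps a leading space and a trailing " (", so trim both
print(soup.li.contents[1].strip().rstrip('(').strip())  # Metabolic pathways - Homo sapiens (human)
print(soup.li.contents[2].text)     # 53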
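The reply above indexes into `soup.li.contents` by position, which only handles a single `<li>`. Since the question mentions a whole HTML file, here is a rough sketch of how the same idea might be applied to every list item. The function name `parse_pathway_items` is made up for illustration, and the sketch assumes every item follows the same layout as the sample line (an ID link, a description, then a count link). It uses the built-in `html.parser` so that lxml is not required.

```python
from bs4 import BeautifulSoup


def parse_pathway_items(html):
    """Return one (href, pathway_id, description, count) tuple per <li>.

    Hypothetical helper: assumes each <li> starts with the ID link,
    followed by the description text and then the count link.
    """
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for li in soup.find_all("li"):
        links = li.find_all("a")
        if len(links) < 2:
            continue  # skip items that do not match the expected layout
        id_link, count_link = links[0], links[1]
        # The description is the bare text node right after the first link.
        description = str(id_link.next_sibling or "").strip()
        # Drop the "(" that opens the parenthesised count.
        description = description.rstrip("(").strip()
        rows.append((
            id_link.get("href"),
            id_link.get_text(strip=True),
            description,
            count_link.get_text(strip=True),  # kept as a string
        ))
    return rows


sample = '<li><a href="/kegg-bin/show_pathway?158190805124419/hsa01100.args" target="_blank">hsa01100</a> Metabolic pathways - Homo sapiens (human) (<a href="javascript:display(\'hsa01100\')">53</a>)'
for row in parse_pathway_items(sample):
    print(row)
```

For a saved page, the file contents could be read with `open(...).read()` and passed to the function in place of `sample`. The count is left as a string; wrap it in `int()` if a number is needed and every item is known to have a numeric count.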
Author:
nanshanjin
Time:
2020-02-20 10:55
Reply to 2# nulcearbear
Thank you very much.
Welcome to Chinaunix (http://bbs.chinaunix.net/)