Why does the result of re.findall contain extra , " and similar characters?
import requests
import re
url = 'http://www.shubang.net/book/66_2151.html'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'}
web_data = requests.get(url, headers=headers)
web_data.encoding = 'utf-8'
txt = web_data.text
items = re.findall(r'line_en\" \>(.*)<|line_cn\" title=\"(.*)\"', txt)
for item in items:
    # each item is a 2-tuple: (line_en group, line_cn group)
    print(item)
The output looks like this:
.....
('"It doesn't look new. It looks old," one of the boys said.', '')
('', '“房子一点也不新,旧死了,”其中一个男孩说。')
('It just couldn't be.', '')
('', '绝对不可能。')
('The other members of his family turned to stare at me.', '')
('', '其他人都把目光转向了我。')
............
My questions:
1. Where do the ') , ( characters above come from?
2. Why did couldn't turn into couldn' ?
Got it: I need to use the replace function to do the substitution.
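For anyone landing here later, a minimal sketch of what is probably going on. When a pattern has two capture groups, re.findall returns one 2-tuple per match, and the group that did not take part in the match is an empty string. print(item) then shows the tuple itself, so the parentheses, the comma and the quote marks are just Python's way of displaying a tuple of strings, not text from the page. The quote right after couldn is likely just the apostrophe being read as a closing string delimiter. The sample string below is a hypothetical stand-in for the downloaded page (the real markup may differ), and picking the non-empty group is an alternative to the replace approach mentioned above, not the original poster's fix.

```python
import re

# Hypothetical stand-in for web_data.text; the real page markup may differ.
sample = (
    '<p class="line_en" >It just couldn\'t be.</p>\n'
    '<p class="line_cn" title="绝对不可能。"></p>\n'
)

# Same idea as the pattern in the question, without the unneeded backslashes.
pattern = r'line_en" >(.*)<|line_cn" title="(.*)"'

# Two capture groups -> findall returns a list of 2-tuples.
# The group that did not participate in a match is ''.
items = re.findall(pattern, sample)

# Keep whichever group actually matched, so each entry is a plain string.
lines = [en or cn for en, cn in items]

for line in lines:
    # Printing the string itself shows no tuple punctuation or quotes.
    print(line)
```

Printing each string rather than the whole tuple leaves nothing to clean up afterwards, so no replace call is needed in this version.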