Why do extra , " and similar characters show up in the re.findall results?

blackantt posted on 2021-04-20 19:14

import requests
import re

url = 'http://www.shubang.net/book/66_2151.html'
headers = {'User-Agent':'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/89.0.4389.128 Safari/537.36'}
web_data = requests.get(url, headers=headers)
web_data.encoding = 'utf-8'  # decode the response body as UTF-8
txt = web_data.text
# Two alternatives: group 1 captures the English line, group 2 the Chinese title
items = re.findall(r'line_en\" \>(.*)<|line_cn\" title=\"(.*)\"', txt)
for item in items:
    print(item)

The output looks like this:
......
('"It doesn't look new. It looks old," one of the boys said.', '')
('', '“房子一点也不新，旧死了，”其中一个男孩说。')
('It just couldn't be.', '')
('', '绝对不可能。')
('The other members of his family turned to stare at me.', '')
('', '其他人都把目光转向了我。')
............

My questions:
1. Where do the ') , ( above come from?
2. Why did couldn't turn into couldn' ?
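
On the first question, the shape of each printed item follows from how re.findall treats a pattern with more than one capture group: it returns one tuple per match, and a group that did not take part in the match comes back as an empty string. The sketch below reuses the pattern from the post on a made-up two-line snippet. The markup in `sample` is an assumption, not the real page source.

```python
import re

# Hypothetical sample markup; the real page structure is an assumption.
sample = (
    '<div class="line_en" >It just could not be.</div>\n'
    '<div class="line_cn" title="绝对不可能。"></div>\n'
)

# Same pattern as in the post: two capture groups joined by "|".
pattern = r'line_en\" \>(.*)<|line_cn\" title=\"(.*)\"'
items = re.findall(pattern, sample)
print(items)

# Each match fills only one of the two groups, so the other one is ''.
# Keep whichever group actually matched:
lines = [en or cn for en, cn in items]
print(lines)
```

Picking the non-empty element with `en or cn` gives a flat list of strings, so printing a line no longer shows the tuple parentheses, the comma, or the empty ''.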

blackantt posted on 2021-04-21 11:24

Got it: the fix is to use the replace function to substitute the unwanted characters.
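
The thread does not say what exactly was replaced. A minimal sketch, assuming the stray characters were HTML entities such as `&#39;` and `&quot;` left in the captured text (the `raw` string below is hypothetical): str.replace handles them one by one, and html.unescape from the standard library decodes them all in one call.

```python
import html

# Hypothetical raw capture; that the page contains these entities is an assumption.
raw = "&quot;It doesn&#39;t look new. It looks old,&quot; one of the boys said."

# Approach named in the thread: replace each entity by hand.
by_replace = raw.replace("&#39;", "'").replace("&quot;", '"')

# Alternative: html.unescape decodes every HTML entity in the string.
by_unescape = html.unescape(raw)

print(by_replace)
print(by_unescape)
```

With replace, each entity has to be listed explicitly, while html.unescape also covers entities that were not anticipated.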


Chinaunix's Archiver
