ChinaUnix.net

Views: 1653 | Replies: 9

Can someone help me figure out why my scraper is erroring out?

Posted on 2017-12-28 10:34
import requests
from bs4 import BeautifulSoup
def getNewsDetail(newsurl):
    result = {}
    res = requests.get(newsurl)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text,'html.parser')
    result['title'] = soup.select('h1')[0].text
    result['time'] = soup.select('.info-title span')[1].text
    result['article'] = [p.text.strip() for p in soup.select('.info-con span')]
    return result
def parseListLinks(url):
    newsdetails = {}
    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text,'html.parser')
    for news in soup.select('.mb20'):
        newsdetails.append(getNewsDetail(news.select('a')[0]['href']))
    return newsdetails

url = 'http://www.iron-powder.cn/knowledge'
parseListLinks(url)

AttributeError                            Traceback (most recent call last)
<ipython-input-78-b47a839534d7> in <module>()
      1 url = 'http://www.iron-powder.cn/knowledge'
----> 2 parseListLinks(url)
<ipython-input-77-69f7954642e8> in parseListLinks(url)
      8          #   h3 = news.select('h3')[0].text
      9           #  a = news.select('a')[0]['href']
---> 10         newsdetails.append(getNewsDetail(news.select('a')[0]['href']))
     11     return newdetails
AttributeError: 'dict' object has no attribute 'append'

I wrote this scraper following a tutorial. Scraping a single article page works, but scraping the list page fails. I'm new to Python — could you help me fix the error? Thanks!
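The traceback boils down to one line: `newsdetails = {}` creates a dict, and dicts have no `append` method — that method belongs to lists. A minimal reproduction of the error and the fix:

```python
# Initializing with {} creates a dict, which has no append method.
newsdetails = {}
try:
    newsdetails.append("item")
except AttributeError as e:
    print(e)  # 'dict' object has no attribute 'append'

# Initializing with [] creates a list, which supports append.
newsdetails = []
newsdetails.append("item")
print(newsdetails)  # ['item']
```

(Note the traceback also shows `return newdetails` — missing an "s" — in the cell that actually ran, so that typo would be the next error once the dict/list mix-up is fixed.)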

Posted on 2017-12-30 12:58
This post was last edited by q1208c on 2017-12-30 13:00

Reply to 1# zbhdpx


Posted on 2017-12-30 21:26
import requests
from bs4 import BeautifulSoup

def getNewsDetail(newsurl):
    result = {}
    res = requests.get(newsurl)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    result['title'] = soup.select('h1')[0].text
    result['time'] = soup.select('.info-title span')[1].text
    result['article'] = [p.text.strip() for p in soup.select('.info-con span')]
    return result

def parseListLinks(url):
    newsdetails = []  # a list, not a dict -- this is what made append fail
    res = requests.get(url)
    res.encoding = 'utf-8'
    soup = BeautifulSoup(res.text, 'html.parser')
    for news in soup.select('li.mb20'):
        content = getNewsDetail(news.select('a')[0]['href'])
        newsdetails.append(content)
    return newsdetails

url = 'http://www.iron-powder.cn/knowledge'
parseListLinks(url)

This only handles a single list page; your code can't loop over all the pages.
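To go beyond the first list page, the list URL would need to vary per page. The sketch below is only an outline: the `?page=N` query pattern is a guess, not the site's confirmed scheme (check the real pagination links on iron-powder.cn), and the crawler to run per page is injected as the `parse` callable so the URL logic can be tested on its own:

```python
def pageUrls(baseurl, maxpages):
    # Build one list-page URL per page number; the "?page=N" query
    # parameter is hypothetical -- verify against the site's pagination.
    return ['{}?page={}'.format(baseurl, n) for n in range(1, maxpages + 1)]

def parseAllPages(baseurl, maxpages, parse):
    # Run the per-page crawler (e.g. the fixed parseListLinks) on each
    # list page and concatenate the per-page results into one list.
    alldetails = []
    for pageurl in pageUrls(baseurl, maxpages):
        alldetails.extend(parse(pageurl))
    return alldetails
```

Usage would be `parseAllPages('http://www.iron-powder.cn/knowledge', 5, parseListLinks)` once the real page-URL pattern is confirmed.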
