Chinaunix

Title: Help!! nginx server suddenly keeps logging a flood of bingbot and Googlebot requests

Author: avyou    Time: 2014-04-10 17:00
Last edited by avyou on 2014-04-10 17:01

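My nginx server has suddenly started logging a large, continuous stream of bingbot and Googlebot requests. There really are a lot of them, nonstop, and bandwidth usage has risen sharply; this is not ordinary spider crawling.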

The log looks like this, where 11.11.11.11 is my server's public IP:
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
  11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
I cannot find any external client IP at all; all I see is my own public IP, for example:
  # netstat -ntu | awk '{print $5}' | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | sort | uniq -c | sort -nr |more
    17175 127.0.0.1
     8020 11.11.11.11
      186 192.168.14.2
      .....
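For anyone who would rather do the same tally in a script, here is a minimal Python sketch of the pipeline above. The function name `count_peers` and the `SAMPLE` text are made up for illustration; the sample is not output from this server. It assumes the usual Linux `netstat -ntu` layout, where the fifth column is the foreign address.

```python
import re
from collections import Counter

# Same idea as the egrep pattern above: pull out a dotted-quad IPv4 address.
IPV4 = re.compile(r"\d{1,3}(?:\.\d{1,3}){3}")

def count_peers(netstat_output):
    """Count connections per foreign IPv4 address, most frequent first."""
    counts = Counter()
    for line in netstat_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # too short to have a foreign-address column
        match = IPV4.search(fields[4])  # fifth column, like awk's $5
        if match:
            counts[match.group()] += 1
    return counts.most_common()

# Hypothetical sample text, only to show the expected input shape.
SAMPLE = """\
Proto Recv-Q Send-Q Local Address           Foreign Address         State
tcp        0      0 11.11.11.11:80          11.11.11.11:51234       ESTABLISHED
tcp        0      0 127.0.0.1:9000          127.0.0.1:40112         TIME_WAIT
tcp        0      0 11.11.11.11:80          11.11.11.11:51235       ESTABLISHED
"""

if __name__ == "__main__":
    for ip, n in count_peers(SAMPLE):
        print(n, ip)
```

On a live machine the input could come from `subprocess.run(["netstat", "-ntu"], capture_output=True, text=True).stdout` instead of the sample string.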
I then added this to nginx.conf:
  if ($http_user_agent ~* "http://www.bing.com/bingbot.htm"){return 403;}
  if ($http_user_agent ~* "http://www.google.com/bot.html"){return 403;}
Only then did the access log entries stop.

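$http_user_agent may well be forged. Our site needs to be indexed by search engines, so I cannot filter these user agents forever, and I don't know what to do. Has anyone else run into this? Is the server really under attack?? Please help, and thanks, everyone.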
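One common way to tell a forged User-Agent from a real crawler is a reverse DNS lookup on the client IP followed by a forward lookup that must return the same IP. Below is a minimal Python sketch of that check. The function name `is_genuine_crawler` and the `CRAWLER_DOMAINS` table are assumptions for illustration; the hostname suffixes should be confirmed against each search engine's own documentation. It also assumes the log shows the real client address, which is not the case in the log above.

```python
import socket

# Assumed hostname suffixes per crawler; verify against the vendors' docs.
CRAWLER_DOMAINS = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
}

def _reverse(ip):
    """Reverse (PTR) lookup: IP -> primary hostname."""
    return socket.gethostbyaddr(ip)[0]

def _forward(host):
    """Forward lookup: hostname -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]

def is_genuine_crawler(ip, crawler, reverse=_reverse, forward=_forward):
    """Return True only if ip reverse-resolves into the crawler's domain
    and that hostname resolves back to the same ip."""
    suffixes = CRAWLER_DOMAINS.get(crawler, ())
    try:
        host = reverse(ip).rstrip(".").lower()
        if not host.endswith(suffixes):
            return False  # PTR record is outside the crawler's domain
        return ip in forward(host)  # forward-confirm to defeat fake PTRs
    except OSError:
        return False  # lookup failed: treat as not verified
```

The `reverse` and `forward` parameters exist only so the lookups can be swapped out, for example with cached or stubbed resolvers; DNS lookups are slow, so a check like this belongs in offline log analysis rather than in the request path.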
Author: avyou    Time: 2014-04-11 09:51
Does nobody know? Why is the IP recorded in the log the server's own public IP?
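Author: lx281    Time: 2014-04-14 17:24
...Is your site public-facing? If it is, just let the bots crawl. It only affects your site when the crawl rate is too high, and the Google and Bing bots shouldn't make such a basic mistake. Check your nginx configuration to see whether logging of the remote IP is enabled.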
Author: avyou    Time: 2014-04-15 21:50
Last edited by avyou on 2014-04-15 22:12

Today it is Baidu's and 360's turn, for example:
  11.11.11.11 - - [15/Apr/2014:12:19:47 +0800] "GET /ting/ycdxw/424.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" -
  ......
  11.11.11.11 - - [15/Apr/2014:14:31:32 +0800] "GET /ting/ycdxw/555.html HTTP/1.0" 502 568 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider
  .....
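The requests keep repeating, and they ask for the same page, which feels really abnormal. It may be a CC attack. My nginx log records $remote_addr, which should be the real client IP, so why is it my own IP? Please help!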
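To put numbers on "the same page over and over", the access log can be tallied by client IP, path and status code. This is a minimal Python sketch; the regular expression assumes the log layout shown in this thread, and the names `LOG_LINE`, `tally` and `SAMPLE` are made up for illustration. The sample lines are copied from the first log excerpt.

```python
import re
from collections import Counter

# Matches the log layout seen in this thread. The user agent is captured
# without requiring a closing quote, so truncated lines still parse.
LOG_LINE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) [^"]*" '
    r'(?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)'
)

def tally(lines):
    """Count requests per (client IP, path, status); skip unparsable lines."""
    counts = Counter()
    for line in lines:
        m = LOG_LINE.match(line.strip())
        if m:
            counts[(m["ip"], m["path"], m["status"])] += 1
    return counts

SAMPLE = [
    '11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
    '11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
    '11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
]

if __name__ == "__main__":
    for key, n in tally(SAMPLE).most_common():
        print(n, *key)
```

For a real log, pass an open file object to `tally`; a few keys with very large counts would back up the impression of one page being hammered.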
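Author: gg22mm    Time: 2017-10-17 10:57
I have the same problem: unidentified crawlers often show up and crawl my site, which costs some performance, to say the least. I'd also like to know the solution.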



