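avyou posted on 2014-04-10 17:00

Help!! nginx server suddenly keeps logging huge numbers of bingBot and Googlebot requests

This post was last edited by avyou on 2014-04-10 17:01

My nginx server has suddenly started logging a large, continuous stream of bingBot and Googlebot requests. They keep coming in great numbers and bandwidth usage has risen sharply; this is not ordinary crawler traffic.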

The log entries look like this, where 11.11.11.11 is my server's public IP:

11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -

I cannot find any external source IP at all; all I see is my own public IP, for example:

# netstat -ntu | awk '{print $5}' | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | sort | uniq -c | sort -nr | more
  17175 127.0.0.1
   8020 11.11.11.11
    186 192.168.14.2
    .....

The access log entries only stopped after I added this to nginx.conf:

if ($http_user_agent ~* "http://www.bing.com/bingbot.htm") { return 403; }
if ($http_user_agent ~* "http://www.google.com/bot.html") { return 403; }
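
The log excerpt above can also be tallied by client IP, status code and user agent, which makes the mix of 502, 504 and 499 responses easier to see. This is a minimal sketch, assuming the log layout shown in the post (no timestamp field); `LOG_LINE`, `summarize` and `sample` are hypothetical names introduced for illustration.

```python
import re
from collections import Counter

# Assumed layout, taken from the lines quoted in the post:
# ip - - "request" status bytes "referer" "user agent" ...
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ "(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)'
)


def summarize(lines):
    """Count (client IP, status, user agent) triples across log lines."""
    counts = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not follow the assumed layout
        counts[(match["ip"], match["status"], match["agent"])] += 1
    return counts


# Three lines copied from the excerpt in the post.
sample = [
    '11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
    '11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
    '11.11.11.11 - - "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -',
]

for (ip, status, agent), n in summarize(sample).most_common():
    print(n, ip, status, agent)
```

The pattern does not require the closing quote of the user agent, so a line that was cut off mid-field is still counted.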

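$http_user_agent may well be forged. But our site needs to be indexed by search engines, so I cannot filter these user agents forever, and I don't know what to do. Has anyone else run into this? Is this really an attack? Any help is appreciated, thank you all.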
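
Because the user agent string can be forged, a common way to tell a genuine crawler from an impostor is a forward-confirmed reverse DNS check: resolve the client IP to a hostname, check that the hostname belongs to a crawler domain, then resolve the hostname back and confirm it yields the same IP. The following is a minimal sketch under stated assumptions: `is_verified_crawler` and `CRAWLER_SUFFIXES` are hypothetical names, the suffix list covers only Googlebot and bingbot, and the check only helps once the log shows the real client IP rather than the server's own address.

```python
import socket

# Hostname suffixes that Google and Bing publish for their crawlers.
CRAWLER_SUFFIXES = (".googlebot.com", ".google.com", ".search.msn.com")


def is_verified_crawler(ip, reverse=socket.gethostbyaddr,
                        forward=socket.gethostbyname_ex):
    """Forward-confirmed reverse DNS check for one client IP (IPv4 only).

    `reverse` and `forward` default to the standard resolver calls and can be
    replaced with stubs for offline testing.
    """
    try:
        hostname = reverse(ip)[0]
    except OSError:
        return False  # no PTR record or lookup failure
    if not hostname.lower().rstrip(".").endswith(CRAWLER_SUFFIXES):
        return False  # hostname is not in a known crawler domain
    try:
        addresses = forward(hostname)[2]
    except OSError:
        return False  # hostname does not resolve
    return ip in addresses  # the forward lookup must lead back to the same IP
```

Calling `is_verified_crawler(client_ip)` with the default arguments performs live DNS lookups, so results are best cached per IP rather than computed on every request.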

avyou posted on 2014-04-11 09:51

Does nobody know? Why is the IP recorded in the log the server's own public IP?

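lx281 posted on 2014-04-14 17:24

...Is your site public-facing? If it is, just let the bots crawl. They only hurt your site if the crawl rate is too high, and the Google and Bing bots are unlikely to make that kind of basic mistake. Check your nginx configuration to see whether logging of the remote IP is enabled.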

avyou posted on 2014-04-15 21:50

This post was last edited by avyou on 2014-04-15 22:12

Today it is Baidu's and 360's turn, for example:

11.11.11.11 - - "GET /ting/ycdxw/424.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" -
......
11.11.11.11 - - "GET /ting/ycdxw/555.html HTTP/1.0" 502 568 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider
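.....
These entries repeat endlessly, always requesting the same page. It feels really nasty; this may be a CC (HTTP flood) attack. My nginx log format uses $remote_addr, which should be the real client IP, so why is it my own IP? Please help!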
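
One possible explanation, which the thread does not confirm, is that requests reach this nginx through a proxy running on the same machine. In that case $remote_addr holds the proxy's address and the original client is only visible in a header such as X-Forwarded-For. The HTTP/1.0 in every logged request is consistent with this, since nginx's proxy module talks HTTP/1.0 to upstreams by default. The sketch below shows the usual selection rule (take the right-most X-Forwarded-For entry that is not a trusted proxy), which is roughly what nginx's realip module does with `set_real_ip_from`, `real_ip_header` and `real_ip_recursive on`. The function name `real_client_ip` and the trusted-proxy list are assumptions for illustration.

```python
import ipaddress


def real_client_ip(remote_addr, x_forwarded_for, trusted_proxies):
    """Return the right-most address that is not one of our own proxies."""
    trusted = [ipaddress.ip_network(net) for net in trusted_proxies]

    def is_trusted(addr):
        ip = ipaddress.ip_address(addr)  # raises ValueError if malformed
        return any(ip in net for net in trusted)

    # A header sent over a direct (untrusted) connection proves nothing.
    if not x_forwarded_for or not is_trusted(remote_addr):
        return remote_addr

    hops = [hop.strip() for hop in x_forwarded_for.split(",")]
    for hop in reversed(hops):
        try:
            if not is_trusted(hop):
                return hop
        except ValueError:
            break  # malformed entry: stop and fall back to the socket peer
    return remote_addr  # every hop was one of our own proxies
```

For example, with `trusted_proxies=["127.0.0.1", "11.11.11.11"]`, a request whose socket peer is 11.11.11.11 and whose header names an outside address would be attributed to that outside address, while a request with no header stays attributed to 11.11.11.11.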