Chinaunix
Title:
Help!! My nginx server is suddenly logging a nonstop flood of bingBot and Googlebot requests
Author:
avyou
Time:
2014-04-10 17:00
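This post was last edited by avyou at 2014-04-10 17:01
My nginx server has suddenly started logging a large, continuous stream of bingBot and Googlebot requests. There really are a great many of them, arriving nonstop, and bandwidth usage has risen sharply; this is not ordinary spider crawling.
The log looks like this, where 11.11.11.11 stands for my server's public IP: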
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 499 0 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" -
11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0" 504 176 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
I cannot find any external connecting IP at all; all I see is my own public IP, for example:
# netstat -ntu | awk '{print $5}' | egrep -o "[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}\.[0-9]{1,3}" | sort | uniq -c | sort -nr |more
17175 127.0.0.1
8020 11.11.11.11
186 192.168.14.2
.....
I added the following to nginx.conf:
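# Return 403 when the User-Agent contains the bingbot info URL (case-insensitive match)
if ($http_user_agent ~* "http://www.bing.com/bingbot.htm") { return 403; }
# Return 403 when the User-Agent contains the Googlebot info URL (case-insensitive match)
if ($http_user_agent ~* "http://www.google.com/bot.html") { return 403; }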
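Only then did these access log entries stop.
$http_user_agent may well be forged. Our site needs to be indexed by search engines, so we cannot filter these user agents forever, and I don't know what to do. Has anyone else run into this? Are we really under attack? Please help, and thank you all.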
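Because the User-Agent can be forged, one common way to tell a genuine crawler from an impostor is forward-confirmed reverse DNS: resolve the client IP to a hostname, check that the hostname belongs to the crawler's domain, then resolve that hostname back and confirm it yields the same IP. The Python sketch below illustrates the idea. The function name `is_verified_crawler`, the suffix list, and the injectable resolver parameters are assumptions made for illustration and do not come from this thread. It also only helps when the log records the real client address, which is not the case in the excerpts above.

```python
import socket

# Hostname suffixes treated as belonging to known crawlers.
# Assumed for this sketch; consult each search engine's documentation
# for the authoritative list.
CRAWLER_SUFFIXES = (
    ".googlebot.com",
    ".google.com",
    ".search.msn.com",
    ".baidu.com",
    ".baidu.jp",
)


def _reverse_lookup(ip):
    """Return the PTR hostname for ip, or None if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:
        return None


def _forward_lookup(hostname):
    """Return the IPv4 addresses for hostname (empty list on failure)."""
    try:
        return socket.gethostbyname_ex(hostname)[2]
    except OSError:
        return []


def is_verified_crawler(ip, reverse=_reverse_lookup, forward=_forward_lookup):
    """Forward-confirmed reverse DNS check for a claimed crawler IP."""
    host = reverse(ip)
    if not host:
        return False
    host = host.rstrip(".").lower()
    # Step 1: the PTR name must sit under a known crawler domain.
    if not host.endswith(CRAWLER_SUFFIXES):
        return False
    # Step 2: the name must resolve back to the original address.
    return ip in forward(host)
```

Calling `is_verified_crawler(client_ip)` with the default resolvers performs live DNS lookups, so in practice the result would probably be cached per address rather than computed for every request. The resolver parameters exist only so the logic can be exercised without network access.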
Author:
avyou
Time:
2014-04-11 09:51
Does nobody know? Why is the IP recorded in the log the server's own public IP?
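Author:
lx281
Time:
2014-04-14 17:24
...Is your site public-facing? If it is, just let the bots crawl. They only affect your site when the crawl rate is too high, and the Google and Bing bots are unlikely to make such a basic mistake. Check your nginx configuration to see whether logging of the remote IP is turned on.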
Author:
avyou
Time:
2014-04-15 21:50
This post was last edited by avyou at 2014-04-15 22:12
Today it is Baidu's and 360's turn, for example:
11.11.11.11 - - [15/Apr/2014:12:19:47 +0800] "GET /ting/ycdxw/424.html HTTP/1.0" 502 166 "-" "Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)" -
......
11.11.11.11 - - [15/Apr/2014:14:31:32 +0800] "GET /ting/ycdxw/555.html HTTP/1.0" 502 568 "-" "Mozilla/5.0 (compatible; MSIE 9.0; Windows NT 6.1; Trident/5.0); 360Spider
.....
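The requests keep repeating, and they keep hitting the same page, which feels really abnormal. It may be a CC attack. My nginx log records $remote_addr, which should be the real client IP, so why is it my own IP? Please help!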
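To quantify how repetitive the requests are, the log lines can be grouped by client address, request path, and status code. The Python sketch below assumes the log format shown in the excerpts above (nginx's combined format plus one trailing field). The regular expression and the function name `summarize` are illustrative assumptions, and the sample lines are copied from the first post.

```python
import re
from collections import Counter

# Matches: addr - user [time] "METHOD path PROTO" status bytes ...
LINE_RE = re.compile(
    r'^(?P<addr>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<proto>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+)'
)


def summarize(lines):
    """Count log lines per (client address, path, status code)."""
    counts = Counter()
    for line in lines:
        m = LINE_RE.match(line)
        if m is None:
            continue  # skip lines that do not fit the assumed format
        counts[(m["addr"], m["path"], int(m["status"]))] += 1
    return counts


UA = '"Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"'
PREFIX = '11.11.11.11 - - [10/Apr/2014:16:43:35 +0800] "GET /ting/ycdxw/180.html HTTP/1.0"'
sample = [
    f'{PREFIX} 502 166 "-" {UA} -',
    f'{PREFIX} 499 0 "-" {UA} -',
    f'{PREFIX} 504 176 "-" {UA} -',
    f'{PREFIX} 504 176 "-" {UA} -',
]

# Print the most frequent (address, path, status) combinations first.
for key, n in summarize(sample).most_common():
    print(n, *key)
```

Every request in the excerpts uses HTTP/1.0, which is also nginx's default protocol version for requests it proxies upstream. It may therefore be worth checking whether these entries come from nginx proxying to its own public address instead of from a remote client; that is only a possibility to rule out, not a confirmed diagnosis.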
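Author:
gg22mm
Time:
2017-10-17 10:57
I have the same problem: unidentified crawlers frequently hit my site, which costs some performance, among other things. I would also like to know the solution.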
Welcome to Chinaunix (http://bbs.chinaunix.net/)