Last edited by kaixin9ok on 2016-07-16 02:44
Environment: CentOS release 6.2, kernel 2.6.32-220.el6.x86_64
keepalived-1.2.7, ipvsadm v1.26, IPVS v1.2.1
Health checking is done by keepalived.
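We currently run about 200 VIPs, each with roughly 5-10 real servers behind it.
Once a single keepalived instance manages more than about 1,100 real servers, its healthcheckers process crashes and then gets stuck in an endless respawn loop.
While the loop lasts, the healthcheckers process does not do its job: when a real server becomes unreachable, keepalived does not remove it from the VIP.
After about 1-2 hours all three keepalived processes return to normal. But after they recover, a reload or restart brings the same endless loop back.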
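So my question: does keepalived define a limit on the number of real servers, or can it really only manage about 1,100 of them?
I could not find such a value defined anywhere in the keepalived source. Also, while the healthcheckers process is crashing, the main keepalived process and the VRRP process keep working normally (the log only shows healthcheckers being restarted over and over).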
The log in /var/log/messages shows the following:
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: Cannot send get request to [10.15.200.200]:80.
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: Removing service [10.100.200.200]:80 from VS [10.15.177.177]:80
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14203]: SMTP connection ERROR to [127.0.0.1]:25.
Jul 14 19:47:07 b02 Keepalived[13055]: Healthcheck child process(14203) died: Respawning
Jul 14 19:47:07 b02 Keepalived[13055]: Starting Healthcheck child process, pid=14525
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: Interface queue is empty
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, eth1
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, usb0
Jul 14 19:47:07 b02 Keepalived_healthcheckers[14525]: No such interface, bond0
Checking the processes shows the keepalived healthcheckers PID changing constantly, i.e. a new process keeps being started:
[root@lvs]$ ps axu | grep keepalived
root 10124 0.0 0.0 109296 1144 ? Ss Jul15 0:04 /usr/sbin/keepalived -D
root 10126 0.0 0.0 111556 2364 ? S Jul15 0:04 /usr/sbin/keepalived -D
root 10405 17.0 0.0 112896 4204 ? S 02:39 0:00 /usr/sbin/keepalived -D
root 10407 0.0 0.0 6428 496 pts/0 S+ 02:39 0:00 grep keepalived
tty:[0] jobs:[0] cwd:[~]
[root@lvs]$ ps axu | grep keepalived
root 10124 0.0 0.0 109296 1144 ? Ss Jul15 0:04 /usr/sbin/keepalived -D
root 10126 0.0 0.0 111556 2364 ? S Jul15 0:04 /usr/sbin/keepalived -D
root 10701 8.0 0.0 112896 4204 ? S 02:39 0:00 /usr/sbin/keepalived -D
root 10703 0.0 0.0 6428 496 pts/0 S+ 02:39 0:00 grep keepalived
tty:[0] jobs:[0] cwd:[~]
[root@lvs]$ ps axu | grep keepalived
root 10124 0.0 0.0 109296 1144 ? Ss Jul15 0:04 /usr/sbin/keepalived -D
root 10126 0.0 0.0 111556 2364 ? S Jul15 0:04 /usr/sbin/keepalived -D
root 13041 8.0 0.0 112896 4204 ? S 02:40 0:00 /usr/sbin/keepalived -D
root 13043 0.0 0.0 6428 496 pts/0 S+ 02:40 0:00 grep keepalived
tty:[0] jobs:[0] cwd:[~]
### Source snippet that checks the healthcheck child and respawns it:
int
check_respawn_thread(thread_t * thread)
{
    pid_t pid;

    /* Fetch thread args */
    pid = THREAD_CHILD_PID(thread);

    /* Restart respawning thread */
    if (thread->type == THREAD_CHILD_TIMEOUT) {
        thread_add_child(master, check_respawn_thread, NULL,
                         pid, RESPAWN_TIMER);
        return 0;
    }

    /* We catch a SIGCHLD, handle it */
    log_message(LOG_ALERT, "Healthcheck child process(%d) died: Respawning", pid);
    start_check_child();
    return 0;
}
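Testing led to the following conclusions:
Test 1: with more than about 1,100 real servers, keepalived's healthcheck process crashes and then restarts endlessly; the main process and the VRRP child process are unaffected.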
Test 2: with about 1,020 real servers, keepalived works normally; the parent process and both child processes are fine.