Discussion: severely unbalanced load with squid's round-robin option

youzhengchuan posted on 2012-08-18 10:25

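The definitive Squid documentation (《squid权威文档》) explains the "round-robin" option as follows:
This option is a simple load-sharing technique. It is only useful when you mark two or more parent caches as round-robin. Squid keeps a counter for each parent cache. When it needs to forward a cache miss, Squid selects the parent with the lowest counter value.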

My squid configuration below has two parents, one named ha and one named hb, and uses "round-robin" to rotate requests between them.
################### round-robin #####################
cache_peer source1.parent.com parent 80 3100 no-digest no-query no-netdb-exchange originserver name=ha round-robin
cache_peer source2.parent.com parent 80 3100 no-digest no-query no-netdb-exchange originserver name=hb round-robin
cache_peer_domain ha flv.domain.com
cache_peer_domain hb flv.domain.com

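According to the description of "round-robin", the two parents should have received roughly the same number of requests after a while. After about one day, however, the request counts were: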
5 TCP_DENIED NONE -
203 TCP_HIT NONE -
22 TCP_MISS ROUNDROBIN_PARENT ha
212 TCP_MISS ROUNDROBIN_PARENT hb
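Requests that HIT locally never go to a parent, so the HIT line can be ignored. MISS requests do go through a parent, yet the split between the two parents is badly skewed: hb received roughly ten times as many requests as ha (212 vs. 22). This is very strange. Both parents sit in the same network segment and have the same network quality, so I cannot tell what causes this.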
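
To make the size of the skew explicit, here is a minimal sketch that tallies the summary lines above. It assumes the four-column layout shown in the post (count, result code, hierarchy code, peer name); the function name `parent_shares` is made up for illustration.

```python
from collections import Counter

# Summary lines copied from the post: "<count> <result> <hierarchy code> <peer>".
SUMMARY = """\
5 TCP_DENIED NONE -
203 TCP_HIT NONE -
22 TCP_MISS ROUNDROBIN_PARENT ha
212 TCP_MISS ROUNDROBIN_PARENT hb
"""

def parent_shares(summary):
    """Return {peer_name: request_count} for round-robin misses only."""
    counts = Counter()
    for line in summary.splitlines():
        count, _result, hierarchy, peer = line.split()
        if hierarchy == "ROUNDROBIN_PARENT":
            counts[peer] += int(count)
    return dict(counts)

shares = parent_shares(SUMMARY)
total = sum(shares.values())
for peer, n in sorted(shares.items()):
    print(f"{peer}: {n} requests ({n / total:.1%} of round-robin misses)")
print(f"hb/ha ratio: {shares['hb'] / shares['ha']:.1f}")
```

With these numbers hb handles about nine tenths of the forwarded misses, a ratio of roughly 10:1.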

youzhengchuan posted on 2012-08-18 12:07

While testing round-robin I also tested how squid decides that a parent is dead. The process turned out to be interesting, so I wrote it up:
"How squid decides whether a parent is dead or live"

With one or more parents configured, when squid cannot connect to one of them and marks it dead, it writes a line like this to cache.log:
2012/08/18 11:26:08| Detected DEAD Parent: source2.parent.com
Once the dead parent can be reached again, squid logs:
2012/08/18 11:27:51| Detected REVIVED Parent: source2.parent.com
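
As a small aid for reading such logs, the sketch below pairs each DEAD line with the following REVIVED line for the same parent and reports how long the parent was considered down. It only assumes the two message formats quoted above; the function name `downtimes` is hypothetical.

```python
from datetime import datetime

# The two cache.log lines quoted in the post.
LOG = """\
2012/08/18 11:26:08| Detected DEAD Parent: source2.parent.com
2012/08/18 11:27:51| Detected REVIVED Parent: source2.parent.com
"""

DEAD = "Detected DEAD Parent: "
REVIVED = "Detected REVIVED Parent: "

def downtimes(log_text):
    """Return [(peer, seconds_down)] for each DEAD -> REVIVED pair."""
    went_dead = {}
    result = []
    for line in log_text.splitlines():
        stamp, _, message = line.partition("| ")
        if not message.startswith((DEAD, REVIVED)):
            continue  # ignore every other kind of log line
        when = datetime.strptime(stamp, "%Y/%m/%d %H:%M:%S")
        peer = message.split(": ", 1)[1]
        if message.startswith(DEAD):
            went_dead[peer] = when
        elif peer in went_dead:
            delta = when - went_dead.pop(peer)
            result.append((peer, int(delta.total_seconds())))
    return result

for peer, seconds in downtimes(LOG):
    print(f"{peer} was marked dead for {seconds} seconds")
```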

So what exactly does squid do when deciding whether a parent is dead or live? Here are some test results.

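Configuration 1: with only one parent, every failed connection attempt is logged. After 10 consecutive failures the parent is marked dead, and further failures are no longer logged until it recovers: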
2012/08/18 11:40:48| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:41:40| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:43:46| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:44:24| TCP connection to ctsource.parent.com (ctsource.parent.com:80) failed
2012/08/18 11:44:24| Detected DEAD Parent: ctsource.parent.com

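Configuration 2: with multiple parents.
1) With round-robin configured, when one parent goes down the failure is logged and the failure handling starts. After 10 consecutive failures the parent is marked dead and excluded from the "round-robin" algorithm.
2) Without round-robin, parents are tried in order. If the first parent goes down, the failure is logged and the failure handling starts. After 10 consecutive failures the parent is marked dead, and subsequent requests are sent to the second parent.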
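
The behaviour observed in both configurations can be summarised as a small state machine. This is a sketch of the observations above, not Squid's actual code: the class `Parent`, the helper `candidates`, and the constant `FAIL_LIMIT` are illustrative names, the threshold of 10 comes from the test results in this post, and the log messages are shortened.

```python
FAIL_LIMIT = 10  # consecutive failures seen before DEAD in the tests above (assumed fixed)

class Parent:
    """Tracks consecutive connection failures for one parent (hypothetical model)."""

    def __init__(self, name):
        self.name = name
        self.failures = 0
        self.alive = True

    def record_failure(self):
        """Return the log lines this failure would produce."""
        if not self.alive:
            return []  # already dead: failures are no longer logged
        self.failures += 1
        lines = [f"TCP connection to {self.name} failed"]
        if self.failures >= FAIL_LIMIT:
            self.alive = False
            lines.append(f"Detected DEAD Parent: {self.name}")
        return lines

    def record_success(self):
        """Reset the failure streak; report a revival if the parent was dead."""
        self.failures = 0
        if self.alive:
            return []
        self.alive = True
        return [f"Detected REVIVED Parent: {self.name}"]

def candidates(parents):
    """Parents still eligible for selection; dead ones are skipped."""
    return [p for p in parents if p.alive]

ha, hb = Parent("ha"), Parent("hb")
for _ in range(12):
    for line in hb.record_failure():
        print(line)
print([p.name for p in candidates([ha, hb])])
```

Twelve failures in a row produce ten "failed" lines plus one DEAD line, after which only ha remains a candidate.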

youzhengchuan posted on 2012-08-18 12:07

Looking forward to replies about the round-robin imbalance.

cyent posted on 2013-01-07 20:34

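Hey, I have run into something similar, also with round-robin. It goes about the same way every time: the first cache_peer carries a lot of traffic at the start, its share drops after a few minutes, the second cache_peer then takes over as the main workhorse, and the third cache_peer ends up somewhere between the first and the second.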

deltatang posted on 2013-08-19 22:09

This is fairly easy to understand from the squid source code.

In peer_select.cc, look at
peerGetSomeParent

The round-robin strategy is applied at this line:
} else if ((p = getRoundRobinParent(request))) {
   code = ROUNDROBIN_PARENT;

This calls getRoundRobinParent.
Its implementation is as follows; both the rr counter and the weight affect the choice:
         if (p->weight == q->weight) {
            if (q->rr_count < p->rr_count)
               continue;
         } else if (((double) q->rr_count / q->weight) < ((double) p->rr_count / p->weight)) {
            continue;
         }
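
To see what this comparison does in practice, here is a small Python model of it. It is a sketch, not a port of Squid: the class `Peer` and the function `pick_round_robin` are made up, and only the comparison comes from the fragment quoted above. Two things are assumed: a candidate that is not skipped replaces the current best, and the winner's counter is incremented after each pick.

```python
class Peer:
    """Minimal stand-in for a cache_peer entry (illustrative only)."""

    def __init__(self, name, weight=1, rr_count=0):
        self.name = name
        self.weight = weight
        self.rr_count = rr_count

def pick_round_robin(peers):
    """Pick the peer with the lowest rr_count, normalised by weight."""
    best = None
    for p in peers:
        if best is not None:
            # Same comparison as the quoted fragment; "best" plays the role of q.
            if p.weight == best.weight:
                if best.rr_count < p.rr_count:
                    continue
            elif best.rr_count / best.weight < p.rr_count / p.weight:
                continue
        best = p
    if best is not None:
        best.rr_count += 1  # assumption: the winner's counter is bumped per pick
    return best

def simulate(peers, requests):
    """Return {name: picks} after forwarding `requests` misses."""
    picks = {p.name: 0 for p in peers}
    for _ in range(requests):
        picks[pick_round_robin(peers).name] += 1
    return picks

# Equal weights, equal starting counters: the split is even.
print(simulate([Peer("ha"), Peer("hb")], 10))
# Hypothetical head start on ha's counter: hb absorbs traffic until it catches up.
print(simulate([Peer("ha", rr_count=6), Peer("hb")], 10))
```

In this model the split is even only while the counters stay in step. If one counter ever runs ahead of the other, the peer with the lower counter receives every miss until the counters meet again. Whether that is what happened in the original poster's setup cannot be determined from the thread.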


Chinaunix's Archiver
