Thread starter: 思一克

On interrupt load balancing across CPUs on Linux

#41 | Posted 2007-07-17 14:21
Is irqbalance not running?
On my 3-NIC, 2-CPU machine the interrupts are split roughly evenly.

[ Last edited by wheel on 2007-7-17 14:27 ]

#42 | Posted 2007-07-17 14:37
If you have 4 CPUs and 2 NICs, irqbalance doesn't seem to work.

See this thread; their test results say it doesn't:
http://linux.chinaunix.net/bbs/v ... p%3Bfilter%3Ddigest

Quoting wheel (2007-7-17 14:21): Is irqbalance not running? On my 3-NIC, 2-CPU machine the interrupts are split roughly evenly.

#43 | Posted 2007-07-17 14:37
Quoting wheel (2007-7-17 14:21): Is irqbalance not running? On my 3-NIC, 2-CPU machine the interrupts are split roughly evenly.

It probably depends on the machine and the kernel version; here it behaves the same even with irqbalance running.

That said, it's unlikely that Linux's IRQ routing has a problem this serious; it must work fine on most machines.

#44 | Posted 2007-07-18 13:07
[root@localhost 83627]# cat /proc/version
Linux version 2.6.22 (cqs@localhost.localdomain) (gcc version 4.1.1 20070105 (Red Hat 4.1.1-53)) #1 SMP Tue Jul 17 21:52:55 CST 2007
Tested again: same 2-CPU, 3-NIC setup. It works on 2.6.22 but not on 2.6.20.

#45 | Posted 2007-07-18 16:00
Quoting scutan (2007-6-29 18:54):
Load imbalance is quite common on SMP multiprocessors; I ran into it too, and tried several approaches without success. I posted about this before. My analysis at the time: although irq_balance() can balance across multiple CPUs, the ... [quote truncated]

After an SMP system initializes, no CPU is special; they all run in the same address space. Is there anything particular about Intel SMP hardware that makes different CPUs respond to interrupts differently?

#46 | Posted 2007-07-19 09:53
http://people.redhat.com/mingo/cfs-scheduler/
Rebuild the kernel with sched-cfs-v2.6.22.1-v19.patch applied and you'll find it behaves much better.

#47 | Posted 2007-07-19 10:31
Quoting AIXHP (2007-7-18 16:00): After an SMP system initializes, no CPU is special; they all run in the same address space. Is there anything particular about Intel SMP hardware that makes different CPUs respond to interrupts differently?

Intel's expectations of OS authors and Linux's actual implementation diverge in quite a few places. Take interrupt delivery: Intel expects every interrupt to have a priority, computed as

priority = interrupt vector / 16

Linux, however, does not use this; interrupts under Linux have no priority attribute. Intel expects each CPU's local APIC to have a TPR register holding the priority of the currently running task (updated on every task switch). An interrupt is delivered to a CPU only if its priority is higher than that CPU's TPR value, i.e. the CPU running the lowest-priority task handles the interrupt. If N > 1 CPUs share the same lowest TPR, bus arbitration round-robins among them. Linux does not do this; the reasons are explained in the link I gave above.

#48 | Posted 2007-09-17 10:18
I retract the conclusion of that post.

It now looks like the cause is not what I said. I'm still digging; I'll report back once I have a conclusion.

[ Last edited by 思一克 on 2007-9-17 11:34 ]

#49 | Posted 2007-09-17 10:58
Quoting 思一克 (2007-9-17 10:18):
I spent a solid week studying this and reached a discouraging conclusion: the low-level network code (including iptables) fundamentally cannot exploit SMP.
1) With IRQBALANCE, interrupts can indeed be balanced across the CPUs (e.g. readjusted every so often), but at any single moment they run on only one CPU. The CPU lo... [quote truncated]

Wouldn't it be better if it could balance based on IP address or TCP port?

#50 | Posted 2007-09-20 22:08
The PATCH is out. It targets 2.6.13-15-smp.

Save the code below to a file named seeker, put it at the root of the Linux source tree, then run patch -p1 < seeker, rebuild the kernel, and reboot.

My first test of network download speed: on a dual-CPU machine, with 2400 IP/port match rules installed on the iptables INPUT chain (to deliberately simulate heavy load), download speed with the patch is about twice that without it, because both CPUs are used.

This may well be the best solution. With 4 CPUs it might raise capacity 4x (at least 2x). IRQBALANCE is no longer needed.

I could only test NAT very incompletely, given my setup.

Testing welcome.

Later I'll also provide a module, so you can test without rebuilding the kernel.





--- old/net/ipv4/ip_input.c	2007-09-20 20:50:31.000000000 +0800
+++ new/net/ipv4/ip_input.c	2007-09-21 05:52:40.000000000 +0800
@@ -362,6 +362,198 @@
 	return NET_RX_DROP;
 }
 
+
+#define CONFIG_BOTTOM_SOFTIRQ_SMP
+#define CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP
+
+/*
+ * Bottom Softirq Implementation. John Ye, 2007.08.27
+ *
+ * Why this patch:
+ * Makes the kernel able to execute softirq network code concurrently on SMP
+ * systems, taking full advantage of SMP to handle more packets and greatly
+ * raise NIC throughput.
+ * The current kernel's net packet processing logic is:
+ * 1) The CPU that handles a hardirq must also execute the related softirq.
+ * 2) One softirq instance (the irqs handled by one CPU) can't run on two or
+ *    more CPUs at the same time.
+ * These limitations make it hard for kernel networking to exploit SMP.
+ *
+ * How this patch:
+ * It splits the current softirq code into 2 parts: the cpu-sensitive top
+ * half, and the cpu-insensitive bottom half, then makes the bottom half
+ * (called BS) execute concurrently on SMP.
+ * The two parts are not equal in size and load. The top part has constant
+ * code size (mainly in net/core/dev.c and NIC drivers), while the bottom
+ * part involves netfilter (iptables), whose load varies greatly. An iptables
+ * setup with 1000 rules to match makes the bottom part's load very high.
+ * So, if the bottom-part softirq can be randomly distributed to processors
+ * and run concurrently on them, the network gains much more packet handling
+ * capacity, and network throughput increases remarkably.
+ *
+ * Where useful:
+ * It's useful on SMP machines that meet the following 2 conditions:
+ * 1) high kernel network load (e.g. running iptables with thousands of rules);
+ * 2) more CPUs than active NICs (e.g. a 4-CPU machine with 2 NICs).
+ * On such systems, as softirq load grows, some CPUs sit idle while others
+ * (as many as there are NICs) stay busy. irqbalance helps, but it only
+ * shifts IRQs among CPUs and creates no softirq concurrency; balancing each
+ * CPU's load will not remarkably increase network speed.
+ *
+ * Where NOT useful:
+ * If the bottom half of the softirq is too small (no iptables running), or
+ * the network is mostly idle, the BS patch will show no visible effect, but
+ * it has no negative effect either.
+ * Users can turn BS on/off via the /proc/sys/net/bs_enable switch.
+ *
+ * How to test:
+ * On a Linux box, run iptables and add 2000 rules to the filter & nat tables
+ * to simulate a huge softirq load. Then open 20 ftp sessions downloading big
+ * files. On another machine (which uses this test machine as its gateway),
+ * open 20 more ftp download sessions. Compare the speed with BS disabled and
+ * with BS enabled.
+ * cat /proc/sys/net/bs_enable  -- switch to turn BS on/off
+ * cat /proc/sys/net/bs_status  -- shows the usage of each CPU
+ * Tests showed that when bottom softirq load is high, network throughput can
+ * be nearly doubled on a 2-CPU machine; hopefully it may be quadrupled on a
+ * 4-CPU box.
+ *
+ * Bugs:
+ * It does NOT allow CPU hotplug.
+ * It only supports consecutive CPU ids starting from 0 up to
+ * num_online_cpus(); e.g. 0,1,2,3 is OK, 0,1,8,9 is not.
+ *
+ * Some considerations for the future:
+ * 1) With the BS patch, the irq balance code in arch/i386/kernel/io_apic.c
+ *    seems unnecessary, at least for network irqs.
+ * 2) Softirq load becomes very small: only the top half of the old softirq
+ *    runs there, which is much cheaper than the bottom half (the netfilter
+ *    code). To let the top softirq process more packets, can't these 3
+ *    network parameters be enlarged?
+ *    extern int netdev_max_backlog = 1000;
+ *    extern int netdev_budget = 300;
+ *    extern int weight_p = 64;
+ * 3) BS currently runs on the built-in keventd threads; should dedicated
+ *    workqueues be created for it to run on instead?
+ *
+ * Signed-off-by: John Ye (Seeker) <johny@webizmail.com>
+ */
+
+#define BS_USE_PERCPU_DATA
+
+struct cpu_stat {
+	unsigned long irqs;	/* total irqs */
+	unsigned long dids;	/* handled by this CPU itself */
+	unsigned long others;
+	unsigned long works;
+};
+#define BS_CPU_STAT_DEFINED
+
+static int nr_cpus = 0;
+
+#ifdef BS_USE_PERCPU_DATA
+static DEFINE_PER_CPU(struct sk_buff_head, bs_cpu_queues); /* cacheline_aligned_in_smp */
+static DEFINE_PER_CPU(struct work_struct, bs_works);
+struct cpu_stat bs_cpu_status[NR_CPUS];
+#else
+#define NR_CPUS  8
+static struct sk_buff_head bs_cpu_queues[NR_CPUS];
+static struct work_struct  bs_works[NR_CPUS];
+static struct cpu_stat     bs_cpu_status[NR_CPUS];
+#endif
+
+int bs_enable = 1;
+
+static int ip_rcv1(struct sk_buff *skb, struct net_device *dev)
+{
+	return NF_HOOK_COND(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
+			    ip_rcv_finish, nf_hook_input_cond(skb));
+}
+
+static void bs_func(void *data)
+{
+	unsigned long flags;
+	int num, cpu;
+	struct sk_buff *skb = NULL, *last;
+	struct work_struct *bs_works;
+	struct sk_buff_head *q;
+
+	cpu = smp_processor_id();
+
+#ifdef BS_USE_PERCPU_DATA
+	bs_works = &per_cpu(bs_works, cpu);
+	q = &per_cpu(bs_cpu_queues, cpu);
+#else
+	bs_works = &bs_works[cpu];
+	q = &bs_cpu_queues[cpu];
+#endif
+
+	local_bh_disable();
+restart:
+	num = 0;
+	while (1) {
+		last = skb;
+		spin_lock_irqsave(&q->lock, flags);
+		skb = __skb_dequeue(q);
+		spin_unlock_irqrestore(&q->lock, flags);
+		if (!skb)
+			break;
+		num++;
+		/* local_bh_disable(); */
+		ip_rcv1(skb, skb->dev);
+		/* __local_bh_enable(); sub_preempt_count(SOFTIRQ_OFFSET - 1); */
+	}
+
+	bs_cpu_status[cpu].others += num;
+	if (num > 0)
+		goto restart;
+
+	__local_bh_enable(); /* sub_preempt_count(SOFTIRQ_OFFSET - 1); */
+	bs_works->func = 0;
+
+	return;
+}
+
+/* COPY_IN_START_FROM kernel/workqueue.c */
+struct cpu_workqueue_struct {
+
+	spinlock_t lock;
+
+	long remove_sequence;	/* Least-recently added (next to run) */
+	long insert_sequence;	/* Next to add */
+
+	struct list_head worklist;
+	wait_queue_head_t more_work;
+	wait_queue_head_t work_done;
+
+	struct workqueue_struct *wq;
+	task_t *thread;
+
+	int run_depth;		/* Detect run_workqueue() recursion depth */
+} ____cacheline_aligned;
+
+struct workqueue_struct {
+	struct cpu_workqueue_struct cpu_wq[NR_CPUS];
+	const char *name;
+	struct list_head list;	/* Empty if single thread */
+};
+/* COPY_IN_END_FROM kernel/workqueue.c */
+
+extern struct workqueue_struct *keventd_wq;
+
+/* Preempt must be disabled. */
+static void __queue_work(struct cpu_workqueue_struct *cwq,
+			 struct work_struct *work)
+{
+	unsigned long flags;
+
+	spin_lock_irqsave(&cwq->lock, flags);
+	work->wq_data = cwq;
+	list_add_tail(&work->entry, &cwq->worklist);
+	cwq->insert_sequence++;
+	wake_up(&cwq->more_work);
+	spin_unlock_irqrestore(&cwq->lock, flags);
+}
+#endif /* CONFIG_BOTTOM_SOFTIRQ_SMP */
+
+
 /*
  * Main IP Receive routine.
  */
@@ -424,8 +616,73 @@
 		}
 	}
 
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP
+	if (!nr_cpus)
+		nr_cpus = num_online_cpus();
+
+	if (bs_enable && nr_cpus > 1 && iph->protocol != IPPROTO_ICMP) {
+	/* if (bs_enable && iph->protocol == IPPROTO_ICMP) { // test on icmp first */
+		unsigned long flags;
+		unsigned int cur, cpu;
+		struct work_struct *bs_works;
+		struct sk_buff_head *q;
+
+		cur = smp_processor_id();
+
+		bs_cpu_status[cur].irqs++;
+
+		/* random distribution */
+		cpu = (bs_cpu_status[cur].irqs % nr_cpus);
+		if (cpu == cur) {
+			bs_cpu_status[cpu].dids++;
+			return ip_rcv1(skb, dev);
+		}
+
+#ifdef BS_USE_PERCPU_DATA
+		q = &per_cpu(bs_cpu_queues, cpu);
+#else
+		q = &bs_cpu_queues[cpu];
+#endif
+
+		if (!q->next) { /* || skb_queue_len(q) == 0 */
+			skb_queue_head_init(q);
+		}
+
+#ifdef BS_USE_PERCPU_DATA
+		bs_works = &per_cpu(bs_works, cpu);
+#else
+		bs_works = &bs_works[cpu];
+#endif
+		/*
+		local_irq_save(flags);
+		SKB_CB(skb)->dev = dev;
+		SKB_CB(skb)->ptype = pt;
+		*/
+		spin_lock_irqsave(&q->lock, flags);
+		__skb_queue_tail(q, skb);
+		spin_unlock_irqrestore(&q->lock, flags);
+		/* if (net_ratelimit()) printk("qlen %d\n", q->qlen); */
+
+		/* local_irq_restore(flags); */
+		if (!bs_works->func) {
+			INIT_WORK(bs_works, bs_func, q);
+			bs_cpu_status[cpu].works++;
+			preempt_disable();
+			__queue_work(keventd_wq->cpu_wq + cpu, bs_works);
+			preempt_enable();
+		}
+	} else {
+		int cpu = smp_processor_id();
+
+		bs_cpu_status[cpu].irqs++;
+		bs_cpu_status[cpu].dids++;
+		return ip_rcv1(skb, dev);
+	}
+	return 0;
+#else
 	return NF_HOOK_COND(PF_INET, NF_IP_PRE_ROUTING, skb, dev, NULL,
-			    ip_rcv_finish, nf_hook_input_cond(skb));
+			    ip_rcv_finish, nf_hook_input_cond(skb));
+#endif /* CONFIG_BOTTOM_SOFTIRQ_SMP */
+
 
 inhdr_error:
 	IP_INC_STATS_BH(IPSTATS_MIB_INHDRERRORS);
--- old/net/sysctl_net.c	2007-09-20 23:30:29.000000000 +0800
+++ new/net/sysctl_net.c	2007-09-20 23:28:06.000000000 +0800
@@ -30,6 +30,22 @@
 extern struct ctl_table tr_table[];
 #endif
 
+
+#define CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+#if !defined(BS_CPU_STAT_DEFINED)
+struct cpu_stat {
+	unsigned long irqs;	/* total irqs */
+	unsigned long dids;	/* handled by this CPU itself */
+	unsigned long others;
+	unsigned long works;
+};
+#endif
+extern struct cpu_stat bs_cpu_status[NR_CPUS];
+
+extern int bs_enable;
+#endif
+
 struct ctl_table net_table[] = {
 	{
 		.ctl_name	= NET_CORE,
@@ -61,5 +77,26 @@
 		.child		= tr_table,
 	},
 #endif
+
+#ifdef CONFIG_BOTTOM_SOFTIRQ_SMP_SYSCTL
+	{
+		.ctl_name	= 99,
+		.procname	= "bs_status",
+		.data		= &bs_cpu_status,
+		.maxlen		= sizeof(bs_cpu_status),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+	{
+		.ctl_name	= 99,
+		.procname	= "bs_enable",
+		.data		= &bs_enable,
+		.maxlen		= sizeof(int),
+		.mode		= 0644,
+		.proc_handler	= &proc_dointvec,
+	},
+#endif
+
 	{ 0 },
 };
--- old/kernel/workqueue.c	2007-09-21 04:48:13.000000000 +0800
+++ new/kernel/workqueue.c	2007-09-21 04:47:49.000000000 +0800
@@ -384,7 +384,11 @@
 	kfree(wq);
 }
 
+/*
 static struct workqueue_struct *keventd_wq;
+*/
+struct workqueue_struct *keventd_wq;
+EXPORT_SYMBOL(keventd_wq);
 
 int fastcall schedule_work(struct work_struct *work)
 {

[ Last edited by 思一克 on 2007-9-20 22:17 ]