Chinaunix

Title: Why doesn't NAPI receive packets in the hard interrupt?

Author: mordorwww Time: 2013-04-13 22:09
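In interrupt mode, packets are received in the hard interrupt and buffered in the backlog queue.
In NAPI mode, however, packets are received in the softirq and are no longer buffered in the per-CPU queue. Each packet is processed as soon as it is received.
Why doesn't NAPI also poll for packets in the hard interrupt, buffer everything it receives in a queue, and process the packets once receiving is done?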

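Author: lenky0401 Time: 2013-04-14 10:33
As I recall, the whole point of NAPI is to use polling to reduce the number of interrupts and improve efficiency.
A hard interrupt handler should be simple, short, and fast. Any work that can go in the bottom half (polling for packets, for example) should go there.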

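Author: mordorwww Time: 2013-04-14 12:37

lenky0401 posted on 2013-04-14 10:33
As I recall, the whole point of NAPI is to use polling to reduce the number of interrupts and improve efficiency.
A hard interrupt handler should be simple, short, and fast. Any work (polling for packets, for example) that can ...

Receiving a few more packets in the hard interrupt surely wouldn't burn many CPU cycles, would it?

Most NICs use a DMA ring buffer queue. If the buffer already holds several packets, does the hard interrupt really receive just one and ignore the rest?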

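Author: junnyg Time: 2013-04-14 12:59

mordorwww posted on 2013-04-14 12:37
Receiving a few more packets in the hard interrupt surely wouldn't burn many CPU cycles, would it?

Most NICs use a DMA ring buffer queue. If the buffer already ...

The key point is that in NAPI mode, even when the queue has no more packets, polling has to wait until the quota is used up (either the quota's time limit is reached or the quota's worth of packets has been received). That takes a fairly long time, so it does not belong in a hard interrupt. And if you shrink the quota, polling becomes pointless and no longer achieves the goal of reducing NIC interrupts.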

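Author: mordorwww Time: 2013-04-14 14:57

junnyg posted on 2013-04-14 12:59
The key point is that in NAPI mode, even when the queue has no more packets, polling has to wait until the quota is used up (either the quota's time limit is reached or the quota's worth of packets has been received). That takes a fairly ...

Surely not? It hogs the CPU even when there are no packets?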

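Author: junnyg Time: 2013-04-14 15:26
Reply to 5# mordorwww
NAPI polling exits under any of the following three conditions:
1. The processing time limit is exceeded
2. The budget is used up
3. The NIC received fewer packets than expected. Each poll is asked to receive a given number of packets; if the poll returns fewer than that, the NIC interrupt is re-enabled and NAPI polling stops

What I said earlier was wrong: polling does stop when there are no packets. But if packets keep arriving, it keeps polling until the budget is used up or the time limit is hit.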
static void net_rx_action(struct softirq_action *h)
{
	struct list_head *list = &__get_cpu_var(softnet_data).poll_list;
	unsigned long time_limit = jiffies + 2;
	int budget = netdev_budget;
	void *have;

	local_irq_disable();

	/* Loops here until the list is empty; the poll function registered
	 * by the NIC driver calls napi_complete() to remove its entry from
	 * poll_list.
	 */
	while (!list_empty(list)) {
		struct napi_struct *n;
		int work, weight;

		/* If softirq window is exhausted then punt.
		 * Allow this to run for 2 jiffies since which will allow
		 * an average latency of 1.5/HZ.
		 */
		/* Check for a timeout or an exhausted budget; if either, polling ends */
		if (unlikely(budget <= 0 || time_after(jiffies, time_limit)))
			goto softnet_break;

		local_irq_enable();

		/* Even though interrupts have been re-enabled, this
		 * access is safe because interrupts can only add new
		 * entries to the tail of this list, and only ->poll()
		 * calls can remove this head entry from the list.
		 */
		n = list_first_entry(list, struct napi_struct, poll_list);

		have = netpoll_poll_lock(n);

		weight = n->weight;

		/* This NAPI_STATE_SCHED test is for avoiding a race
		 * with netpoll's poll_napi().  Only the entity which
		 * obtains the lock and sees NAPI_STATE_SCHED set will
		 * actually make the ->poll() call.  Therefore we avoid
		 * accidentally calling ->poll() when NAPI is not scheduled.
		 */
		work = 0;
		if (test_bit(NAPI_STATE_SCHED, &n->state)) {
			work = n->poll(n, weight); /* call the driver's registered poll function to receive packets */
			trace_napi_poll(n);
		}

		WARN_ON_ONCE(work > weight);

		budget -= work; /* subtract the packets received this round from the budget */

		local_irq_disable();

		/* Drivers must not modify the NAPI state if they
		 * consume the entire weight.  In such cases this code
		 * still "owns" the NAPI instance and therefore can
		 * move the instance around on the list at-will.
		 */
		if (unlikely(work == weight)) {
			if (unlikely(napi_disable_pending(n))) {
				local_irq_enable();
				napi_complete(n);
				local_irq_disable();
			} else
				list_move_tail(&n->poll_list, list);
		}

		netpoll_poll_unlock(have);
	}
out:
	local_irq_enable();

#ifdef CONFIG_NET_DMA
	/*
	 * There may not be any more sk_buffs coming right now, so push
	 * any pending DMA copies to hardware
	 */
	dma_issue_pending_all();
#endif

	return;

softnet_break:
	__get_cpu_var(netdev_rx_stat).time_squeeze++;
	__raise_softirq_irqoff(NET_RX_SOFTIRQ);
	goto out;
}

Take Intel's 82599 NIC as an example:
static int ixgbe_poll(struct napi_struct *napi, int budget)
{
	struct ixgbe_q_vector *q_vector =
				container_of(napi, struct ixgbe_q_vector, napi);
	struct ixgbe_adapter *adapter = q_vector->adapter;
	int tx_clean_complete, work_done = 0;

#ifdef CONFIG_IXGBE_DCA
	if (adapter->flags & IXGBE_FLAG_DCA_ENABLED) {
		ixgbe_update_tx_dca(adapter, adapter->tx_ring[0]);
		ixgbe_update_rx_dca(adapter, adapter->rx_ring[0]);
	}
#endif

	tx_clean_complete = ixgbe_clean_tx_irq(q_vector, adapter->tx_ring[0]);
	ixgbe_clean_rx_irq(q_vector, adapter->rx_ring[0], &work_done, budget);

	if (!tx_clean_complete)
		work_done = budget;

	/* If budget not fully consumed, exit the polling mode */
	/* If the NIC received fewer packets than expected, re-enable the
	 * NIC interrupt and stop NAPI polling.
	 */
	if (work_done < budget) {
		napi_complete(napi);
		if (adapter->rx_itr_setting & 1)
			ixgbe_set_itr(adapter);
		if (!test_bit(__IXGBE_DOWN, &adapter->state))
			ixgbe_irq_enable_queues(adapter, IXGBE_EIMS_RTX_QUEUE);
	}
	return work_done;
}
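
The interplay between the three exit conditions and the two functions above can be tried out in userspace. What follows is only a rough sketch, not kernel code: `sim_napi`, `sim_poll` and `sim_net_rx_action` are hypothetical names, a single device stands in for the whole poll list, and the 2-jiffy time limit is approximated by a cap on the number of poll rounds.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of one NAPI instance (not a kernel structure). */
struct sim_napi {
	int pending;       /* packets waiting in the RX ring */
	int weight;        /* per-poll quota, like n->weight */
	bool scheduled;    /* still on the poll list? */
	bool irq_enabled;  /* device RX interrupt unmasked? */
};

/* Models a driver ->poll(): take at most `budget` packets. Taking fewer
 * than `budget` means the ring is empty, so leave polling mode and
 * unmask the interrupt, as ixgbe_poll() does via napi_complete(). */
static int sim_poll(struct sim_napi *n, int budget)
{
	int work = n->pending < budget ? n->pending : budget;

	n->pending -= work;
	if (work < budget) {
		n->scheduled = false;
		n->irq_enabled = true;
	}
	return work;
}

/* Models the net_rx_action() loop for a single device. `max_rounds`
 * stands in for the time limit (an assumption of this sketch).
 * Returns the total number of packets processed. */
static int sim_net_rx_action(struct sim_napi *n, int budget, int max_rounds)
{
	int total = 0;
	int rounds = 0;

	while (n->scheduled) {
		/* Budget used up or "time" expired: stop, like softnet_break. */
		if (budget <= 0 || rounds >= max_rounds)
			break;

		int work = sim_poll(n, n->weight);

		assert(work <= n->weight); /* mirrors WARN_ON_ONCE(work > weight) */
		budget -= work;
		total += work;
		rounds++;
	}
	return total;
}
```

With a weight of 64 and a budget of 300 (values chosen for illustration), a ring holding 10 packets is drained in a single poll and polling stops right away with the interrupt unmasked. A ring holding 1000 packets is polled five times, 320 packets in total, before the budget check breaks out of the loop with the device still scheduled.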

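Author: kudakitsune Time: 2013-04-18 03:28
Reply to 1# mordorwww

First reason: receiving packets may involve refilling the ring, which may require allocating memory.
Second reason: if actual processing capacity is insufficient, the received packets end up being dropped anyway. That wastes time.
Third reason: with your approach, the final processing is unchanged, so you have only added extra work. That wastes time.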

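Author: mordorwww Time: 2013-04-18 09:16

kudakitsune posted on 2013-04-18 03:28
Reply to 1# mordorwww

First reason: receiving packets may involve refilling the ring, which may require allocating memory.

First reason: receiving packets may involve refilling the ring, which may require allocating memory.
                     What is wrong with that?
                           Without NAPI, packets are in fact still received in the hard interrupt, and the hard interrupt still has to do all of this.
Second reason: if actual processing capacity is insufficient, the received packets end up being dropped anyway. That wastes time.
                     Only the receive step is wasted, and receiving is itself very lightweight.
                     Even after a packet reaches the process, the process may lack the capacity to handle it, so the packet is still dropped and time is still wasted.

Third reason: with your approach, the final processing is unchanged, so you have only added extra work. That wastes time.
                     I don't see what the problem is.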

Author: smalloc Time: 2013-04-18 16:14
Reply to 4# junnyg

timeout

Author: mordorwww Time: 2013-12-08 00:20
Still no answer, so bumping this.

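Author: smalloc Time: 2013-12-08 21:22
Reply to 8# mordorwww

This isn't specific to NAPI. In fact, quite a few NICs have a receive ring but don't use NAPI.
But very few NIC drivers set up new DMA buffers in the hard interrupt. That is all handled in the softirq.
I'd guess the reason is the one given in post #2. If a device has the CPU to itself it isn't much of a problem, but with more devices there are more hard interrupts to service, and if they are not handled promptly, interrupt state on a device really can be lost.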

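Author: mordorwww Time: 2013-12-09 08:53

smalloc posted on 2013-12-08 21:22
Reply to 8# mordorwww

Refilling the ring buffer is what actually counts as responding to the device, isn't it?
If the ring buffer fills up, the device simply drops packets. That is the part that needs to be fast.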
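
The point about a full ring can be illustrated with a toy userspace model. This is a minimal sketch under stated assumptions, not driver code: `sim_ring`, `sim_device_rx` and `sim_driver_clean` are hypothetical names, the ring has only four slots, and "refilling" is reduced to freeing a slot when the driver consumes a packet.

```c
#include <assert.h>

#define RING_SIZE 4 /* hypothetical, deliberately tiny RX ring */

/* Hypothetical model of an RX descriptor ring shared by device and driver. */
struct sim_ring {
	int head;    /* next slot the device writes */
	int tail;    /* next slot the driver reads */
	int count;   /* slots currently holding a packet */
	int dropped; /* packets the device had to discard */
};

/* The device receives one packet: it needs a free slot, otherwise the
 * packet is dropped on the spot, regardless of what the CPU is doing. */
static void sim_device_rx(struct sim_ring *r)
{
	if (r->count == RING_SIZE) {
		r->dropped++;
		return;
	}
	r->head = (r->head + 1) % RING_SIZE;
	r->count++;
}

/* The driver consumes up to `budget` packets, freeing their slots for
 * the device to reuse. Returns the number of packets taken. */
static int sim_driver_clean(struct sim_ring *r, int budget)
{
	int done = 0;

	while (done < budget && r->count > 0) {
		r->tail = (r->tail + 1) % RING_SIZE;
		r->count--;
		done++;
	}
	return done;
}
```

In this model, six packets arriving with no intervening clean leave four in the ring and two dropped. Whether the clean runs in the hard interrupt or in the softirq, what matters for drops is only that slots are freed before the ring fills.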

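Author: kkddkkdd11 Time: 2013-12-10 10:29
Here's an analogy.
When we go to an office to get something done,
we first take a number and then wait for a counter to serve us.
The multiple counters stand for multiple CPUs.
Why not just get the business done right when taking the number?
The reasoning is the same as why NAPI doesn't do too much inside the hard interrupt :)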

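Author: mordorwww Time: 2013-12-10 12:13

kkddkkdd11 posted on 2013-12-10 10:29
Here's an analogy.
When we go to an office to get something done,
we first take a number and then wait for a counter to serve us.

Receiving packets is tightly coupled to the hardware and is also fairly fast, so receiving corresponds to taking a number.
Packet processing corresponds to being served at the counter.

If the hard interrupt doesn't receive packets, then it seems to do nothing at all.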

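Author: kkddkkdd11 Time: 2013-12-10 15:28

mordorwww posted on 2013-12-10 12:13
Receiving packets is tightly coupled to the hardware and is also fairly fast, so receiving corresponds to taking a number.
Packet processing corresponds to being served at the counter.

I'm not very clear on the design principles of highly parallel FPGAs either,
but my "feeling" is that, for a NIC, by the time the hard interrupt fires the NIC has already received the packet.
The NIC raises the hard interrupt only to tell the CPU that the data is ready to be read.
If the CPU doesn't pick up the data in time, it gets dropped.

Back to the same analogy:
when you go to the office, your paperwork has to be ready in your pocket
before you can take a number from the ticket machine (the hard interrupt),
and the softirq is the service counter.
The actual packet processing has by then entered the protocol stack and left the softirq.

This is probably more a matter of the Linux kernel's own layering.
Doing all the processing in the hard interrupt would certainly work, and might even be faster.
By analogy, an operating system could perfectly well run all its ring 3 programs at kernel level.
It just wouldn't be friendly to extensibility.
That's all.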

Welcome to Chinaunix (http://bbs.chinaunix.net/)