Kernel Bug-Vulnerability-Comment library
(ChinaUnix forum thread, started by sisi8408)

#111 | 2008-05-11 00:29

/*
 * 2.6.24.4-rt4
 */
+static int ksoftirqd(void * __data)
+{
+        struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2 };
+        struct softirqdata *data = __data;
+        u32 softirq_mask = (1 << data->nr);
+        struct softirq_action *h;
+        int cpu = data->cpu;
+
+#ifdef CONFIG_PREEMPT_SOFTIRQS
+        init_waitqueue_head(&data->wait);
+#endif
+
+        sys_sched_setscheduler(current->pid, SCHED_FIFO, &param);
+        current->flags |= PF_SOFTIRQ;
         set_current_state(TASK_INTERRUPTIBLE);

/*
 * 2.6.24.4
 */
static void task_tick_rt(struct rq *rq, struct task_struct *p)
{
        update_curr_rt(rq);

        /*
         * RR tasks need a special form of timeslice management.
         * FIFO tasks have no timeslices.
         */
        if (p->policy != SCHED_RR)
                return;

        if (--p->time_slice)
                return;

        p->time_slice = DEF_TIMESLICE;

        /*
         * Requeue to the end of queue if we are not the only element
         * on the queue:
         */
        if (p->run_list.prev != p->run_list.next) {
                requeue_task_rt(rq, p);
                set_tsk_need_resched(p);
        }
}

If ksoftirqd runs as an RT task under SCHED_FIFO, then, given that FIFO tasks have no
timeslices (see task_tick_rt() above), it is possible in many cases, even in the -rt tree,
for NET_TX_SOFTIRQ, NET_RX_SOFTIRQ and the other per-softirq threads to block and starve
one another. That may result in abnormal behaviour, say slab cache corruption.
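
One way to reduce the mutual starvation, offered purely as a hypothetical sketch rather
than a tested patch, would be to give the per-softirq threads SCHED_RR instead of
SCHED_FIFO, so that the round-robin path in task_tick_rt() above rotates equal-priority
softirq threads:

/* Hypothetical variant of the -rt ksoftirqd() setup shown above:
 * SCHED_RR keeps the same RT priority but gets a timeslice, so task_tick_rt()
 * requeues the thread and sets need_resched when the slice expires,
 * letting same-priority softirq threads (net-tx, net-rx, ...) take turns.
 */
struct sched_param param = { .sched_priority = MAX_USER_RT_PRIO/2 };

sys_sched_setscheduler(current->pid, SCHED_RR, &param);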


#112 | 2008-05-17 10:05

/* 2.6.24.4
 * static void rebalance_domains(int cpu, enum cpu_idle_type idle)
 */
-                if (interval > HZ * NR_CPUS /10)
-                        interval = HZ * NR_CPUS /10;
+                if (interval > (HZ / 10) * num_online_cpus())
+                        interval = (HZ / 10) * num_online_cpus();
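The point of the change: NR_CPUS is the compile-time maximum, while num_online_cpus()
reflects the CPUs actually present, so the old clamp can be far too large on a distro
kernel built with a big NR_CPUS. A rough worked example, with hypothetical numbers:

/* Hypothetical numbers: HZ = 1000, kernel built with NR_CPUS = 255,
 * but only 4 CPUs actually online at runtime.
 */
old_cap = 1000 * 255 / 10;      /* = 25500 jiffies, ~25.5 s between rebalances */
new_cap = (1000 / 10) * 4;      /* =   400 jiffies,  ~0.4 s                    */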

#113 | 2008-05-18 00:34

/* 2.6.24.4
 * static int __assign_irq_vector(int irq, cpumask_t mask)
 */
                if (unlikely(current_vector == vector))
-                        continue;
+                        goto next;

#114 | 2008-06-08 00:06

/*
 * from a post by 独孤九贱
 */
Unable to handle kernel paging request at virtual address 713401b6

printing eip: *pde = 00000000
Oops: 0000 [#1]
SMP
Modules linked in: uflux e1000 trusthost
CPU:    0
EIP:    0060:[<c03c1de6>]    Not tainted VLI
EFLAGS: 00010206   (2.6.12)

EIP is at ip_route_input+0x86/0x1d0

eax: df976fc0   ebx: c0629000   ecx: 7134010a   edx: 00000000
esi: 0121010a   edi: 2495313a   ebp: 00000003   esp: c0629f04
ds: 007b   es: 007b   ss: 0068

Process swapper (pid: 0, threadinfo=c0629000 task=c0508c20)

Stack: 0121010a 2495315a 00000000 00000000 df987800 00000000 df987800 dd0b6400
       00000000 006d8580 00000000 dd0b6400 dd14c620 c03c4fc0 c03c4eff dd0b6400
       0121010a 2495313a 00000000 df987800 c03c4fc0 80000000 dd18d440 dc7ef240
Call Trace:
[<c03c4fc0>] ip_rcv_finish+0x0/0x310
[<c03c4eff>] ip_rcv+0x4ef/0x5b0
[<c03c4fc0>] ip_rcv_finish+0x0/0x310
[<c039a771>] netif_receive_skb+0x1e1/0x280
[<c039a8a3>] process_backlog+0x93/0x130
[<c039a9ef>] net_rx_action+0xaf/0x1a0
[<c0129942>] __do_softirq+0x72/0xe0
[<c0106bcb>] do_softirq+0x5b/0x60

/* ------------------------------------------------ */

    39c0:       a1 00 00 00 00          mov    0x0,%eax
    39c5:       8b 53 10                mov    0x10(%ebx),%edx
    39c8:       f7 d0                   not    %eax
    39ca:       8b 04 90                mov    (%eax,%edx,4),%eax
    39cd:       ff 40 38                incl   0x38(%eax)
    39d0:       8b 09                   mov    (%ecx),%ecx
    39d2:       85 c9                   test   %ecx,%ecx
    39d4:       74 7a                   je     3a50 <ip_route_input+0x100>
    39d6:       39 b1 ac 00 00 00       cmp    %esi,0xac(%ecx)


1,      EIP is at 39d6, as shown by the original poster,
        so it is clear that the rt entry is still in the hash table,
        otherwise the reader of the rt hash table would have no chance to see it.
        And there are at most 3 windows, from 39d0 to 39d6, in which a hard irq
        can interrupt the reader.

2,      From the call trace, the reader is running in softirq context,
        and NAPI is not active in the kernel used.

3,      Since it is non-NAPI, unfortunately, the NIC ISR is longer and only
        completes when netif_rx() returns.

4,      It is assumed that the hardware works fine, anyway.
        An updater on another cpu (if there were no other cpu, the oops would have
        no chance to play its game, precisely because it occurs in softirq context)
        deletes the rt entry loaded into %ecx at 39d0,
        just after the reader happens to be interrupted by a hard irq,
        say because a new skb is coming in.

5,      With high probability, the updater runs from a timer hard irq on another cpu:

/* 2.6.24.4
 *
 * Called from the timer interrupt handler,
 * to charge one tick to the current process.
 *
 * user_tick is 1 if the tick is user time, 0 for system.
 */
void update_process_times(int user_tick)
{
        struct task_struct *p = current;
        int cpu = smp_processor_id();

        account_process_tick(p, user_tick);
        run_local_timers();

        if (rcu_pending(cpu))
                rcu_check_callbacks(cpu, user_tick);

        scheduler_tick();
        run_posix_cpu_timers(p);
}

        The rt gc timer, executed via run_local_timers(),
        happens to delete the rt entry loaded into %ecx at 39d0,
        which the reader has already seen and which is not NULL,
        and hands it to the RCU core by calling call_rcu_bh().

        In rcu_pending(), there are 4 conditions that can fire
        rcu_check_callbacks(), in which
        tasklet_schedule(&per_cpu(rcu_tasklet, cpu)) is called in any case
        (see the sketch of rcu_check_callbacks() after this list).

6,      Though the rcu_tasklet is registered,

/*
 * This does the RCU processing work from tasklet context.
 */
static void __rcu_process_callbacks(struct rcu_ctrlblk *rcp,
                                        struct rcu_data *rdp)
{
        if (rdp->curlist &&
            !rcu_batch_before(rcp->completed, rdp->batch)) {
                *rdp->donetail = rdp->curlist;
                rdp->donetail = rdp->curtail;
                /*
                 * rcp->completed >= rdp->batch
                 */
                rdp->curlist = NULL;
                rdp->curtail = &rdp->curlist;
        }

        if (rdp->nxtlist && !rdp->curlist) {
                local_irq_disable();
                rdp->curlist = rdp->nxtlist;
                rdp->curtail = rdp->nxttail;

                rdp->nxtlist = NULL;
                rdp->nxttail = &rdp->nxtlist;
                local_irq_enable();

                /*
                 * start the next batch of callbacks
                 */

                /* determine batch number */
                rdp->batch = rcp->cur + 1;

                /* see the comment and corresponding wmb() in
                 * the rcu_start_batch()
                 */
                smp_rmb();

                if (!rcp->next_pending) {
                        /* and start it/schedule start if it's a new batch */
                        spin_lock(&rcp->lock);
                        rcp->next_pending = 1;
                        rcu_start_batch(rcp);
                        spin_unlock(&rcp->lock);
                }
        }

        rcu_check_quiescent_state(rcp, rdp);

        if (rdp->donelist)
                rcu_do_batch(rdp);
}

        call_rcu_bh() only adds the victim to rdp->nxtlist,
        which will not be freed in the first run of the rcu_tasklet after the
        timer hard irq, but may be freed in the following runs of the rcu_tasklet
        after the same timer hard irq, because softirqs are allowed more than once,
        if the following conditions are met:

        I,      if (rdp->nxtlist && !rdp->curlist) is true
                on the first run of the rcu_tasklet, so that
                rdp->curlist = rdp->nxtlist;

        II,     if (rdp->nxtlist && !rdp->curlist) is true,
                also on the first run of the rcu_tasklet,

        III,    rcu_bh_qsctr_inc(cpu) has already been called in rcu_check_callbacks(),

        IV,     cpu_quiet() is called by rcu_check_quiescent_state()
                on the first run of the rcu_tasklet,

        V,      if (rdp->curlist &&
                        !rcu_batch_before(rcp->completed, rdp->batch))
                is true on the second run of the rcu_tasklet,

        VI,     and the reader is blocked for long enough.
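For reference, the classic-RCU rcu_check_callbacks() that point 5 above leans on looked
roughly like this in the 2.6.24 era (quoted from memory, so treat it as a sketch rather
than the authoritative source): the bh quiescent counter is bumped whenever the tick does
not land in softirq context, and the rcu_tasklet is scheduled unconditionally.

void rcu_check_callbacks(int cpu, int user)
{
        if (user ||
            (idle_cpu(cpu) && !in_softirq() &&
                                hardirq_count() <= (1 << HARDIRQ_SHIFT))) {
                /* tick landed in user mode or on the idle task:
                 * a quiescent state for both flavours */
                rcu_qsctr_inc(cpu);
                rcu_bh_qsctr_inc(cpu);
        } else if (!in_softirq())
                /* not inside a softirq: a bh quiescent state */
                rcu_bh_qsctr_inc(cpu);
        tasklet_schedule(&per_cpu(rcu_tasklet, cpu));
}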


#115 | 2008-07-18 21:48

...
        schedstat_inc(rq, ttwu_count);

        if (cpu == this_cpu) {
                schedstat_inc(rq, ttwu_local);
                goto out_set_cpu;
        }
/* try_to_wake_up @ 2.6.24.4
        if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
                goto out_set_cpu;
*/
        for_each_domain(this_cpu, sd) {
                if (cpu_isset(cpu, sd->span)) {
                        schedstat_inc(sd, ttwu_wake_remote);
                        this_sd = sd;
                        break;
                }
        }

        if (unlikely(!cpu_isset(this_cpu, p->cpus_allowed)))
                goto out_set_cpu;

        /*
         * Check for affine wakeup and passive balancing possibilities.
         */
...

#116 | 2008-07-18 22:38

...
        /* build_sched_domains @ 2.6.24.4
         * Calculate CPU power for physical packages and nodes
         */
#ifdef CONFIG_SCHED_SMT
        for_each_cpu_mask(i, *cpu_map) {
                struct sched_domain *sd = &per_cpu(cpu_domains, i);

                init_sched_groups_power(i, sd);
        }

#elif defined(CONFIG_SCHED_MC)
        for_each_cpu_mask(i, *cpu_map) {
                struct sched_domain *sd = &per_cpu(core_domains, i);

                init_sched_groups_power(i, sd);
        }
#else

        for_each_cpu_mask(i, *cpu_map) {
                struct sched_domain *sd = &per_cpu(phys_domains, i);

                init_sched_groups_power(i, sd);
        }
#endif

        /* Attach the domains */
...

#117 | 2008-07-27 11:39
queue & sched in .26 blk layer


1, blk init core
   =============

int __init blk_dev_init(void)
{
        int i;

        /* 1,
         * operations on the queue are based on `work', i.e. asynchronous
         * and in kthread context; go see workqueues,
         * much like do_softirq and ksoftirqd/x
         */
        kblockd_workqueue = create_workqueue("kblockd");
        if (!kblockd_workqueue)
                panic("Failed to create kblockd\n");

        request_cachep = kmem_cache_create("blkdev_requests",
                        sizeof(struct request), 0, SLAB_PANIC, NULL);
        /* 2,
         * as at other spots, a kmem cache is used, like the one for skb in net;
         * if you want to do something fancy with skbs, say zero-copy,
         * you have to maintain the alloc/free_skb methods yourself,
         * which is hard and hot, isn't it?
         */
        blk_requestq_cachep = kmem_cache_create("blkdev_queue",
                        sizeof(struct request_queue), 0, SLAB_PANIC, NULL);

        for_each_possible_cpu(i)
                INIT_LIST_HEAD(&per_cpu(blk_cpu_done, i));
        /* 3,
         * per-cpu list prepared for BLOCK_SOFTIRQ,
         * powerful, and lockless of course
         */
        open_softirq(BLOCK_SOFTIRQ, blk_done_softirq, NULL);

        register_hotcpu_notifier(&blk_cpu_notifier);

        return 0;
}
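To make note 1 above concrete: kblockd is consumed through kblockd_schedule_work(),
which in this era is essentially queue_work() on the kblockd workqueue, and its typical
user is the unplug timer, whose handler defers the actual q->unplug_fn() call into
kblockd (kthread) context. Roughly, as a sketch of the 2.6.26-era path with tracing
details trimmed:

int kblockd_schedule_work(struct work_struct *work)
{
        return queue_work(kblockd_workqueue, work);
}

/* timer (hard irq) context: just punt the unplug to kblockd */
static void blk_unplug_timeout(unsigned long data)
{
        struct request_queue *q = (struct request_queue *)data;

        kblockd_schedule_work(&q->unplug_work);
}

/* kblockd (kthread) context: do the real unplug */
static void blk_unplug_work(struct work_struct *work)
{
        struct request_queue *q =
                container_of(work, struct request_queue, unplug_work);

        q->unplug_fn(q);
}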

#118 | 2008-07-27 14:23

2, Q init
   ======

The blk layer connects to filesystems like reiserfs/btrfs with one hand,
and with the other hand connects to the scsi driver/controller.

In general, the blk layer represents the software methods on top of a block device,
including the queue and the scheduler, or simply the elevator, which is responsible
for managing whatever schedulers are available.

In the eyes of the elevator, a blk device is a queue,
allocated by the driver,

struct request_queue * blk_alloc_queue(gfp_t gfp_mask)
{
        return blk_alloc_queue_node(gfp_mask, -1);
}

struct request_queue * blk_alloc_queue_node(gfp_t gfp_mask, int node_id)
{
        struct request_queue *q;
        int err;

        q = kmem_cache_alloc_node(blk_requestq_cachep,
                                gfp_mask | __GFP_ZERO, node_id);
        if (!q)
                return NULL;

        q->backing_dev_info.unplug_io_fn = blk_backing_dev_unplug;
        /* 1,
         * bdi, a concept from the disk cache;
         * go see and compare with ramdisk, a nice example of a pseudo device,
         * if you want to play games on an SSD.
         */
        q->backing_dev_info.unplug_io_data = q;

        err = bdi_init(&q->backing_dev_info);
        if (err) {
                kmem_cache_free(blk_requestq_cachep, q);
                return NULL;
        }

        /* 2,
         * like hrtimer, a simple machine for scheduling the controller
         */
        init_timer(&q->unplug_timer);

        kobject_init(&q->kobj, &blk_queue_ktype);
        /* 3,
         * regular sysfs socket; its parent looks cool
         */
        mutex_init(&q->sysfs_lock);

        spin_lock_init(&q->__queue_lock);
        return q;
}


and initialised in the blk layer,

struct request_queue * blk_init_queue(request_fn_proc *rfn, spinlock_t *lock)
{
        return blk_init_queue_node(rfn, lock, -1);
}

struct request_queue *
        blk_init_queue_node(request_fn_proc *rfn, spinlock_t *lock, int node_id)
{
        struct request_queue *q = blk_alloc_queue_node(GFP_KERNEL, node_id);

        if (!q)
                return NULL;
        q->node = node_id;

        if (blk_init_free_list(q)) {
                kmem_cache_free(blk_requestq_cachep, q);
                return NULL;
        }

        /*
         * if caller didn't supply a lock,
         * they get per-queue locking with our embedded lock
         */
        if (!lock)
                lock = &q->__queue_lock;
        q->queue_lock        = lock;

        q->request_fn                = rfn;
        q->prep_rq_fn                = NULL;
        q->unplug_fn                = generic_unplug_device;
        q->queue_flags                = (1 << QUEUE_FLAG_CLUSTER);

        blk_queue_segment_boundary(q, 0xffffffff);

        blk_queue_make_request(q, __make_request);
        blk_queue_max_segment_size(q, MAX_SEGMENT_SIZE);

        blk_queue_max_hw_segments(q, MAX_HW_SEGMENTS);
        blk_queue_max_phys_segments(q, MAX_PHYS_SEGMENTS);

        q->sg_reserved_size = INT_MAX;

        /*
         * all done
         *
         * inform the big brother, the elevator: ready for the game
         */
        if (!elevator_init(q, NULL)) {
                blk_queue_congestion_threshold(q);
                return q;
        }

        blk_put_queue(q);
        return NULL;
}
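A driver picks one of the two paths above depending on whether it wants the elevator at
all. A minimal, hypothetical request-based driver init might look like the sketch below;
my_request_fn and my_lock are invented names, and real data transfer and gendisk setup
are omitted:

static DEFINE_SPINLOCK(my_lock);

/* Called by the block layer (with q->queue_lock held) to drain the
 * dispatch queue; a real driver would program the hardware here.
 */
static void my_request_fn(struct request_queue *q)
{
        struct request *rq;

        while ((rq = elv_next_request(q)) != NULL) {
                /* a real driver would transfer rq's data here; end_request()
                 * completes the chunk the request currently points at */
                end_request(rq, 1);
        }
}

static int __init my_init(void)
{
        struct request_queue *q;

        q = blk_init_queue(my_request_fn, &my_lock);
        if (!q)
                return -ENOMEM;
        /* ... allocate a gendisk, attach q to it, add_disk(), etc. */
        return 0;
}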

#119 | 2008-07-27 15:44

3, basic methods for request alloc/free/prepare/init
   =================================================

The FS dispatches bios to the elevator; purely for efficiency they are merged into a
request if possible, and then delivered to the controller when necessary.

static struct request *
        get_request(struct request_queue *q, int rw_flags,
                        struct bio *bio, gfp_t gfp_mask)
{
        struct request *rq = NULL;
        struct request_list *rl = &q->rq;
        struct io_context *ioc = NULL;
        const int rw = rw_flags & 0x01;
        int may_queue, priv;

        may_queue = elv_may_queue(q, rw_flags);
        /* 1,
         * check that resources are not over the limit, much like cpu time.
         * the starvation concept still plays its role here;
         * how to compensate?
         */
        if (may_queue == ELV_MQUEUE_NO)
                goto rq_starved;

        if (rl->count[rw] + 1 >= queue_congestion_on_threshold(q)) {
                /* 2,
                 * not only for efficiency: nice scheduling also imposes
                 * resource limits upon all tasks in the system.
                 * fair play is considered here, with little regard to
                 * whether you are root, but compensation is also fair.
                 */
                if (rl->count[rw] + 1 >= q->nr_requests) {
                        ioc = current_io_context(GFP_ATOMIC, q->node);
                        /*
                         * The queue will fill after this allocation, so set
                         * it as full, and mark this process as "batching".
                         *
                         * This process will be allowed to complete a batch of
                         * requests, others will be blocked.
                         */
                        if (!blk_queue_full(q, rw)) {
                                ioc_set_batching(q, ioc);
                                blk_set_queue_full(q, rw);
                        } else {
                                if (may_queue != ELV_MQUEUE_MUST
                                        && !ioc_batching(q, ioc)) {
                                        /*
                                         * The queue is full and the allocating
                                         * process is not a "batcher", and not
                                         * exempted by the IO scheduler
                                         */
                                        goto out;
                                }
                                /* else
                                 *
                                 * batcher is biased to allocate up to 50%
                                 * over the defined limit
                                 */
                        }
                }
                blk_set_queue_congested(q, rw);
        }

        /*
         * Only allow batching queuers to allocate up to 50% over the defined
         * limit of requests, otherwise we could have thousands of requests
         * allocated with any setting of ->nr_requests
         */
        if (rl->count[rw] >= (3 * q->nr_requests / 2))
                goto out;

        rl->count[rw]++;
        rl->starved[rw] = 0;

        priv = !test_bit(QUEUE_FLAG_ELVSWITCH, &q->queue_flags);
        if (priv)
                rl->elvpriv++;

        spin_unlock_irq(q->queue_lock);

        rq = blk_alloc_request(q, rw_flags, priv, gfp_mask);
        if (unlikely(!rq)) {
                /*
                 * Allocation failed presumably due to memory. Undo anything
                 * we might have messed up.
                 *
                 * Allocating task should really be put onto the front of the
                 * wait queue, but this is pretty rare.
                 */
                spin_lock_irq(q->queue_lock);
                /* 3,
                 * nice scheduling is based upon housekeeping, right?
                 * just as mm is based on the page frame?
                 */
                freed_request(q, rw, priv);

                /*
                 * in the very unlikely event that allocation failed and no
                 * requests for this direction was pending, mark us starved
                 * so that freeing of a request in the other direction will
                 * notice us. another possible fix would be to split the
                 * rq mempool into READ and WRITE
                 */
rq_starved:
                if (unlikely(rl->count[rw] == 0))
                        rl->starved[rw] = 1;

                goto out;
        }

        /*
         * ioc may be NULL here, and ioc_batching will be false. That's
         * OK, if the queue is under the request limit then requests need
         * not count toward the nr_batch_requests limit. There will always
         * be some limit enforced by BLK_BATCH_TIME.
         */
        if (ioc_batching(q, ioc))
                ioc->nr_batch_requests--;

        blk_add_trace_generic(q, bio, rw, BLK_TA_GETRQ);
out:
        return rq;
}

static void freed_request(struct request_queue *q, int rw, int priv)
{
        struct request_list *rl = &q->rq;

        rl->count[rw]--;

        if (priv)
                rl->elvpriv--;

        __freed_request(q, rw);

        if (unlikely(rl->starved[rw ^ 1]))
                /* 1,
                 * today the elevator cares little about whether you are a reader,
                 * since nobody has enough power to declare that blue blood is nicer,
                 */
                __freed_request(q, rw ^ 1);
}
static void __freed_request(struct request_queue *q, int rw)
{
        struct request_list *rl = &q->rq;

        if (rl->count[rw] < queue_congestion_off_threshold(q))
                blk_clear_queue_congested(q, rw);

        if (rl->count[rw] + 1 <= q->nr_requests) {
                /* 2,
                 * but if and only if things are under control,
                 * a chance for compensation is still given, to keep the disk rotating
                 */
                if (waitqueue_active(&rl->wait[rw]))
                        wake_up(&rl->wait[rw]);

                blk_clear_queue_full(q, rw);
        }
}


void init_request_from_bio(struct request *req, struct bio *bio)
{
        req->cmd_type = REQ_TYPE_FS;

        /*
         * inherit FAILFAST from bio (for read-ahead, and explicit FAILFAST)
         */
        if (bio_rw_ahead(bio) || bio_failfast(bio))
                req->cmd_flags |= REQ_FAILFAST;

        /*
         * REQ_BARRIER implies no merging, but lets make it explicit
         */
        if (unlikely(bio_barrier(bio)))
                req->cmd_flags |= (REQ_HARDBARRIER | REQ_NOMERGE);

        if (bio_sync(bio))
                req->cmd_flags |= REQ_RW_SYNC;
        if (bio_rw_meta(bio))
                req->cmd_flags |= REQ_RW_META;

        req->errors = 0;
        req->hard_sector = req->sector = bio->bi_sector;
        /* 1,
         * prio is also considered, but how is it defined?
         */
        req->ioprio = bio_prio(bio);
        req->start_time = jiffies;

        blk_rq_bio_prep(req->q, req, bio);
}
void blk_rq_bio_prep(struct request_queue *q, struct request *rq,
                     struct bio *bio)
{
        /* first two bits are identical in rq->cmd_flags and bio->bi_rw */
        rq->cmd_flags |= (bio->bi_rw & 3);

        rq->nr_phys_segments = bio_phys_segments(q, bio);
        rq->nr_hw_segments = bio_hw_segments(q, bio);
        rq->current_nr_sectors = bio_cur_sectors(bio);
        rq->hard_cur_sectors = rq->current_nr_sectors;
        rq->hard_nr_sectors = rq->nr_sectors = bio_sectors(bio);
        rq->buffer = bio_data(bio);
        rq->data_len = bio->bi_size;

        rq->bio = rq->biotail = bio;

        if (bio->bi_bdev)
                rq->rq_disk = bio->bi_bdev->bd_disk;
}
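For a sense of scale of the thresholds used in get_request() above: with the default
nr_requests of 128 (BLKDEV_MAX_RQ), the numbers work out roughly as follows. This is a
sketch; check blk_queue_congestion_threshold() in the block layer for the authoritative
formula.

/* assuming nr_requests = BLKDEV_MAX_RQ = 128 */
nr_congestion_on  = 128 - 128/8 + 1;            /* 113: mark the queue congested      */
nr_congestion_off = 128 - 128/8 - 128/16 - 1;   /* 103: only clear it well below that */
hard_cap          = 3 * 128 / 2;                /* 192: absolute per-direction cap    */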

#120 | 2008-07-27 16:39

4, basic method exported to FS
   ===========================

void submit_bio(int rw, struct bio *bio)
{
        int count = bio_sectors(bio);

        bio->bi_rw |= rw;

        /*
         * If it's a regular read/write or a barrier with data attached,
         * go through the normal accounting stuff before submission.
         */
        if (!bio_empty_barrier(bio)) {
                BIO_BUG_ON(!bio->bi_size);
                BIO_BUG_ON(!bio->bi_io_vec);

                if (rw & WRITE) {
                        count_vm_events(PGPGOUT, count);
                } else {
                        task_io_account_read(bio->bi_size);
                        count_vm_events(PGPGIN, count);
                }

                if (unlikely(block_dump)) {
                        char b[BDEVNAME_SIZE];

                        printk(KERN_DEBUG "%s(%d): %s block %Lu on %s\n",
                        current->comm, task_pid_nr(current),
                                (rw & WRITE) ? "WRITE" : "READ",
                                (unsigned long long)bio->bi_sector,
                                bdevname(bio->bi_bdev, b));
                }
        }
        generic_make_request(bio);
}

void generic_make_request(struct bio *bio)
{
        if (current->bio_tail) { /* make_request is active */
                *(current->bio_tail) = bio;
                bio->bi_next = NULL;
                current->bio_tail = &bio->bi_next;
                return;
        }
        /* following loop may be a bit non-obvious, and so deserves some
         * explanation.
         * Before entering the loop, bio->bi_next is NULL (as all callers
         * ensure that) so we have a list with a single bio.
         *
         * We pretend that we have just taken it off a longer list, so
         * we assign bio_list to the next (which is NULL) and bio_tail
         * to &bio_list, thus initialising the bio_list of new bios to be
         * added.  __generic_make_request may indeed add some more bios
         * through a recursive call to generic_make_request.  If it
         * did, we find a non-NULL value in bio_list and re-enter the loop
         * from the top.  In this case we really did just take the bio
         * of the top of the list (no pretending) and so fixup bio_list and
         * bio_tail or bi_next, and call into __generic_make_request again.
         *
         * The loop was structured like this to make only one call to
         * __generic_make_request (which is important as it is large and
         * inlined) and to keep the structure simple.
         */
        BUG_ON(bio->bi_next);

        do {
                current->bio_list = bio->bi_next;

                if (bio->bi_next == NULL)
                        current->bio_tail = &current->bio_list;
                else
                        bio->bi_next = NULL;

                __generic_make_request(bio);
                bio = current->bio_list;
        } while (bio);

        current->bio_tail = NULL; /* deactivate */
}

static inline void __generic_make_request(struct bio *bio)
{
        struct request_queue *q;
        sector_t old_sector;
        int ret, nr_sectors = bio_sectors(bio);
        dev_t old_dev;
        int err = -EIO;

        might_sleep();
        /* 1,
         * check for physical overflow
         */
        if (bio_check_eod(bio, nr_sectors))
                goto end_io;

        /*
         * Resolve the mapping until finished. (drivers are
         * still free to implement/resolve their own stacking
         * by explicitly returning 0)
         *
         * NOTE: we don't repeat the blk_size check for each new device.
         * Stacking drivers are expected to know what they are doing.
         */
        old_sector = -1;
        old_dev = 0;
        do {
                char b[BDEVNAME_SIZE];
                /* 2,
                 * determine the physical disk
                 */
                q = bdev_get_queue(bio->bi_bdev);
                if (!q) {
                        printk(KERN_ERR
                               "generic_make_request: Trying to access "
                                "nonexistent block-device %s (%Lu)\n",
                                bdevname(bio->bi_bdev, b),
                                (long long) bio->bi_sector);
end_io:
                        bio_endio(bio, err);
                        break;
                }
                /* 3,
                 * does the bio overflow the controller's capability?
                 */
                if (unlikely(nr_sectors > q->max_hw_sectors)) {
                        printk(KERN_ERR "bio too big device %s (%u > %u)\n",
                                bdevname(bio->bi_bdev, b),
                                bio_sectors(bio),
                                q->max_hw_sectors);
                        goto end_io;
                }
                /* 4,
                 * is the elevator still healthy?
                 */
                if (unlikely(test_bit(QUEUE_FLAG_DEAD, &q->queue_flags)))
                        goto end_io;
                /* 5,
                 * or is this just a fault-injection poke?
                 */
                if (should_fail_request(bio))
                        goto end_io;

                /* 6,
                 * If this device has partitions, remap block n
                 * of partition p to block n+start(p) of the disk.
                 */
                blk_partition_remap(bio);

                if (old_sector != -1)
                        blk_add_trace_remap(q, bio, old_dev, bio->bi_sector,
                                            old_sector);

                blk_add_trace_bio(q, bio, BLK_TA_QUEUE);

                old_sector = bio->bi_sector;
                old_dev = bio->bi_bdev->bd_dev;

                if (bio_check_eod(bio, nr_sectors))
                        goto end_io;
                /* 7,
                 * reject an empty barrier that the device cannot honour
                 */
                if (bio_empty_barrier(bio) && !q->prepare_flush_fn) {
                        err = -EOPNOTSUPP;
                        goto end_io;
                }
                /* 8,
                 * do it; but where is make_request_fn assigned?
                 * in void blk_queue_make_request(struct request_queue *q,
                 *                                make_request_fn *mfn),
                 * and what about __make_request?
                 * it is the default method set up when the Q is initialised
                 */
                ret = q->make_request_fn(q, bio);
        } while (ret);
}

static int __make_request(struct request_queue *q, struct bio *bio)
{
        struct request *req;
        int el_ret, nr_sectors, barrier, err;
        const unsigned short prio = bio_prio(bio);
        const int sync = bio_sync(bio);
        int rw_flags;

        nr_sectors = bio_sectors(bio);

        /*
         * low level driver can indicate that it wants pages above a
         * certain limit bounced to low memory (ie for highmem, or even
         * ISA dma in theory)
         */
        blk_queue_bounce(q, &bio);

        barrier = bio_barrier(bio);

        if (unlikely(barrier) && (q->next_ordered == QUEUE_ORDERED_NONE)) {
                err = -EOPNOTSUPP;
                goto end_io;
        }

        spin_lock_irq(q->queue_lock);
        /* 1,
         * barriers are not merged, in principle
         */
        if (unlikely(barrier) || elv_queue_empty(q))
                goto get_rq;
        /* 2,
         * try to merge
         */
        el_ret = elv_merge(q, &req, bio);

        switch (el_ret) {
        case ELEVATOR_BACK_MERGE:
                BUG_ON(!rq_mergeable(req));

                if (!ll_back_merge_fn(q, req, bio))
                        break;

                blk_add_trace_bio(q, bio, BLK_TA_BACKMERGE);

                req->biotail->bi_next = bio;
                req->biotail = bio;
                req->nr_sectors = req->hard_nr_sectors += nr_sectors;
                req->ioprio = ioprio_best(req->ioprio, prio);
                drive_stat_acct(req, 0);

                if (!attempt_back_merge(q, req))
                        elv_merged_request(q, req, el_ret);
                goto out;

        case ELEVATOR_FRONT_MERGE:
                BUG_ON(!rq_mergeable(req));

                if (!ll_front_merge_fn(q, req, bio))
                        break;

                blk_add_trace_bio(q, bio, BLK_TA_FRONTMERGE);

                bio->bi_next = req->bio;
                req->bio = bio;

                /*
                 * may not be valid. if the low level driver said
                 * it didn't need a bounce buffer then it better
                 * not touch req->buffer either...
                 */
                req->buffer = bio_data(bio);
                req->current_nr_sectors = bio_cur_sectors(bio);
                req->hard_cur_sectors = req->current_nr_sectors;
                req->sector = req->hard_sector = bio->bi_sector;
                req->nr_sectors = req->hard_nr_sectors += nr_sectors;
                req->ioprio = ioprio_best(req->ioprio, prio);
                drive_stat_acct(req, 0);
                if (!attempt_front_merge(q, req))
                        elv_merged_request(q, req, el_ret);
                goto out;

        /* ELV_NO_MERGE: elevator says don't/can't merge. */
        default:
                break;
        }

get_rq:
        /*
         * This sync check and mask will be re-done in init_request_from_bio(),
         * but we need to set it earlier to expose the sync flag to the
         * rq allocator and io schedulers.
         */
        rw_flags = bio_data_dir(bio);
        if (sync)
                rw_flags |= REQ_RW_SYNC;

        /* 3,
         * Grab a free request. This might sleep but can not fail.
         * Returns with the queue unlocked.
         */
        req = get_request_wait(q, rw_flags, bio);

        /*
         * After dropping the lock and possibly sleeping here, our request
         * may now be mergeable after it had proven unmergeable (above).
         *
         * We don't worry about that case for efficiency. It won't happen
         * often, and the elevators are able to handle it.
         */
        init_request_from_bio(req, bio);

        spin_lock_irq(q->queue_lock);

        if (elv_queue_empty(q))
                blk_plug_device(q);
        /* 4,
         * a new request: add it to the queue and let it wait there
         */
        add_request(q, req);
out:
        if (sync) {
                /* 5,
                 * if necessary, get the disk rotating right away,
                 * even with gas above $100 a barrel
                 */
                __generic_unplug_device(q);
        }
        spin_unlock_irq(q->queue_lock);
        return 0;
end_io:
        bio_endio(bio, err);
        return 0;
}
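Regarding note 8 above: q->make_request_fn is assigned in blk_queue_make_request();
blk_init_queue_node() (post #118) passes __make_request, so an ordinary request-based
queue takes the elevator path shown here. A stacking or bio-based driver instead installs
its own handler and bypasses the elevator entirely. A minimal, hypothetical sketch, with
invented names and no error handling:

/* A bio-based driver handles each bio itself; no request, no elevator. */
static int my_bio_make_request(struct request_queue *q, struct bio *bio)
{
        /* a real driver (brd, dm, md, ...) would service the bio here */
        bio_endio(bio, 0);      /* complete it with no error */
        return 0;               /* 0 = fully handled, do not loop in
                                 * __generic_make_request()            */
}

static struct request_queue *my_setup_queue(void)
{
        struct request_queue *q = blk_alloc_queue(GFP_KERNEL);

        if (!q)
                return NULL;
        blk_queue_make_request(q, my_bio_make_request);
        return q;
}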