Floor 1 | posted 2012-10-23 11:42

Last edited by blake326 on 2012-10-25 20:38

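Kernel: 3.6

Hardware:
A typical SoC has several external SP804 timers. Assume timer0 serves as the global clock event device and timer1 as the clocksource.
ARM SMP local timer (one per core).

Core data structures:
1. struct clock_event_device: abstraction of a clock event device. Its set_next_event method programs the next interrupt time,
and its event_handler is the interrupt callback, normally one of the core functions tick_handle_periodic or hrtimer_interrupt.
Both SP804 timer0 and the SMP local timers are wrapped as clock_event_device objects and registered with the system.
2. struct clocksource: abstraction of a time source. Its read method is used to read the current time (as a cycle count). In this system, SP804 timer1
and the purely software jiffies counter are each wrapped as a clocksource and registered.
3. struct timekeeper: the global timekeeper variable. It manages the clocksource and, together with the saved xtime value,
provides the current time.
4. struct timespec xtime: global variable holding the current time, updated on the tick interrupt.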
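
To make the relationship between a clock event device and its event_handler concrete, here is a minimal user-space sketch. It is not kernel code: struct clock_event_dev, fake_set_next_event, fake_tick_handler and fake_timer_interrupt are all hypothetical names invented for illustration, and the "hardware counter" is just a variable.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced stand-in for struct clock_event_device. */
struct clock_event_dev {
	const char *name;
	int rating;
	unsigned long next_event;   /* programmed expiry, in fake ticks */
	int (*set_next_event)(unsigned long delta, struct clock_event_dev *dev);
	void (*event_handler)(struct clock_event_dev *dev); /* installed later by the tick layer */
};

static unsigned long fake_now;       /* pretend hardware counter */
static unsigned long ticks_handled;  /* how many times the handler ran */

/* Driver callback: program the next expiry relative to "now". */
static int fake_set_next_event(unsigned long delta, struct clock_event_dev *dev)
{
	dev->next_event = fake_now + delta;
	return 0;
}

/* Stand-in for tick_handle_periodic / hrtimer_interrupt. */
static void fake_tick_handler(struct clock_event_dev *dev)
{
	(void)dev;
	ticks_handled++;
}

/* What a driver interrupt handler boils down to: ack the hardware, call event_handler. */
static void fake_timer_interrupt(struct clock_event_dev *dev)
{
	assert(dev->event_handler != NULL); /* must be installed before interrupts fire */
	dev->event_handler(dev);
}
```

The point of the split is that the driver only knows how to program and acknowledge its hardware, while the tick layer decides which handler runs, so the handler can later be swapped without touching the driver.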

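Overall flow:
1. start_kernel()->init_timers()/hrtimers_init(): initialize the timer wheel and hrtimers.
2. start_kernel()->timekeeping_init(): initialize the timekeeper structure, with clock = the jiffies clocksource.
3. start_kernel()->time_init(): board code registers the SP804 timer0 clock_event_device.
It initializes the SP804 timer, registers the interrupt, and sets the current CPU's tick_cpu_device->evtdev to the SP804 timer. From this point the timer interrupt is live.
4. start_kernel()->time_init(): board code registers the SP804 timer1 clocksource.
5. start_kernel()->rest_init()...kernel_init()->smp_prepare_cpus(): register the CPU0 local timer clock_event_device.
This shuts down the SP804 timer registered earlier. From this point the local timer interrupt is live (timer0 no longer fires).
The current CPU's tick_cpu_device->evtdev is set to the local timer.
6. secondary_start_kernel()->percpu_timer_setup(): register the CPUx local timer clock_event_device, same as for CPU0.
7. do_initcalls()->init_jiffies_clocksource(): register the jiffies clocksource.
This causes the timekeeper's clock to be set to the higher-rated SP804 timer1 clocksource,
so that the timer softirq later switches from periodic mode to oneshot mode.
8. The TIMER_SOFTIRQ softirq on CPU0 and CPUx switches each CPU's tick_cpu_device->evtdev to oneshot mode,
and event_handler becomes hrtimer_interrupt.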

*********************************************************************************************************
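Detailed flow, timer0 clock_event_device registration:

start_kernel->time_init()
machine_desc->timer->init();
This is board-specific code. It normally calls sp804_clockevents_init() to initialize the SP804 timer, plus whatever timer setup the board needs.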
static struct clock_event_device sp804_clockevent = {
.features    = CLOCK_EVT_FEAT_PERIODIC | CLOCK_EVT_FEAT_ONESHOT,
.set_mode = sp804_set_mode,
.set_next_event = sp804_set_next_event,
.rating = 300,
.cpumask = cpu_all_mask,
};
static struct irqaction sp804_timer_irq = {
.name = "timer",
.flags = IRQF_DISABLED | IRQF_TIMER | IRQF_IRQPOLL,
.handler = sp804_timer_interrupt,
.dev_id = &sp804_clockevent,
};
void __init sp804_clockevents_init(void __iomem *base, unsigned int irq, const char *name)
{
struct clock_event_device *evt = &sp804_clockevent;
long rate = sp804_get_clock_rate(name);

if (rate < 0)
return;

clkevt_base = base;
clkevt_reload = DIV_ROUND_CLOSEST(rate, HZ);
evt->name = name;
evt->irq = irq;

setup_irq(irq, &sp804_timer_irq);
clockevents_config_and_register(evt, rate, 0xf, 0xffffffff);
}

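The kernel-provided sp804_clockevents_init() essentially just registers sp804_clockevent.
The base parameter is the board-specific timer base address, and irq is the interrupt number.
The key steps are setup_irq(), which installs the irqaction sp804_timer_irq (handler sp804_timer_interrupt),
and clockevents_config_and_register(), which registers the clock_event_device.
sp804_timer_interrupt mainly calls the event_handler of the associated sp804_clockevent.
That event_handler is only filled in during clockevents_register_device().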
static irqreturn_t sp804_timer_interrupt(int irq, void *dev_id)
{
struct clock_event_device *evt = dev_id;
/* clear the interrupt */
writel(1, clkevt_base + TIMER_INTCLR);
/* event_handler was installed during clockevents_config_and_register() */
evt->event_handler(evt);
return IRQ_HANDLED;
}
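
As a worked example of the arithmetic in sp804_clockevents_init() above (clkevt_reload = DIV_ROUND_CLOSEST(rate, HZ)), the sketch below computes the reload value and the tick period in user space. HZ=100 matches the value mentioned later in this post. div_round_closest, reload_for_rate and tick_period_ns are hypothetical helper names, and any input clock rate you pass in is an assumption, not something the post specifies.

```c
#include <assert.h>

#define HZ 100                     /* tick rate assumed from the HZ=100 remark in the post */
#define NSEC_PER_SEC 1000000000L

/* Rounded division for non-negative operands, in the spirit of DIV_ROUND_CLOSEST. */
static long div_round_closest(long x, long d)
{
	assert(x >= 0 && d > 0);
	return (x + d / 2) / d;
}

/* Timer counts per tick for an input clock of rate_hz. */
static long reload_for_rate(long rate_hz)
{
	return div_round_closest(rate_hz, HZ);
}

/* Tick period in nanoseconds, as in tick_period = NSEC_PER_SEC / HZ. */
static long tick_period_ns(void)
{
	return NSEC_PER_SEC / HZ;
}
```

For a hypothetical 1 MHz timer clock this gives a reload value of 10000 counts per tick; for a 32768 Hz clock the exact quotient 327.68 is rounded to 328.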

void clockevents_config_and_register(struct clock_event_device *dev,
      u32 freq, unsigned long min_delta,
      unsigned long max_delta)
{
dev->min_delta_ticks = min_delta;
dev->max_delta_ticks = max_delta;
clockevents_config(dev, freq);
clockevents_register_device(dev);
}
void clockevents_register_device(struct clock_event_device *dev)
{
unsigned long flags;

BUG_ON(dev->mode != CLOCK_EVT_MODE_UNUSED);
if (!dev->cpumask) {
WARN_ON(num_possible_cpus() > 1);
dev->cpumask = cpumask_of(smp_processor_id());
}

raw_spin_lock_irqsave(&clockevents_lock, flags);

list_add(&dev->list, &clockevent_devices);
clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev);
clockevents_notify_released();

raw_spin_unlock_irqrestore(&clockevents_lock, flags);
}

clockevents_do_notify(CLOCK_EVT_NOTIFY_ADD, dev) notifies tick_notifier, which handles the event.
static struct notifier_block tick_notifier = {
.notifier_call = tick_notify,
};
static int tick_notify(struct notifier_block *nb, unsigned long reason,
         void *dev)
{
switch (reason) {

case CLOCK_EVT_NOTIFY_ADD:
return tick_check_new_device(dev);
/* ... (remaining cases elided in the original post) ... */
}

This function initializes a newly added clock_event_device.
In our case it is the first clock_event_device in the system, and CPU1 is still in WFI.
newdev is not a CPU-local device here, and each CPU's tick_cpu_device->evtdev is still NULL.
The key step is tick_setup_device(), which installs newdev.
static int tick_check_new_device(struct clock_event_device *newdev)
{
struct clock_event_device *curdev;
struct tick_device *td;
int cpu, ret = NOTIFY_OK;
unsigned long flags;

raw_spin_lock_irqsave(&tick_device_lock, flags);

cpu = smp_processor_id();
if (!cpumask_test_cpu(cpu, newdev->cpumask))
goto out_bc;

td = &per_cpu(tick_cpu_device, cpu);
curdev = td->evtdev;

/* cpu local device ? */
if (!cpumask_equal(newdev->cpumask, cpumask_of(cpu))) {

/*
   * If the cpu affinity of the device interrupt can not
   * be set, ignore it.
   */
if (!irq_can_set_affinity(newdev->irq))
goto out_bc;

/*
   * If we have a cpu local device already, do not replace it
   * by a non cpu local device
   */
if (curdev && cpumask_equal(curdev->cpumask, cpumask_of(cpu)))
goto out_bc;
}

/*
   * If we have an active device, then check the rating and the oneshot
   * feature.
   */
if (curdev) {
/*
   * Prefer one shot capable devices !
   */
if ((curdev->features & CLOCK_EVT_FEAT_ONESHOT) &&
      !(newdev->features & CLOCK_EVT_FEAT_ONESHOT))
goto out_bc;
/*
   * Check the rating
   */
if (curdev->rating >= newdev->rating)
goto out_bc;
}

/*
   * Replace the eventually existing device by the new
   * device. If the current device is the broadcast device, do
   * not give it back to the clockevents layer !
   */
if (tick_is_broadcast_device(curdev)) {
clockevents_shutdown(curdev);
curdev = NULL;
}
clockevents_exchange_device(curdev, newdev);
tick_setup_device(td, newdev, cpu, cpumask_of(cpu));
if (newdev->features & CLOCK_EVT_FEAT_ONESHOT)
tick_oneshot_notify();

raw_spin_unlock_irqrestore(&tick_device_lock, flags);
return NOTIFY_STOP;

out_bc:
/*
   * Can the new device be used as a broadcast device ?
   */
if (tick_check_broadcast_device(newdev))
ret = NOTIFY_STOP;

raw_spin_unlock_irqrestore(&tick_device_lock, flags);

return ret;
}

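tick_setup_device() makes our SP804 timer the current CPU's tick_cpu_device->evtdev.
Because tick_cpu_device->evtdev was previously NULL, it initializes the tick period, sets the next tick time,
and makes the current CPU (CPU0) the one responsible for do_timer() timekeeping work.
It then calls tick_setup_periodic() to set the clock_event_device's event_handler.
The interrupt handler simply calls this event_handler, and only at this point is it actually set.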
static void tick_setup_device(struct tick_device *td,
      struct clock_event_device *newdev, int cpu,
      const struct cpumask *cpumask)
{
ktime_t next_event;
void (*handler)(struct clock_event_device *) = NULL;

/*
   * First device setup ?
   */
if (!td->evtdev) {
/*
   * If no cpu took the do_timer update, assign it to
   * this cpu:
   */
if (tick_do_timer_cpu == TICK_DO_TIMER_BOOT) {
tick_do_timer_cpu = cpu;
tick_next_period = ktime_get();
tick_period = ktime_set(0, NSEC_PER_SEC / HZ);
}

/*
   * Startup in periodic mode first.
   */
td->mode = TICKDEV_MODE_PERIODIC;
} else {
handler = td->evtdev->event_handler;
next_event = td->evtdev->next_event;
td->evtdev->event_handler = clockevents_handle_noop;
}

td->evtdev = newdev;

/*
   * When the device is not per cpu, pin the interrupt to the
   * current cpu:
   */
if (!cpumask_equal(newdev->cpumask, cpumask))
irq_set_affinity(newdev->irq, cpumask);

/*
   * When global broadcasting is active, check if the current
   * device is registered as a placeholder for broadcast mode.
   * This allows us to handle this x86 misfeature in a generic
   * way.
   */
if (tick_device_uses_broadcast(newdev, cpu))
return;

if (td->mode == TICKDEV_MODE_PERIODIC)
tick_setup_periodic(newdev, 0);
else
tick_setup_oneshot(newdev, handler, next_event);
}

tick_set_periodic_handler() sets event_handler to tick_handle_periodic.
The clock_event_device mode is then set to CLOCK_EVT_MODE_PERIODIC.
void tick_setup_periodic(struct clock_event_device *dev, int broadcast)
{
tick_set_periodic_handler(dev, broadcast);

/* Broadcast setup ? */
if (!tick_device_is_functional(dev))
return;

if ((dev->features & CLOCK_EVT_FEAT_PERIODIC) &&
      !tick_broadcast_oneshot_active()) {
clockevents_set_mode(dev, CLOCK_EVT_MODE_PERIODIC);
} else {
unsigned long seq;
ktime_t next;

do {
seq = read_seqbegin(&xtime_lock);
next = tick_next_period;
} while (read_seqretry(&xtime_lock, seq));

clockevents_set_mode(dev, CLOCK_EVT_MODE_ONESHOT);

for (;;) {
if (!clockevents_program_event(dev, next, false))
return;
next = ktime_add(next, tick_period);
}
}
}
void tick_set_periodic_handler(struct clock_event_device *dev, int broadcast)
{
if (!broadcast)
dev->event_handler = tick_handle_periodic;
else
dev->event_handler = tick_handle_periodic_broadcast;
}

tick_handle_periodic is the event_handler for periodic mode, one of the kernel's core functions.
tick_periodic() does the actual work.
void tick_handle_periodic(struct clock_event_device *dev)
{
int cpu = smp_processor_id();
ktime_t next;

tick_periodic(cpu);

if (dev->mode != CLOCK_EVT_MODE_ONESHOT)
return;
/*
   * Setup the next period for devices, which do not have
   * periodic mode:
   */
next = ktime_add(dev->next_event, tick_period);
for (;;) {
if (!clockevents_program_event(dev, next, false))
return;
/*
   * Have to be careful here. If we're in oneshot mode,
   * before we call tick_periodic() in a loop, we need
   * to be sure we're using a real hardware clocksource.
   * Otherwise we could get trapped in an infinite
   * loop, as the tick_periodic() increments jiffies,
   * when then will increment time, posibly causing
   * the loop to trigger again and again.
   */
if (timekeeping_valid_for_hres())
tick_periodic(cpu);
next = ktime_add(next, tick_period);
}
}
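
The retry loop at the end of tick_handle_periodic() (used when a oneshot-only device emulates a periodic tick) can be sketched in user space as follows. This is a simplified model under stated assumptions: time is a plain long in arbitrary units, TICK_PERIOD is an arbitrary value, and model_program_event and program_next_tick are hypothetical names. The model only captures the idea that programming an expiry that is not in the future fails, so the loop skips ahead by whole periods.

```c
#include <assert.h>

#define TICK_PERIOD 10   /* assumed period, arbitrary time units */

/* Model of clockevents_program_event(): refuse expiry times that are not in the future. */
static int model_program_event(long now, long expires)
{
	return (expires > now) ? 0 : -1;
}

/*
 * Starting from the last programmed expiry, advance by whole periods until
 * programming succeeds. Returns the expiry that was finally programmed.
 */
static long program_next_tick(long now, long last_expiry)
{
	long next = last_expiry + TICK_PERIOD;

	assert(TICK_PERIOD > 0);
	for (;;) {
		if (model_program_event(now, next) == 0)
			return next;
		next += TICK_PERIOD;   /* that tick was missed; try the following one */
	}
}
```

If the handler runs late (for example now = 35 with the last expiry at 0), the loop skips the expiries at 10, 20 and 30 and programs 40.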

On CPU0 (the tick_do_timer_cpu), do_timer() is called to update the time.
Every CPU calls update_process_times() to update CPU usage accounting and to raise the timer wheel softirq.
static void tick_periodic(int cpu)
{
if (tick_do_timer_cpu == cpu) {
write_seqlock(&xtime_lock);

/* Keep track of the next tick event */
tick_next_period = ktime_add(tick_next_period, tick_period);

do_timer(1);
write_sequnlock(&xtime_lock);
}

update_process_times(user_mode(get_irq_regs()));
profile_tick(CPU_PROFILING);
}

void do_timer(unsigned long ticks)
{
jiffies_64 += ticks;
update_wall_time();
calc_global_load(ticks);
}
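
The division of labour in tick_periodic() can be modelled in a few lines: only the CPU that owns timekeeping advances the global tick count, while every CPU does its own local accounting. The sketch below is a user-space model, not kernel code; NR_CPUS=2, model_tick_periodic and local_ticks are assumptions made for illustration.

```c
#include <assert.h>

#define NR_CPUS 2   /* assumed dual-core system, as in the post */

static int tick_do_timer_cpu = 0;            /* CPU that owns timekeeping */
static unsigned long long jiffies_64;        /* global tick count */
static unsigned long local_ticks[NR_CPUS];   /* stand-in for per-CPU accounting */

/* Reduced model of tick_periodic(): global update on one CPU, local work on all. */
static void model_tick_periodic(int cpu)
{
	assert(cpu >= 0 && cpu < NR_CPUS);
	if (cpu == tick_do_timer_cpu)
		jiffies_64 += 1;     /* stands in for do_timer(1) */
	local_ticks[cpu]++;          /* stands in for update_process_times() */
}
```

After three ticks on each of the two CPUs, jiffies_64 has advanced by three, not six, while each CPU has accounted for three local ticks.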

account_process_tick() updates the CPU accounting statistics.
run_local_timers() raises the TIMER_SOFTIRQ softirq.
scheduler_tick() performs scheduling work (when hrtimers are active, this is essentially a no-op).
void update_process_times(int user_tick)
{
struct task_struct *p = current;
int cpu = smp_processor_id();

/* Note: this timer irq context must be accounted for as well. */
account_process_tick(p, user_tick);
run_local_timers();
rcu_check_callbacks(cpu, user_tick);
printk_tick();
#ifdef CONFIG_IRQ_WORK
if (in_irq())
irq_work_run();
#endif
scheduler_tick();
run_posix_cpu_timers(p);
}

At this point CPU0's timer interrupt is set up and running, and jiffies_64 is steadily incrementing.
About ten jiffies later (HZ=100), the ARM local timer joins in as well.

Floor 2 | blake326 | posted 2012-10-23 13:04

Last edited by blake326 on 2012-10-25 20:38

*******************************************************************************************
CPU0 local timer clock_event_device registration:
start_kernel()->rest_init()->kernel_init()->smp_prepare_cpus()
->percpu_timer_setup()->twd_timer_setup()->clockevents_config_and_register()
->twd_local_timer_common_register()->local_timer_register()
This is also board-specific code, although the kernel provides some core helpers. Only SMP builds take this path.
CPU1 is still in WFI at this point; once it is woken up it runs a similar path to set up its own local timer.
twd_timer is the timer built into each ARM core.

The local timer interrupt handler essentially just calls the event_handler() of the associated clock_event_device.
The clock_event_device objects are allocated per CPU. As above, event_handler is only initialized later.
static irqreturn_t twd_handler(int irq, void *dev_id)
{
struct clock_event_device *evt = *(struct clock_event_device **)dev_id;

if (twd_timer_ack()) {
evt->event_handler(evt);
return IRQ_HANDLED;
}

return IRQ_NONE;
}

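The local timer follows much the same flow as the SP804 clock_event_device: each interrupt handler calls the
event_handler of its own clock_event_device. The two devices represent different hardware, though, so their
set_next_event methods program the interrupt differently. Their ratings also differ, typically 350 for the local timer and 300 for the SP804, i.e. the local timer
has the higher priority. The IRQ numbers differ too, of course.

When a new clock_event_device is registered through tick_check_new_device(), the rating plays a major role.
If the CPU has no current device, newdevice becomes the current device.
If newdevice's rating is less than or equal to the current device's rating, newdevice is ignored.
Otherwise clockevents_exchange_device() shuts down the current device and newdevice becomes the current device.
tick_setup_device() then sets event_handler = tick_handle_periodic.
So the local timer replaces the previously registered SP804 timer (which only CPU0 had). From here on, SP804 timer0 is effectively
disabled, with its next_event set to a very large value, while local timer interrupts keep firing and being handled.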
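
The selection rules described above can be condensed into a small predicate. This is a simplified user-space sketch of the decision in tick_check_new_device(), leaving out the cpumask test, the IRQ affinity check and the broadcast fallback. struct evt_dev, should_replace and the FEAT_* constants are hypothetical stand-ins.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define FEAT_PERIODIC 0x1   /* stand-ins for CLOCK_EVT_FEAT_* */
#define FEAT_ONESHOT  0x2

struct evt_dev {
	const char *name;
	unsigned int features;
	int rating;
	bool cpu_local;
};

/* Returns true if newdev should replace curdev as this CPU's tick device. */
static bool should_replace(const struct evt_dev *curdev, const struct evt_dev *newdev)
{
	assert(newdev != NULL);
	if (curdev == NULL)
		return true;                      /* first device always wins */
	if (curdev->cpu_local && !newdev->cpu_local)
		return false;                     /* never trade a local device for a global one */
	if ((curdev->features & FEAT_ONESHOT) && !(newdev->features & FEAT_ONESHOT))
		return false;                     /* prefer oneshot-capable devices */
	return newdev->rating > curdev->rating;   /* equal rating keeps the current device */
}
```

With the ratings quoted above (SP804 = 300, local timer = 350), the SP804 is accepted while no device exists, the local timer then replaces it, and a later attempt to put the SP804 back is rejected.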

*******************************************************************************************
Next, how tick_handle_periodic gets switched to hrtimer_interrupt.

First, init_jiffies_clocksource() registers the jiffies clocksource with the kernel.
static int __init init_jiffies_clocksource(void)
{
return clocksource_register(&clocksource_jiffies);
}
core_initcall(init_jiffies_clocksource);

clocksource_enqueue() links every clocksource into clocksource_list ordered by rating,
so the head of the list is the clocksource with the highest rating.
clocksource_select() picks the best clocksource: the highest-rated one,
or the one specified by the user, if any. If the best clocksource differs from the current one,
timekeeping_notify() is called to tell the kernel that a new clocksource is in effect.
int clocksource_register(struct clocksource *cs)
{
/* calculate max adjustment for given mult/shift */
cs->maxadj = clocksource_max_adjustment(cs);
WARN_ONCE(cs->mult + cs->maxadj < cs->mult,
"Clocksource %s might overflow on 11%% adjustment\n",
cs->name);

/* calculate max idle time permitted for this clocksource */
cs->max_idle_ns = clocksource_max_deferment(cs);

mutex_lock(&clocksource_mutex);
clocksource_enqueue(cs);
clocksource_enqueue_watchdog(cs);
clocksource_select();
mutex_unlock(&clocksource_mutex);
return 0;
}
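
The enqueue-then-select idea can be illustrated with a plain singly linked list kept sorted by descending rating. This is a user-space sketch, not the kernel's list_head based code, and it ignores the user override and the watchdog. struct csrc, csrc_enqueue and csrc_select are hypothetical names, and the ratings used in the usage note are assumed values.

```c
#include <assert.h>
#include <stddef.h>

struct csrc {
	const char *name;
	int rating;
	struct csrc *next;
};

static struct csrc *csrc_list;   /* head is always the highest-rated entry */

/* Insert cs so the list stays sorted by descending rating. */
static void csrc_enqueue(struct csrc *cs)
{
	struct csrc **pos = &csrc_list;

	assert(cs != NULL);
	while (*pos && (*pos)->rating >= cs->rating)
		pos = &(*pos)->next;
	cs->next = *pos;
	*pos = cs;
}

/* Pick the best source: simply the list head (no user override in this sketch). */
static struct csrc *csrc_select(void)
{
	return csrc_list;
}
```

Whatever the registration order, a hardware source with a rating such as 200 ends up ahead of a jiffies-like source with rating 1, so csrc_select() returns the hardware source.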

timekeeping_notify() calls change_clocksource() to switch the timekeeper's clocksource to the best clocksource.
tick_clock_notify() then prompts the kernel to check whether it should switch to hrtimer (high-resolution) mode.
void timekeeping_notify(struct clocksource *clock)
{
struct timekeeper *tk = &timekeeper;

if (tk->clock == clock)
return;
stop_machine(change_clocksource, clock, NULL);
tick_clock_notify();
}

static int change_clocksource(void *data)
{
struct timekeeper *tk = &timekeeper;
struct clocksource *new, *old;
unsigned long flags;

new = (struct clocksource *) data;

write_seqlock_irqsave(&tk->lock, flags);

timekeeping_forward_now(tk);
if (!new->enable || new->enable(new) == 0) {
old = tk->clock;
tk_setup_internals(tk, new);
if (old->disable)
old->disable(old);
}
timekeeping_update(tk, true);

write_sequnlock_irqrestore(&tk->lock, flags);

return 0;
}

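By this time SP804 timer1, a free-running timer, has long been registered as a clocksource.
So registering the jiffies clocksource now triggers timekeeping_notify(), which updates the timekeeper's clocksource
and tells the kernel to check whether to switch to hrtimer mode.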

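The current timer interrupt callback is tick_handle_periodic():
tick_periodic()->update_process_times()->run_local_timers() raises TIMER_SOFTIRQ to run the timer softirq.
If the kernel has not switched to hrtimer mode, hrtimer_run_queues() processes hrtimer events here. The hrtimer API
therefore still works, but it is driven by the tick.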
void run_local_timers(void)
{
hrtimer_run_queues();
raise_softirq(TIMER_SOFTIRQ);
}

The timer softirq also uses hrtimer_run_pending() to check whether to switch to hrtimer mode.
open_softirq(TIMER_SOFTIRQ, run_timer_softirq);
static void run_timer_softirq(struct softirq_action *h)
{
struct tvec_base *base = __this_cpu_read(tvec_bases);

hrtimer_run_pending();

if (time_after_eq(jiffies, base->timer_jiffies))
__run_timers(base);
}
void hrtimer_run_pending(void)
{
if (hrtimer_hres_active())
return;

/*
   * This _is_ ugly: We have to check in the softirq context,
   * whether we can switch to highres and / or nohz mode. The
   * clocksource switch happens in the timer interrupt with
   * xtime_lock held. Notification from there only sets the
   * check bit in the tick_oneshot code, otherwise we might
   * deadlock vs. xtime_lock.
   */
if (tick_check_oneshot_change(!hrtimer_is_hres_enabled()))
hrtimer_switch_to_hres();
}

int tick_check_oneshot_change(int allow_nohz)
{
struct tick_sched *ts = &__get_cpu_var(tick_cpu_sched);
/* tick_clock_notify() sets the check_clocks bit, so the checks below run. */
/* The bit is then cleared, so most of the time the code below is skipped. */
if (!test_and_clear_bit(0, &ts->check_clocks))
return 0;

if (ts->nohz_mode != NOHZ_MODE_INACTIVE)
return 0;

if (!timekeeping_valid_for_hres() || !tick_is_oneshot_available())
return 0;

if (!allow_nohz)
return 1;

tick_nohz_switch_to_nohz();
return 0;
}
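
The check_clocks gate at the top of tick_check_oneshot_change() is a "request once, consume once" flag. A minimal user-space analogue using C11 atomics is sketched below. It is an illustration of the pattern only; model_clock_notify and model_check_requested are hypothetical names, and a single global flag stands in for the per-CPU check_clocks word.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

static atomic_uint check_clocks;   /* bit 0 set = "re-evaluate the oneshot/hres switch" */

/* Model of tick_clock_notify(): request a re-check. */
static void model_clock_notify(void)
{
	atomic_fetch_or(&check_clocks, 1u);
}

/* Model of the test_and_clear_bit(0, ...) gate: true only once per request. */
static bool model_check_requested(void)
{
	unsigned int old = atomic_fetch_and(&check_clocks, ~1u);
	return (old & 1u) != 0;
}
```

Because the bit is cleared in the same atomic step that reads it, the expensive checks run once per notification and every other softirq invocation returns immediately.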

Check whether the timekeeper's clocksource is hrtimer-capable. Our SP804 timer1 qualifies.
int timekeeping_valid_for_hres(void)
{
struct timekeeper *tk = &timekeeper;
unsigned long seq;
int ret;

do {
seq = read_seqbegin(&tk->lock);

ret = tk->clock->flags & CLOCK_SOURCE_VALID_FOR_HRES;

} while (read_seqretry(&tk->lock, seq));

return ret;
}

hrtimer_switch_to_hres() then switches from periodic mode to oneshot hrtimer mode.
tick_setup_sched_timer() sets up an hrtimer that emulates the tick interrupt; when the system is idle,
this hrtimer-driven tick is stopped.
static int hrtimer_switch_to_hres(void)
{
int i, cpu = smp_processor_id();
struct hrtimer_cpu_base *base = &per_cpu(hrtimer_bases, cpu);
unsigned long flags;

if (base->hres_active)
return 1;

local_irq_save(flags);

if (tick_init_highres()) {
local_irq_restore(flags);
printk(KERN_WARNING "Could not switch to high resolution "
      "mode on CPU %d\n", cpu);
return 0;
}
base->hres_active = 1;
for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++)
base->clock_base[i].resolution = KTIME_HIGH_RES;

tick_setup_sched_timer();
/* "Retrigger" the interrupt to get things going */
retrigger_next_event(NULL);
local_irq_restore(flags);
return 1;
}

This switches the event_handler of each CPU's tick_cpu_device->evtdev to hrtimer_interrupt.
By then, tick_check_new_device() has already pointed each CPU's tick_cpu_device->evtdev at the local timer device.
int tick_init_highres(void)
{
return tick_switch_to_oneshot(hrtimer_interrupt);
}

TODO:
CPU1 wakeup flow.

Floor 3 | blake326 | posted 2012-10-23 13:05

Last edited by blake326 on 2012-10-26 09:17

*******************************************************************************************
From start_kernel() until kernel_init()->smp_init(), CPU1 sits in WFI (usually in bootloader code),
waiting for CPU0 to wake it up.
start_kernel->rest_init
  kernel_init()->smp_init()->cpu_up(cpu)->_cpu_up()->__cpu_up()->boot_secondary()

_cpu_up() first initializes the per-CPU idle task (PID 0).
boot_secondary() typically just calls gic_raise_softirq() to wake CPU1.
int __cpuinit boot_secondary(unsigned int cpu, struct task_struct *idle)
{
gic_raise_softirq(cpumask_of(cpu), 0);
return 0;
}

The C entry point for CPU1 is secondary_start_kernel();
the assembly entry point is secondary_startup.
Board-specific code has to make CPU1 jump to the physical address of secondary_startup.
ENTRY(secondary_startup)
/*
   * Common entry point for secondary CPUs.
   *
   * Ensure that we're in SVC mode, and IRQs are disabled.  Lookup
   * the processor type - there is no need to check the machine type
   * as it has already been validated by the primary processor.
   */
setmode PSR_F_BIT | PSR_I_BIT | SVC_MODE, r9
mrc p15, 0, r9, c0, c0 @ get processor id
bl __lookup_processor_type
movs r10, r5 @ invalid processor?
moveq r0, #'p' @ yes, error 'p'
THUMB( it eq ) @ force fixup-able long branch encoding
beq __error_p

/*
   * Use the page tables supplied from  __cpu_up.
   */
adr r4, __secondary_data
ldmia r4, {r5, r7, r12} @ address to jump to after
sub lr, r4, r5 @ mmu has been enabled
ldr r4, [r7, lr] @ get secondary_data.pgdir
add r7, r7, #4
ldr r8, [r7, lr] @ get secondary_data.swapper_pg_dir
adr lr, BSYM(__enable_mmu) @ return address
mov r13, r12 @ __secondary_switched address
ARM( add pc, r10, #PROCINFO_INITFUNC ) @ initialise processor
   @ (return control reg)
THUMB( add r12, r10, #PROCINFO_INITFUNC )
THUMB( mov pc, r12 )
ENDPROC(secondary_startup)

ENTRY(__secondary_switched)
ldr sp, [r7, #4] @ get secondary_data.stack
mov fp, #0
b secondary_start_kernel
ENDPROC(__secondary_switched)

asmlinkage void __cpuinit secondary_start_kernel(void)
{
struct mm_struct *mm = &init_mm;
unsigned int cpu = smp_processor_id();

/*
   * All kernel threads share the same mm context; grab a
   * reference and switch to it.
   */
atomic_inc(&mm->mm_count);
current->active_mm = mm;
cpumask_set_cpu(cpu, mm_cpumask(mm));
cpu_switch_mm(mm->pgd, mm);
enter_lazy_tlb(mm, current);
local_flush_tlb_all();

printk("CPU%u: Booted secondary processor\n", cpu);

cpu_init();
preempt_disable();
trace_hardirqs_off();

/*
   * Give the platform a chance to do its own initialisation.
   */
platform_secondary_init(cpu);

notify_cpu_starting(cpu);

calibrate_delay();

smp_store_cpu_info(cpu);

/*
   * OK, now it's safe to let the boot CPU continue.  Wait for
   * the CPU migration code to notice that the CPU is online
   * before we continue - which happens after __cpu_up returns.
   */
set_cpu_online(cpu, true);
complete(&cpu_running);

/*
   * Setup the percpu timer for this CPU.
   */
percpu_timer_setup();

local_irq_enable();
local_fiq_enable();

/*
   * OK, it's off to the idle thread for us
   */
cpu_idle();
}

Floor 4 | embeddedlwp (moderator) | posted 2012-10-23 13:29

Reply to 3# blake326

Interested in taking a look at ondemand readahead?

Floor 5 | blake326 | posted 2012-10-23 13:35

Reply to 4# embeddedlwp

I skimmed it a while back. If you have questions, we can discuss.

Floor 6 | embeddedlwp (moderator) | posted 2012-10-23 14:19

Reply to 5# blake326

Linux 3.7-rc2

In __do_page_cache_readahead(), why does it do this:

if (page_idx == nr_to_read - lookahead_size)
SetPageReadahead(page);

Why set PG_readahead here? The page has already been read ahead, so what is the flag for?

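Floor 7 | blake326 | posted 2012-10-23 17:33

Reply to 6# embeddedlwp

I took a quick look. I think the PG_readahead flag is mainly there for filemap. Without it, consider this scenario:

A user process mmaps a file, reads one page, does some work, then reads the next page, and so on.

The first access to page 1 takes a page fault and the page is not in the cache, so do_sync_mmap_readahead() issues a batch of page reads (say 4 pages), and all pages in the batch are allocated in the cache. Reads of pages 2, 3 and 4 also fault: the pages are normally in the cache and uptodate by then, but there is no page table mapping for them yet.

Reading page 5 then faults again, and this time do_sync_mmap_readahead() has to be issued again to read the page synchronously.

With PG_readahead, the fault on page 2 or 3 finds the page in the cache and uptodate, but also sees that it is marked PG_readahead, which triggers
do_async_mmap_readahead(), i.e. another batch of page reads is issued. When page 5 is read it still faults, but the page is normally already in the cache and uptodate.

So the user process runs faster.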

int filemap_fault(struct vm_area_struct *vma, struct vm_fault *vmf)
{

page = find_get_page(mapping, offset);
if (likely(page)) {
/*
   * We found the page, so try async readahead before
   * waiting for the lock.
   */
do_async_mmap_readahead(vma, ra, file, page, offset);
} else {
/* No page in the page cache at all */
do_sync_mmap_readahead(vma, ra, file, offset);
count_vm_event(PGMAJFAULT);
mem_cgroup_count_vm_event(vma->vm_mm, PGMAJFAULT);
ret = VM_FAULT_MAJOR;
retry_find:
page = find_get_page(mapping, offset);
if (!page)
goto no_cached_page;
}
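
The effect described above can be checked with a toy simulation. The sketch below is a user-space model only, under loud assumptions: 32 pages, batches of 4 pages, the marker placed on page index BATCH - LOOKAHEAD of each batch (mirroring the page_idx == nr_to_read - lookahead_size test quoted earlier in the thread), and asynchronous reads that always finish before the next access. cached, marked, read_batch and count_sync_reads are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NPAGES 32
#define BATCH 4        /* assumed pages per readahead batch */
#define LOOKAHEAD 2    /* assumed lookahead_size */

static bool cached[NPAGES];
static bool marked[NPAGES];   /* stands in for PG_readahead */

/* Bring a batch starting at 'start' into the cache, optionally placing the marker. */
static void read_batch(int start, bool use_marker)
{
	for (int i = 0; i < BATCH && start + i < NPAGES; i++) {
		cached[start + i] = true;
		if (use_marker && i == BATCH - LOOKAHEAD)
			marked[start + i] = true;
	}
}

/* Touch pages 0..NPAGES-1 in order; count faults that had to wait for a synchronous read. */
static int count_sync_reads(bool use_marker)
{
	int sync_reads = 0;
	int next_batch = 0;   /* first page not yet requested */

	memset(cached, 0, sizeof(cached));
	memset(marked, 0, sizeof(marked));

	for (int page = 0; page < NPAGES; page++) {
		if (!cached[page]) {
			/* cache miss: synchronous readahead, the fault waits for I/O */
			sync_reads++;
			read_batch(page, use_marker);
			next_batch = page + BATCH;
		} else if (marked[page]) {
			/* cache hit on the marked page: start the next batch in the background */
			marked[page] = false;
			read_batch(next_batch, use_marker);
			next_batch += BATCH;
		}
	}
	return sync_reads;
}
```

Under these assumptions a sequential pass without the marker stalls once per batch, while with the marker only the very first access stalls, because each batch is requested while the previous one is still being consumed.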

Floor 8 | embeddedlwp (moderator) | posted 2012-10-23 17:44

Last edited by embeddedlwp on 2012-10-23 17:49

Reply to 7# blake326

Oh, I see, thx. If you have any other insights on ondemand readahead, write them up and let's discuss.

Thread: [Clock management] ARM time system (Chinaunix forum, Kernel Source board)