[Kernel basics] ARMv7 architecture

#1, posted 2013-01-14 22:12
Last edited by blake326 on 2013-02-18 20:51

Notes from the ARM Architecture Reference Manual (DDI0406B_arm_architecture_reference_manual_errata_markup_8_):



ARMv7 provides three profiles:
- ARMv7-A: Application profile, described in this manual. Implements a traditional ARM architecture with multiple modes and supports a Virtual Memory System Architecture (VMSA) based on an MMU. Supports the ARM and Thumb instruction sets.
- ARMv7-R: Real-time profile, described in this manual. Implements a traditional ARM architecture with multiple modes and supports a Protected Memory System Architecture (PMSA) based on an MPU. Supports the ARM and Thumb instruction sets.
- ARMv7-M: Microcontroller profile, described in the ARMv7-M Architecture Reference Manual. Implements a programmers' model designed for fast interrupt processing, with hardware stacking of registers and support for writing interrupt handlers in high-level languages. Implements a variant of the ARMv7 PMSA and supports a variant of the Thumb instruction set.

Coprocessor support:
Coprocessor space is used to extend the functionality of an ARM processor. There are sixteen coprocessors defined in the coprocessor instruction space. These are commonly known as CP0 to CP15. The following coprocessors are reserved by ARM for specific purposes:
- Coprocessor 15 (CP15) provides system control functionality. This includes architecture and feature identification, as well as control, status information and configuration support. The following sections describe CP15:
  - CP15 registers for a VMSA implementation on page B3-64
  - CP15 registers for a PMSA implementation on page B4-22.
  CP15 also provides performance monitor registers, see Chapter C9 Performance Monitors.
- Coprocessor 14 (CP14) supports:
  - debug, see Chapter C6 Debug Register Interfaces
  - the execution environment features defined by the architecture, see Execution environment support on page A2-69.
- Coprocessor 11 (CP11) supports double-precision floating-point operations.
- Coprocessor 10 (CP10) supports single-precision floating-point operations and the control and configuration of both the VFP and the Advanced SIMD architecture extensions.
- Coprocessors 8, 9, 12, and 13 are reserved for future use by ARM.


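The synchronization primitives provided in the ARM and Thumb instruction sets are LDREX/STREX:
LDREX loads a value from Normal memory into a register and puts the local monitor into the Exclusive Access state.
STREX writes a register value back to Normal memory and returns the local monitor to the Open Access state. A status register reports the result: 0 if the store succeeded, 1 if it failed. The store fails when several CPUs make exclusive accesses to the same address at the same time.
Usage notes: LDREX and STREX must be used as a pair, with as few instructions as possible between them, and none of those instructions may be an explicit store. After a context switch (interrupt, exception), CLREX or a STREX must be executed to clear the exclusive access state. For performance, two different synchronized addresses, say A and B, should preferably be at least 2048 bytes apart.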
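The load, modify, try-to-store, retry pattern described above can be sketched in portable C11. This is only an illustration: `atomic_add_return` is a hypothetical helper name, and the assumption is that an ARMv7 compiler lowers the compare-exchange to an LDREX/STREX loop, which is typical but not guaranteed by the C standard.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical helper: atomically add `delta` to `*counter` and return
 * the new value. The loop mirrors the LDREX/STREX pattern: read the old
 * value, compute the new one, try to store it, and retry if another
 * observer modified the location in between. */
static uint32_t atomic_add_return(_Atomic uint32_t *counter, uint32_t delta)
{
    uint32_t old = atomic_load_explicit(counter, memory_order_relaxed);

    /* The weak form may fail spuriously, much like STREX; on failure it
     * reloads `old` with the current value, so the loop simply retries. */
    while (!atomic_compare_exchange_weak_explicit(counter, &old, old + delta,
                                                  memory_order_acq_rel,
                                                  memory_order_relaxed)) {
        /* Intentionally empty: keep the window between load and store short. */
    }
    return old + delta;
}
```

Keeping the loop body empty follows the usage note that as few instructions as possible should sit between the exclusive load and the exclusive store.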



For each memory region, the most significant memory attribute specifies the memory type. There are three mutually exclusive memory types:
- Normal
- Device
- Strongly-ordered.

DDR, SRAM, and ROM are all Normal memory.
Shareability of Normal memory (each option also carries a cacheability attribute, for example Non-cacheable):

Outer Shareable
The Outer Shareable attribute qualifies the Shareable attribute for Normal memory regions and enables two levels of Normal memory sharing.

Inner Shareable
Intended to handle Normal memory that is shared between several processors.

Non-shareable
Intended to handle Normal memory that is used by only a single processor.

In general, Normal memory is Outer Shareable.
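Consider the page table descriptor of an ordinary 4KB page:
[31:12] = base address
[8:6] = TEX
[3] = Cacheable (C)
[2] = Bufferable (B)
[1:0] = 1x, meaning small page
Typically TEX=1, C=1, B=1, which in the architecture's default encoding (TEX remap disabled) selects Normal memory that is Outer and Inner Write-Back, Write-Allocate. Shareability is not encoded in these bits; it comes from the S bit (bit [10]).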
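To make the bit layout above concrete, here is a minimal decoding sketch. The struct and function names are hypothetical, and the sample descriptor value used in the usage note is made up for illustration; only the fields listed in the post are extracted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical container for the fields of a small-page descriptor. */
struct small_page_desc {
    uint32_t base;          /* bits [31:12], physical base of the 4KB page */
    unsigned tex;           /* bits [8:6] */
    bool     cacheable;     /* bit [3] */
    bool     bufferable;    /* bit [2] */
    bool     is_small_page; /* bits [1:0] == 0b1x */
};

static struct small_page_desc decode_small_page(uint32_t d)
{
    struct small_page_desc out;

    out.base          = d & 0xFFFFF000u;
    out.tex           = (d >> 6) & 0x7u;
    out.cacheable     = ((d >> 3) & 1u) != 0;
    out.bufferable    = ((d >> 2) & 1u) != 0;
    out.is_small_page = ((d >> 1) & 1u) != 0; /* bit 1 set, bit 0 ignored */
    return out;
}
```

For example, the made-up value 0x8000104E decodes to base 0x80001000 with TEX=1, C=1, B=1, the combination discussed above.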


On the access ordering model for Normal memory, and memory synchronization:
http://bbs.chinaunix.net/thread-3593865-1-1.html


Cache registers:
MRC p15,0,<Rt>,c0,c0,1 ; Read CP15 Cache Type Register
L1Ip, bits [15:14]: L1 instruction cache indexing and tagging policy
00 Reserved
01 ASID-tagged Virtual Index, Virtual Tag (AIVIVT)
10 Virtual Index, Physical Tag (VIPT) , ok
11 Physical Index, Physical Tag (PIPT)
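As a small illustration of reading the Cache Type Register, the sketch below decodes L1Ip together with the smallest-line fields (IminLine in bits [3:0] and DminLine in bits [19:16], each the log2 of a line length in words). The function and struct names are hypothetical, and the value is passed in as a plain integer so the code runs on any host; on a target it would come from the MRC instruction above.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the Cache Type Register fields used here. */
struct ctr_info {
    unsigned l1ip;            /* bits [15:14]: 1 = AIVIVT, 2 = VIPT, 3 = PIPT */
    uint32_t imin_line_bytes; /* smallest I-cache line, from bits [3:0] */
    uint32_t dmin_line_bytes; /* smallest D/unified line, from bits [19:16] */
};

static struct ctr_info decode_ctr(uint32_t ctr)
{
    struct ctr_info info;

    info.l1ip = (ctr >> 14) & 0x3u;
    /* Both fields hold log2(words per line); a word is 4 bytes. */
    info.imin_line_bytes = 4u << (ctr & 0xFu);
    info.dmin_line_bytes = 4u << ((ctr >> 16) & 0xFu);
    return info;
}
```

With the made-up value 0x80038003 this reports a VIPT instruction cache and 32-byte minimum lines on both sides.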

MRC p15,1,<Rt>,c0,c0,1 ; Read CP15 Cache Level ID Register
Fields, from bit 31 down: 0 0 | LoUU | LoC | LoUIS | Ctype7 | Ctype6 | Ctype5 | Ctype4 | Ctype3 | Ctype2 | Ctype1
CtypeX value    Meaning, cache implemented at this level
000 No cache
001 Instruction cache only
010 Data cache only
011 Separate instruction and data caches  , ok
100 Unified cache
101, 11X Reserved

MRC p15,2,<Rt>,c0,c0,0 ; Read Cache Size Selection Register
MCR p15,2,<Rt>,c0,c0,0 ; Write Cache Size Selection Register
Level, bits [3:1]
Cache level of required cache. Permitted values are from 0b000, indicating Level 1 cache,
to 0b110 indicating Level 7 cache.
InD, bit [0]
Instruction not Data bit. Permitted values are:
0 Data or unified cache
1 Instruction cache.

MRC p15,1,<Rt>,c0,c0,0 ; Read current CP15 Cache Size ID Register
Fields: WT, WB, RA, WA, NumSets, Associativity, LineSize
NumSets, bits [27:13]
(Number of sets in cache) - 1, therefore a value of 0 indicates 1 set in the cache. The number
of sets does not have to be a power of 2.
Associativity, bits [12:3]
(Associativity of cache) - 1, therefore a value of 0 indicates an associativity of 1. The
associativity does not have to be a power of 2.
LineSize, bits [2:0]
(Log2(Number of words in cache line)) - 2. For example:
- For a line length of 4 words: Log2(4) = 2, LineSize entry = 0.
Worked example: NumSets = 0xff, Associativity = 3, LineSize = 1 gives 256 sets * 4 ways * (8 words * 4 bytes) = 32KB.
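The field arithmetic above can be checked with a short decoder. This is a sketch with hypothetical names (`cache_geometry`, `decode_ccsidr`); the register value is passed in as an integer rather than read with MRC, so it runs on any host.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical container for the geometry encoded in the Cache Size ID Register. */
struct cache_geometry {
    uint32_t sets;
    uint32_t ways;
    uint32_t line_bytes;
    uint32_t size_bytes;
};

static struct cache_geometry decode_ccsidr(uint32_t ccsidr)
{
    struct cache_geometry g;

    g.sets = ((ccsidr >> 13) & 0x7FFFu) + 1u;  /* NumSets, bits [27:13] */
    g.ways = ((ccsidr >> 3) & 0x3FFu) + 1u;    /* Associativity, bits [12:3] */
    /* LineSize, bits [2:0], is log2(words) - 2; a word is 4 bytes. */
    g.line_bytes = 4u << ((ccsidr & 0x7u) + 2u);
    g.size_bytes = g.sets * g.ways * g.line_bytes;
    return g;
}
```

Packing NumSets = 0xff, Associativity = 3, LineSize = 1 gives the register value 0x1FE019, which decodes to 256 sets, 4 ways, 32-byte lines and 32768 bytes in total.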

Usage flow:
Identifying the cache resources in ARMv7
From ARMv7 the architecture defines support for multiple levels of cache, up to a maximum of seven levels. This means the process of identifying the cache resources available to the processor in an ARMv7 implementation is more complicated. To obtain this information:
1. Read the Cache Type Register to find the indexing and tagging policy used for the Level 1 instruction cache. This register also provides the size of the smallest cache lines used for the instruction caches, and for the data and unified caches. These values are used in cache maintenance operations.
2. Read the Cache Level ID Register to find what caches are implemented. The register includes seven Cache type fields, for cache levels 1 to 7. Scanning these fields, starting from Level 1, identifies the instruction, data or unified caches implemented at each level. This scan ends when it reaches a level at which no caches are defined. The Cache Level ID Register also provides the Level of Unification and the Level of Coherency for the cache implementation.
3. For each cache identified at stage 2:
   - Write to the Cache Size Selection Register to select the required cache. A cache is identified by its level, and whether it is:
     - an instruction cache
     - a data or unified cache.
   - Read the Cache Size ID Register to find details of the cache.
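Steps 2 and 3 can be sketched as a scan over the Ctype fields that produces the values to write to the Cache Size Selection Register. The function name and the sample register values are hypothetical; on real hardware each selector would be written with MCR and followed by a read of the Cache Size ID Register.

```c
#include <assert.h>
#include <stdint.h>

/* Scan the Cache Level ID Register value `clidr` and store, for every
 * implemented cache, the Cache Size Selection Register value that
 * selects it (Level in bits [3:1], InD in bit [0]).
 * Returns the number of selectors written; at most 14 (7 levels * 2). */
static int list_cache_selectors(uint32_t clidr, uint32_t csselr[14])
{
    int n = 0;

    for (unsigned level = 0; level < 7; level++) {
        unsigned ctype = (clidr >> (3u * level)) & 0x7u;

        if (ctype == 0)
            break;                              /* no cache here: scan ends */
        if (ctype == 1 || ctype == 3)
            csselr[n++] = (level << 1) | 1u;    /* instruction cache */
        if (ctype >= 2 && ctype <= 4)
            csselr[n++] = (level << 1) | 0u;    /* data or unified cache */
    }
    return n;
}

/* Level of Coherency, bits [26:24]. */
static unsigned clidr_loc(uint32_t clidr)
{
    return (clidr >> 24) & 0x7u;
}
```

For a made-up value of 0x09200003 (separate L1 instruction and data caches, nothing at Level 2) the scan yields two selectors, 1 and 0, and a Level of Coherency of 1.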

#2, posted 2013-02-18 20:52
Last edited by blake326 on 2013-02-25 23:12

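Cache maintenance
The kernel's I-cache and D-cache operations are implemented in cache-v7.S.
        .long        v7_flush_icache_all   // c7, c1, 0: invalidate the entire I-cache
        .long        v7_flush_kern_cache_all  // c7, c14, 2: clean and invalidate the D-cache by set/way. With the example above (256 sets, 4 ways), flushing the whole cache takes 256 * 4 = 1024 iterations.
        .long        v7_flush_user_cache_all  // in cache-v7.S this falls through to v7_flush_user_cache_range
        .long        v7_flush_user_cache_range  // returns immediately; the code assumes a VIPT cache
        .long        v7_coherent_kern_range // c7, c11, 1: clean D-cache line by MVA to PoU; c7, c5, 1: invalidate I-cache line by MVA to PoU. One pass per cache line, so with 32-byte lines about (range + 31) / 32 iterations.
        .long        v7_coherent_user_range // same as above
        .long        v7_flush_kern_dcache_area // c7, c14, 1: clean and invalidate D-cache line by MVA to PoC; about (range + 31) / 32 iterations. This one matters: file system code calls it often.
        .long        v7_dma_map_area // invalidates or cleans depending on the DMA direction. c7, c6, 1: invalidate D-cache line by MVA to PoC; c7, c10, 1: clean D-cache line by MVA to PoC. The kernel calls this before a DMA transfer.
        .long        v7_dma_unmap_area
        .long        v7_dma_flush_range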
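To show where the 1024 iterations of the set/way flush come from, the sketch below builds the operand for each set/way pair of one cache level. The helper names are hypothetical, and the operand layout (way in the top bits, set above the line-offset bits, level in bits [3:1]) is assumed from the architecture's set/way format; on hardware each operand would be passed to the c7, c14, 2 operation instead of merely being counted.

```c
#include <assert.h>
#include <stdint.h>

/* Smallest n such that 2^n >= x, for x >= 1. */
static unsigned ceil_log2(uint32_t x)
{
    unsigned n = 0;
    while ((UINT32_C(1) << n) < x)
        n++;
    return n;
}

/* Build one set/way operand. `level` is 0 for the Level 1 cache. */
static uint32_t setway_operand(unsigned level, uint32_t set, uint32_t way,
                               uint32_t ways, uint32_t line_bytes)
{
    unsigned a = ceil_log2(ways);        /* bits needed for the way index */
    unsigned l = ceil_log2(line_bytes);  /* bits covered by the line offset */
    uint32_t way_field = a ? (way << (32u - a)) : 0u; /* avoid a 32-bit shift */

    return way_field | (set << l) | ((uint32_t)level << 1);
}

/* Visit every set/way pair; returns the number of operations and stores
 * the last operand in `*last`. */
static uint32_t walk_setway(unsigned level, uint32_t sets, uint32_t ways,
                            uint32_t line_bytes, uint32_t *last)
{
    uint32_t count = 0;

    for (uint32_t way = 0; way < ways; way++) {
        for (uint32_t set = 0; set < sets; set++) {
            *last = setway_operand(level, set, way, ways, line_bytes);
            count++;
        }
    }
    return count;
}
```

For 256 sets, 4 ways and 32-byte lines the walk performs 1024 operations, and the final operand (way 3, set 255) is 0xC0001FE0.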

After a file system writes to a page, it generally has to call flush_dcache_page.
void flush_dcache_page(struct page *page)
{
        struct address_space *mapping;

        /*
         * The zero page is never written to, so never has any dirty
         * cache lines, and therefore never needs to be flushed.
         */
        if (page == ZERO_PAGE(0))
                return;

        mapping = page_mapping(page);

        if (!cache_ops_need_broadcast() &&
            mapping && !mapping_mapped(mapping))
                clear_bit(PG_dcache_clean, &page->flags);
        else {
                __flush_dcache_page(mapping, page);
                if (mapping && cache_is_vivt()) // the v7 D-cache is VIPT non-aliasing by default
                        __flush_dcache_aliases(mapping, page);
                else if (mapping)
                        __flush_icache_all();
                set_bit(PG_dcache_clean, &page->flags);
        }
}
void __flush_dcache_page(struct address_space *mapping, struct page *page)
{
        /*
         * Writeback any data associated with the kernel mapping of this
         * page.  This ensures that data in the physical page is mutually
         * coherent with the kernels mapping.
         */
        if (!PageHighMem(page)) {
                __cpuc_flush_dcache_area(page_address(page), PAGE_SIZE);
        } else {
                void *addr = kmap_high_get(page);
                if (addr) {
                        __cpuc_flush_dcache_area(addr, PAGE_SIZE);
                        kunmap_high(page);
                } else if (cache_is_vipt()) {
                        /* unmapped pages might still be cached */
                        addr = kmap_atomic(page);
                        __cpuc_flush_dcache_area(addr, PAGE_SIZE);
                        kunmap_atomic(addr);
                }
        }

        /*
         * If this is a page cache page, and we have an aliasing VIPT cache,
         * we only need to do one flush - which would be at the relevant
         * userspace colour, which is congruent with page->index.
         */
        if (mapping && cache_is_vipt_aliasing()) // the v7 D-cache is VIPT non-aliasing by default, unlike the v7 I-cache, which is aliasing when line size * sets > 4KB
                flush_pfn_alias(page_to_pfn(page),
                                page->index << PAGE_CACHE_SHIFT);
}
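Suppose the page caches part of a file, so mapping is non-NULL.
First, __flush_dcache_page calls __cpuc_flush_dcache_area to clean and invalidate the page's D-cache lines by MVA.
Because mapping exists, the page may be mapped as the code segment of a user process, so __flush_icache_all() is needed as well.

Cache aliasing means that one physical address has several cache lines mapped to it. If one of those lines is modified, it must be flushed and the other lines for the same address invalidated.
With virtual addresses in play, aliasing arises easily. Take VIPT as an example: if sets * line size = 8K while the page size is 4K, and two virtual addresses addr1 and addr1+4K both map to page1,
then addr1 and addr1+4K index different cache lines even though their physical address is the same. The v7 D-cache hardware now handles VIPT aliasing automatically, but the I-cache still has to be handled by software.

Reference: http://bbs.chinaunix.net/forum.php?mod=viewthread&tid=1972593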
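The 8K versus 4K example can be worked through numerically. The sketch below uses hypothetical helper names and treats the set index as simply (address / line size) mod sets, which is an assumption about how a virtually indexed cache picks its set.

```c
#include <assert.h>
#include <stdint.h>

/* Number of cache colours: how many distinct page-sized windows one way
 * spans. More than 1 means a virtually indexed cache can alias. */
static uint32_t cache_colours(uint32_t sets, uint32_t line_bytes,
                              uint32_t page_bytes)
{
    uint32_t way_bytes = sets * line_bytes;

    return way_bytes > page_bytes ? way_bytes / page_bytes : 1u;
}

/* Set selected by a virtual address in a virtually indexed cache. */
static uint32_t set_index(uint32_t vaddr, uint32_t sets, uint32_t line_bytes)
{
    return (vaddr / line_bytes) % sets;
}
```

With 256 sets of 32-byte lines (8K per way) and 4K pages there are two colours: the made-up addresses 0x10000 and 0x11000 land in sets 0 and 128. With 128 sets (4K per way) both land in set 0, so no alias can form.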
