Chinaunix


[Memory management] Memory hotplug support question

#1  Posted 2013-12-18 17:50
Hardware memory RAS feature: memory mirror

#2  Posted 2013-12-19 09:14
Reply to 3# humjb_1983

OS side:

ACPI + memory management


   

#3  Posted 2013-12-19 09:33
Reply to 5# humjb_1983

At a high level, here is how ACPI memory hotplug works:

hot-add:

1. ACPI sends a hotplug event to the new ACPI memory device object that is
being hot-added.
2. The kernel is notified and verifies that the new memory device object
has not yet been attached to any handler.
3. The memory handler is called and obtains the new memory range from the
ACPI memory device object.
4. The memory handler calls add_memory() with the new address range.

Steps 1-4 above proceed automatically within the kernel. No user
input (and no sysfs interface) is necessary. Step 2 prevents double adds,
and step 3 gets a valid address range directly from the firmware. Step
4 is basically the same as the "probe" interface, but because all the
verification is done up front, this step is safe.
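
To make the double-add check concrete, here is a rough user-space sketch of the hot-add flow in Python. It is only a toy model, not kernel code: MemoryDevice, Kernel and on_hot_add() are hypothetical names invented for illustration, and add_memory() here merely records the range.

```python
from dataclasses import dataclass, field


@dataclass
class MemoryDevice:
    """Hypothetical stand-in for an ACPI memory device object."""
    start: int
    size: int
    handler_attached: bool = False


@dataclass
class Kernel:
    """Toy model; 'ranges' records what add_memory() was called with."""
    ranges: list = field(default_factory=list)

    def add_memory(self, start, size):
        # Stand-in for the kernel's add_memory(); it only records the range.
        self.ranges.append((start, size))

    def on_hot_add(self, dev):
        # Step 2: refuse a device that already has a handler (no double add).
        if dev.handler_attached:
            return False
        # Step 3: attach the handler and take the range from the device itself.
        dev.handler_attached = True
        # Step 4: hand the verified range to add_memory().
        self.add_memory(dev.start, dev.size)
        return True
```

Delivering the same event twice for one device adds the range only once, which is the point of step 2.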

hot-remove:

1. ACPI sends a hotplug event to the ACPI memory device object that is
requested to be hot-removed.
2. The kernel is notified and verifies that the memory device object is
attached to a handler.
3. The attached memory handler is called and obtains its memory range.
4. The memory handler calls remove_memory() with the address range.
5. The kernel calls the eject method of the ACPI memory device object.
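
The ordering in the hot-remove steps (memory is removed first, the device is ejected last) can be sketched the same way. Again this is only a toy Python model under my own assumptions: the dict layout and the hot_remove() helper are hypothetical, and the two callbacks stand in for remove_memory() and the ACPI eject method.

```python
def hot_remove(dev, remove_memory, eject):
    """Toy model of hot-remove steps 2-5; 'dev' is a plain dict (hypothetical)."""
    # Step 2: the device must currently be attached to a handler.
    if not dev["attached"]:
        return False
    # Step 3: the attached handler knows its own memory range.
    start, size = dev["range"]
    # Step 4: the memory must be gone from the OS before the hardware goes away.
    remove_memory(start, size)
    dev["attached"] = False
    # Step 5: only now ask the firmware to eject the device.
    eject(dev)
    return True


calls = []
dev = {"attached": True, "range": (0x100000000, 0x8000000)}
ok = hot_remove(
    dev,
    lambda start, size: calls.append(("remove_memory", start, size)),
    lambda d: calls.append(("eject",)),
)
```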


   

#4  Posted 2013-12-19 09:36
Last edited by embeddedlwp on 2013-12-19 09:39

Reply to 5# humjb_1983

Adding memory by hotplug consists of two steps, add and online:
add: allocate the memmap; if vmemmap is in use, also allocate the page tables for it.
online: hand the pages to the buddy system so that they can be allocated (see "Logical add memory" below).
Removing memory by hotplug consists of two steps, offline and remove:
offline: migrate user pages somewhere else and isolate the pages from the buddy
system so that no one can use them any more.
remove: free the kernel direct-mapping page tables and the memmap; if vmemmap is
in use, also free the vmemmap page tables.
lock_memory_hotplug() serializes both the logical and the physical add/remove paths.


Logical add memory
1. lock_memory_hotplug()
2. Send the memory hotplug event MEM_GOING_ONLINE.
3. If the zone is not yet populated, set up the zone's pcp (the next step, freeing pages to the buddy system, needs it).
4. Online all pages: this clears the PG_reserved flag and frees the pages to the buddy system, just as the early allocator does.
5. Update the zone's managed_pages and present_pages fields and the node's present pages.
6. If the zone was not populated before, build all zonelists so that other zones can fall back to this zone; otherwise update the zone's pcp, since the pcp batch size is calculated from the zone's present pages and that value changes while pages are onlined.
7. Update the watermarks, lowmem reserve and inactive ratio.
8. If the node was not present before, create its kswapd.
9. Recalculate total pages.
10. Send the memory hotplug event MEM_ONLINE.
11. unlock_memory_hotplug()
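
As a rough illustration of the online path, here is a toy Python model of the notify, free and account sequence. Zone and online_pages() are hypothetical stand-ins, pages are plain dicts, and the MEM_CANCEL_ONLINE event on a veto is a kernel detail not mentioned in the list above.

```python
class Zone:
    """Toy zone: only the counters and the free list that the list above mentions."""
    def __init__(self):
        self.present_pages = 0
        self.managed_pages = 0
        self.free_list = []


def online_pages(zone, pages, notifiers):
    """Toy model of onlining; each notifier is a callable(event) -> bool."""
    # MEM_GOING_ONLINE: any subscriber may veto the operation.
    if not all(notify("MEM_GOING_ONLINE") for notify in notifiers):
        for notify in notifiers:
            notify("MEM_CANCEL_ONLINE")
        return False
    # Online the pages: clear "reserved" and hand them to the allocator.
    for page in pages:
        page["reserved"] = False
        zone.free_list.append(page)
    # Update the zone accounting.
    zone.present_pages += len(pages)
    zone.managed_pages += len(pages)
    for notify in notifiers:
        notify("MEM_ONLINE")
    return True
```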


Logical remove memory
1. lock_memory_hotplug()
2. Set the migrate type of the pageblocks in the range to MIGRATE_ISOLATE so that free pages in the range will never be allocated.
3. Send the memory hotplug event MEM_GOING_OFFLINE.
4. The offline code retries up to 5 times, with a 120-second timeout: drain the pagevecs and pagesets, then scan the range and migrate pages that are on the LRU.
5. Test whether all pageblocks are of MIGRATE_ISOLATE type.
6. Offline the pages: this sets the PG_reserved flag, so the buddy system can no longer use them.
7. Change all pageblocks back to MIGRATE_MOVABLE.
8. Update the watermarks, lowmem reserve and inactive ratio.
9. Build all zonelists if the zone is no longer populated; otherwise update the zone's pcp.
10. If the node is no longer present, stop its kswapd.
11. Update total memory.
12. unlock_memory_hotplug()
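
The bounded retry is the part that most often surprises people, so here is a toy Python sketch of it. The page dicts, the migrate callback and offline_range() itself are my own assumptions; only the "5 retries, 120 seconds" bounds come from the list above.

```python
import time


def offline_range(pages, migrate, retry_max=5, timeout_s=120.0, now=time.monotonic):
    """Toy model of the offline loop; pages are dicts with 'in_use' and 'on_lru'."""
    deadline = now() + timeout_s
    for page in pages:
        page["isolated"] = True  # MIGRATE_ISOLATE: no new allocations from the range
    retries_left = retry_max
    while True:
        busy = [page for page in pages if page["in_use"]]
        if not busy:
            break  # every page in the range is free
        if retries_left == 0 or now() > deadline:
            for page in pages:
                page["isolated"] = False  # give up and undo the isolation
            return False
        retries_left -= 1
        for page in busy:
            # Only pages on the LRU (user pages) are candidates for migration.
            if page["on_lru"] and migrate(page):
                page["in_use"] = False
    for page in pages:
        page["reserved"] = True   # offline: the allocator can no longer use the page
        page["isolated"] = False  # pageblocks go back to MIGRATE_MOVABLE
    return True
```

A range holding a page that is in use but not on the LRU (a kernel allocation, in this model) exhausts the retries and fails, which is the limitation discussed later in the thread.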


Physical add memory
1. lock_memory_hotplug()
2. If the node was not present before, call hotadd_new_pgdat.
3. Add the memory sections one by one: create the section descriptor mem_section, add the memmap and the usemap (the bitmap of pageblock migration types), and, if sparse-vmemmap is configured, allocate and populate the page tables.
4. Grow the zone/pgdat span.
5. Set the PG_reserved flag.
6. Register the memory sections one by one with their memory block.
7. Add the firmware memmap entry.
8. unlock_memory_hotplug()
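
To get a feel for "section by section", here is a small Python calculation. It assumes the x86_64 values of 128 MiB sections (SECTION_SIZE_BITS = 27) and 4 KiB pages, plus a 64-byte struct page, which is typical but configuration-dependent; the two helper names are made up for this example.

```python
SECTION_SIZE_BITS = 27   # assumption: x86_64, i.e. 128 MiB per section
PAGE_SHIFT = 12          # assumption: 4 KiB pages
STRUCT_PAGE_BYTES = 64   # assumption: typical sizeof(struct page) on x86_64


def sections_in_range(start, size):
    """Section numbers touched by the physical range [start, start + size)."""
    first = start >> SECTION_SIZE_BITS
    last = (start + size - 1) >> SECTION_SIZE_BITS
    return list(range(first, last + 1))


def memmap_bytes_per_section():
    """Bytes of memmap (struct page array) needed to describe one section."""
    pages_per_section = 1 << (SECTION_SIZE_BITS - PAGE_SHIFT)
    return pages_per_section * STRUCT_PAGE_BYTES
```

Under these assumptions, a 256 MiB device starting at 4 GiB covers sections 32 and 33, and each section needs 2 MiB of memmap.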


Physical remove memory
1. Offline the memory blocks in the range one by one.
2. lock_memory_hotplug()
3. Check that all memory blocks are offline (that is, logically removed).
4. Remove the firmware memmap entry.
5. Remove the memory: remove the sections from their memory block one by one, and free the memmap and usemap, plus the page tables if sparse-vmemmap is configured.
6. Try to offline the node if all of its CPUs and memory are offline.
7. unlock_memory_hotplug()
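
Putting the four operations together, the lifecycle of a memory block can be summarized as a small state machine. This is my own simplified Python summary, not a structure that exists in the kernel; State, TRANSITIONS and step() are hypothetical names.

```python
from enum import Enum


class State(Enum):
    ABSENT = "absent"    # not known to the kernel
    OFFLINE = "offline"  # physically added, pages still reserved
    ONLINE = "online"    # pages handed to the buddy system


# (operation, current state) -> next state
TRANSITIONS = {
    ("add", State.ABSENT): State.OFFLINE,      # physical add
    ("online", State.OFFLINE): State.ONLINE,   # logical add
    ("offline", State.ONLINE): State.OFFLINE,  # logical remove
    ("remove", State.OFFLINE): State.ABSENT,   # physical remove
}


def step(state, op):
    """Apply one hotplug operation, rejecting anything out of order."""
    try:
        return TRANSITIONS[(op, state)]
    except KeyError:
        raise ValueError("cannot %s a block that is %s" % (op, state.value))
```

The missing ("remove", State.ONLINE) entry corresponds to the check that all memory blocks are offline before a physical remove.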
   

#5  Posted 2013-12-19 09:37
Reply to 5# humjb_1983

Some servers now support a hardware memory RAS feature called memory
mirror, which works much like RAID1. The mirrored memory devices are configured
with the same address and hold the same contents, so one of the mirrored
memory devices can be hot-removed transparently, without any help from the OS.

Memory migration can be seen as an extension of the memory mirror technology.
The basic flow for memory migration is:
1) Find a spare memory device with enough capacity in the system.
2) The OS asks the firmware to migrate from the source memory device (A)
   to the spare memory device (B).
3) The firmware puts A and B into mirror mode, with A as master
   and B as slave.
4) The firmware resilvers the mirror to synchronize the contents from A to B.
5) The firmware reconfigures B as master and A as slave.
6) The firmware deconfigures the memory mirror and removes A.
7) The firmware reports the result to the OS.
The user can now hot-remove the source memory device A from the system.

During memory migration, A and B are in mirror mode, so CPUs and I/O devices
can access the memory as normal. After memory migration, memory device B has
the same address ranges and contents as memory device A, so there are no
OS-visible changes except latency (because A and B may belong to different NUMA
domains).

So hardware memory migration can be used to move pages that the OS cannot
migrate.
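
Why the OS sees nothing change can be shown with a toy Python model in which a controller maps a base address to whichever device currently backs it. Everything here (MemoryController, migrate_via_mirror(), bytearrays as memory devices) is hypothetical; real firmware interfaces look nothing like this.

```python
class MemoryController:
    """Toy model: maps a base address to the device (bytearray) backing it."""
    def __init__(self):
        self.backing = {}

    def read(self, base, offset):
        return self.backing[base][offset]


def migrate_via_mirror(ctrl, base, spare):
    """Move the range at 'base' onto 'spare'; return the freed source device."""
    source = ctrl.backing[base]
    if len(spare) < len(source):
        raise ValueError("spare device is too small")
    # Mirror and resilver: the spare receives a full copy of the source.
    spare[:len(source)] = source
    # The spare becomes master at the same address and the mirror is dissolved.
    ctrl.backing[base] = spare
    # The source no longer backs any address and can be hot-removed.
    return source
```

Reads through the controller return the same bytes at the same addresses before and after the call, even though a different device now serves them.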


   

#6  Posted 2013-12-19 10:54
Reply to 9# 瀚海书香

It should be triggerable from either software or hardware.

   

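#7  Posted 2013-12-19 11:05
Last edited by embeddedlwp on 2013-12-19 11:07

The "Arrange hotpluggable memory as ZONE_MOVABLE" patchset is working to mitigate this problem.

Pages used by the kernel cannot be migrated.
Huawei appears to handle this with an approach similar to memory mirror.

Memory mirror and memory hotplug are actually two independent solutions, aren't they?
My understanding is that they cannot really be called two separate solutions. The approach in post 8, that is, extending memory mirror, is exactly what solves the problem that pages used by the kernel cannot be migrated.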

#8  Posted 2013-12-19 11:37
Reply to 15# humjb_1983

There is no documentation; all of this comes from reading the code myself.


   