How Solaris ZFS Cache Management Differs From UFS and VXFS File Systems [ID 1005367.1]
--------------------------------------------------------------------------------
Modified 18-OCT-2011  Type HOWTO  Migrated ID 207481  Status PUBLISHED
Applies to:
Solaris SPARC Operating System - Version: 10 3/05 and later [Release: 10.0 and later ]
All Platforms
Goal
ZFS manages its cache differently than other file systems such as ufs and vxfs. ZFS's use of kernel memory as a cache results in higher kernel memory allocation than with ufs and vxfs file systems. Monitoring a system with tools such as vmstat reports less free memory under ZFS, which may lead to unnecessary support calls.
Solution
This behavior stems from how ZFS cache management differs from the ufs and vxfs file systems.
ZFS does not use the page cache the way ufs and vxfs do. In those older file systems, cached pages can be moved to the cache list after being written to the backing store and are then counted as free memory; ZFS's caching works quite differently.
ZFS therefore affects how the VM subsystem accounts for memory. Monitoring with vmstat(1M) and prstat(1M) reports less free memory when ZFS is used heavily, e.g., when copying large files into a ZFS file system. The same load on a ufs file system appears to use less memory, because pages written to the backing store are moved to the cache list and counted as free.
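As a rough illustration of what such monitoring shows, the sketch below parses a hypothetical vmstat snapshot (the numbers are invented for illustration, not taken from a real system) to pull out the free-memory column that tends to look alarmingly low on a ZFS-heavy host:

```shell
# Hypothetical vmstat(1M) output on a busy ZFS host; figures are
# illustrative only.
sample='kthr      memory            page
 r b w   swap  free  re  mf pi po fr de sr
 0 0 0 8282896 524288 12 150  0  0  0  0  0'

# With ZFS, "free" looks low because ARC-cached data lives in kernel
# memory rather than in page-cache pages that would count as free.
free_kb=$(printf '%s\n' "$sample" | awk 'NR==3 {print $5}')
echo "free: ${free_kb} KB ($((free_kb / 1024)) MB)"
```

On a real system the same column would be read from live `vmstat` output; the point is that a low value here is expected, not a fault, when ZFS is caching heavily.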
ZFS uses a significantly different caching model than page-based file systems such as ufs and vxfs, for both performance and architectural reasons.
This may impact existing application software (such as Oracle databases, which consume large amounts of memory). Work is being done to improve the interface with the VM subsystem so that it can recover memory from the ZFS cache when necessary.
The primary ZFS cache is the Adaptive Replacement Cache (ARC), built on top of a number of kmem caches: zio_buf_512 through zio_buf_131072 (plus hdr_cache and buf_cache). These kmem caches hold data blocks (ZFS uses variable block sizes, from 512 bytes to 128K). The ARC will be at least 64MB and can use a maximum of 75% of physical memory, so with ZFS, reported freemem will be lower.
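The sizing rule above reduces to simple arithmetic. In this sketch the 8 GB physical-memory figure is a hypothetical example; on a real Solaris system it would come from prtconf(1M):

```shell
# Sketch of the default ARC bounds described above:
# a floor of 64 MB and a ceiling of 75% of physical memory.
physmem_mb=8192                        # hypothetical: 8 GB of RAM

arc_min_mb=64
arc_max_mb=$(( physmem_mb * 75 / 100 ))

echo "ARC floor:   ${arc_min_mb} MB"
echo "ARC ceiling: ${arc_max_mb} MB"   # 6144 MB on an 8 GB system
```

With most of that ceiling in use as cache, freemem on such a system can legitimately sit far below physical memory.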
ZFS returns ARC memory only under memory pressure. This is different behavior from pre-Solaris 8 (pre priority paging), where reading or writing one large (multi-GB) file could cause a memory shortage and lead to paging/swapping of application pages, resulting in slow application performance and restarts. That old problem was due to the failure to distinguish between a useful application page and a file system cache page. See knowledge article 1003383.1.
ZFS frees its cache in a way that does not cause a memory shortage; the system can operate fine with lower freemem without actually suffering from it. Unlike ufs and vxfs, ZFS does not throttle individual writers. The ufs file system throttles writes when the number of dirty pages per vnode reaches 16MB, with the objective of preserving free memory; the downside is slow application write performance, which may be unnecessary when plenty of free memory is available. ZFS throttles applications only when the data load overflows the I/O subsystem's capacity for 5 to 10 seconds.
However, there are occasions when ZFS fails to evict the ARC cache quickly enough, leading to application startup failures due to memory shortage. Also, reaping memory from the ARC can trigger high system utilization at the expense of performance. This issue is addressed in bug:
<Defect 6505658> - target MRU size (arc.p) needs to be adjusted more aggressively
Workaround is to limit the ZFS arc cache by setting:
set zfs:zfs_arc_max
in the /etc/system file. For databases, the amount of memory all databases will consume is known in advance. Limit the ZFS ARC cache to the remaining free memory (possibly reduced further by a factor of 2-3x) by setting the zfs_arc_max tunable to the desired value.
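A minimal example of such an /etc/system entry, assuming a hypothetical 2 GB cap (the value here is an example only; size it from your own remaining free memory):

```
* Cap the ZFS ARC at 2 GB (0x80000000 bytes); a reboot is required
* for /etc/system changes to take effect.
set zfs:zfs_arc_max = 0x80000000
```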
This tunable is available in Solaris 10 8/07 (Update 4), or with KU 120011-14 installed along with the fix for <Defect 6505658>.
Relief/Workaround
For Solaris 10 releases prior to 8/07 (Update 4), or without KU 120011-14 installed, a script is provided in <Defect 6505658>.
ZFS Best Practices Guide:
The ZFS adaptive replacement cache (ARC) tries to use most of a system's
available memory to cache file system data. The default is to use all of
physical memory except 1 Gbyte. As memory pressure increases, the ARC
relinquishes memory.
Consider limiting the maximum ARC memory footprint in the following
situations:
- When a known amount of memory is always required by an application. Databases often fall into this category.
- On platforms that support dynamic reconfiguration of memory boards, to prevent ZFS from growing the kernel cage onto all boards.
- A system that requires large memory pages might also benefit from limiting the ZFS cache, which tends to break down large pages into base pages.
- Finally, if the system is running another non-ZFS file system in addition to ZFS, it is advisable to leave some free memory to host that other file system's caches.
The trade-off is that limiting the ARC's memory footprint means it cannot cache as much file system data, and this limit could impact performance. In general, limiting the ARC is wasteful if the memory that now goes unused by ZFS is also unused by other system components. Note that non-ZFS file systems typically cache data in what is nevertheless reported as free memory by the system.