- 论坛徽章:
- 0
|
Performance Management Guide
Monitoring Disk I/O
When you are monitoring disk I/O, use the following to determine your course of action:
- Find the most active files, file systems, and logical volumes:
- Can "hot" file systems be better located on the physical drive or be spread across multiple physical drives? (lslv, iostat, filemon)
- Are "hot" files local or remote? (filemon)
- Does paging space dominate disk utilization? (vmstat, filemon)
- Is there enough memory to cache the file pages being used by running processes? (vmstat, svmon, vmtune)
- Does the application perform a lot of synchronous (non-cached) file I/O?
- Determine file fragmentation:
- Are "hot" files heavily fragmented? (fileplace)
- Find the physical volume with the highest utilization:
- Is the type of drive or I/O adapter causing a bottleneck? (iostat, filemon)
Building a Pre-Tuning Baseline
Before you make significant changes in your disk configuration or tuning parameters, it is a good idea to build a baseline of measurements that record the current configuration and performance.
Wait I/O Time Reporting
AIX 4.3.3 and later contain enhancements to the method used to compute the percentage of CPU time spent waiting on disk I/O (wio time). The method used in AIX 4.3.2 and earlier versions of the operating system can, under certain circumstances, give an inflated view of wio time on SMPs. The wio time is reported by the commands sar (%wio), vmstat (wa) and iostat (% iowait).
Another change is that the wa column details the percentage of time the CPU was idle with pending disk I/O to not only local, but also NFS-mounted disks.
Method Used in AIX 4.3.2 and Earlier
At each clock interrupt on each processor (100 times a second per processor), a determination is made as to which of the four categories (usr/sys/wio/idle) to place the last 10 ms of time. If the CPU was busy in usr mode at the time of the clock interrupt, then usr gets the clock tick added into its category. If the CPU was busy in kernel mode at the time of the clock interrupt, then the sys category gets the tick. If the CPU was not busy, a check is made to see if any I/O to disk is in progress. If any disk I/O is in progress, the wio category is incremented. If no disk I/O is in progress and the CPU is not busy, the idle category gets the tick.
The inflated view of wio time results from all idle CPUs being categorized as wio regardless of the number of threads waiting on I/O. For example, systems with just one thread doing I/O could report over 90 percent wio time regardless of the number of CPUs it has.
Method Used in AIX 4.3.3 and Later
The change in AIX 4.3.3 is to only mark an idle CPU as wio if an outstanding I/O was started on that CPU. This method can report much lower wio times when just a few threads are doing I/O and the system is otherwise idle. For example, a system with four CPUs and one thread doing I/O will report a maximum of 25 percent wio time. A system with 12 CPUs and one thread doing I/O will report a maximum of 8.3 percent wio time.
Also, NFS now goes through the buffer cache, and waits in those routines are accounted for in the wa statistics.
Assessing Disk Performance with the iostat Command
Begin the assessment by running the iostat command with an interval parameter during your system's peak workload period or while running a critical application for which you need to minimize I/O delays. The following shell script runs the iostat command in the background while a copy of a large file runs in the foreground so that there is some I/O to measure:
# iostat 5 3 >io.out &
# cp big1 /dev/null
This example leaves the following three reports in the io.out file:
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.0 1.3 0.2 0.6 98.9 0.3
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk0 0.0 0.3 0.0 29753 48076
hdisk1 0.1 0.1 0.0 11971 26460
hdisk2 0.2 0.8 0.1 91200 108355
cd0 0.0 0.0 0.0 0 0
tty: tin tout avg-cpu: % user % sys % idle % iowait
0.8 0.8 0.6 9.7 50.2 39.5
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk0 47.0 674.6 21.8 3376 24
hdisk1 1.2 2.4 0.6 0 12
hdisk2 4.0 7.9 1.8 8 32
cd0 0.0 0.0 0.0 0 0
tty: tin tout avg-cpu: % user % sys % idle % iowait
2.0 2.0 0.2 1.8 93.4 4.6
Disks: % tm_act Kbps tps Kb_read Kb_wrtn
hdisk0 0.0 0.0 0.0 0 0
hdisk1 0.0 0.0 0.0 0 0
hdisk2 4.8 12.8 3.2 64 0
cd0 0.0 0.0 0.0 0 0
The first report is the summary since the last reboot and shows the overall balance (or, in this case, imbalance) in the I/O to each of the hard disks. hdisk1 is almost idle and hdisk2 receives about 63 percent of the total I/O (from Kb_read and Kb_wrtn).
Note: The system maintains a history of disk activity. If the history is disabled (smitty chgsys -> Continuously maintain DISK I/O history [false]), the following message displays when you run the iostat command:
Disk history since boot not available.
The interval disk I/O statistics are unaffected by this.
The second report shows the 5-second interval during which cp ran. Examine this information carefully. The elapsed time for this cp was about 2.6 seconds. Thus, 2.5 seconds of high I/O dependency are being averaged with 2.5 seconds of idle time to yield the 39.5 percent % iowait reported. A shorter interval would have given a more accurate characterization of the command itself, but this example demonstrates what you must consider when you are looking at reports that show average activity across intervals.
TTY Report
The two columns of TTY information (tin and tout) in the iostat output show the number of characters read and written by all TTY devices. This includes both real and pseudo TTY devices. Real TTY devices are those connected to an asynchronous port. Some pseudo TTY devices are shells, telnet sessions, and aixterm windows.
Because the processing of input and output characters consumes CPU resources, look for a correlation between increased TTY activity and CPU utilization. If such a relationship exists, evaluate ways to improve the performance of the TTY subsystem. Steps that could be taken include changing the application program, modifying TTY port parameters during file transfer, or perhaps upgrading to a faster or more efficient asynchronous communications adapter.
In
Shell Script fastport.sh for Fast File Transfers
, you can find the fastport.sh script, which is intended to condition a TTY port for fast file transfers in raw mode; for example, when a FAX machine is to be connected. Using the script might improve CPU performance by a factor of 3 at 38400 baud.
CPU Report
The CPU statistics columns (% user, % sys, % idle, and % iowait) provide a breakdown of CPU usage. This information is also reported in the vmstat command output in the columns labeled us, sy, id, and wa. For a detailed explanation for the values, see
The vmstat Command
. Also note the change made to % iowait described in
Wait I/O Time Reporting
.
On systems running one application, high I/O wait percentage might be related to the workload. On systems with many processes, some will be running while others wait for I/O. In this case, the % iowait can be small or zero because running processes "hide" some wait time. Although % iowait is low, a bottleneck can still limit application performance.
If the iostat command indicates that a CPU-bound situation does not exist, and % iowait time is greater than 20 percent, you might have an I/O or disk-bound situation. This situation could be caused by excessive paging due to a lack of real memory. It could also be due to unbalanced disk load, fragmented data or usage patterns. For an unbalanced disk load, the same iostat report provides the necessary information. But for information about file systems or logical volumes, which are logical resources, you must use tools such as the filemon or fileplace commands.
Drive Report
When you suspect a disk I/O performance problem, use the iostat command. To avoid the information about the TTY and CPU statistics, use the -d option. In addition, the disk statistics can be limited to the important disks by specifying the disk names.
Remember that the first set of data represents all activity since system startup.
Disks:
Shows the names of the physical volumes. They are either hdisk or cd followed by a number. If physical volume names are specified with the iostat command, only those names specified are displayed.
% tm_act
Indicates the percentage of time that the physical disk was active (bandwidth utilization for the drive) or, in other words, the total time disk requests are outstanding. A drive is active during data transfer and command processing, such as seeking to a new location. The "disk active time" percentage is directly proportional to resource contention and inversely proportional to performance. As disk use increases, performance decreases and response time increases. In general, when the utilization exceeds 70 percent, processes are waiting longer than necessary for I/O to complete because most UNIX processes block (or sleep) while waiting for their I/O requests to complete. Look for busy versus idle drives. Moving data from busy to idle drives can help alleviate a disk bottleneck. Paging to and from disk will contribute to the I/O load.
Kbps
Indicates the amount of data transferred (read or written) to the drive in KB per second. This is the sum of Kb_read plus Kb_wrtn, divided by the seconds in the reporting interval.
tps
Indicates the number of transfers per second that were issued to the physical disk. A transfer is an I/O request through the device driver level to the physical disk. Multiple logical requests can be combined into a single I/O request to the disk. A transfer is of indeterminate size.
Kb_read
Reports the total data (in KB) read from the physical volume during the measured interval.
Kb_wrtn
Shows the amount of data (in KB) written to the physical volume during the measured interval.
Taken alone, there is no unacceptable value for any of the above fields because statistics are too closely related to application characteristics, system configuration, and type of physical disk drives and adapters. Therefore, when you are evaluating data, look for patterns and relationships. The most common relationship is between disk utilization (%tm_act) and data transfer rate (tps).
To draw any valid conclusions from this data, you have to understand the application's disk data access patterns such as sequential, random, or combination, as well as the type of physical disk drives and adapters on the system. For example, if an application reads/writes sequentially, you should expect a high disk transfer rate (Kbps) when you have a high disk busy rate (%tm_act). Columns Kb_read and Kb_wrtn can confirm an understanding of an application's read/write behavior. However, these columns provide no information on the data access patterns.
Generally you do not need to be concerned about a high disk busy rate (%tm_act) as long as the disk transfer rate (Kbps) is also high. However, if you get a high disk busy rate and a low disk transfer rate, you may have a fragmented logical volume, file system, or individual file.
Discussions of disk, logical volume and file system performance sometimes lead to the conclusion that the more drives you have on your system, the better the disk I/O performance. This is not always true because there is a limit to the amount of data that can be handled by a disk adapter. The disk adapter can also become a bottleneck. If all your disk drives are on one disk adapter, and your hot file systems are on separate physical volumes, you might benefit from using multiple disk adapters. Performance improvement will depend on the type of access.
To see if a particular adapter is saturated, use the iostat command and add up all the Kbps amounts for the disks attached to a particular disk adapter. For maximum aggregate performance, the total of the transfer rates (Kbps) must be below the disk adapter throughput rating. In most cases, use 70 percent of the throughput rate. In operating system versions later than 4.3.3 the -a or -A option will display this information.
Assessing Disk Performance with the vmstat Command
To prove that the system is I/O bound, it is better to use the iostat command. However, the vmstat command could point to that direction by looking at the wa column, as discussed in
The vmstat Command
. Other indicators for I/O bound are:
- The disk xfer part of the vmstat output
To display a statistic about the logical disks (a maximum of four disks is allowed), use the following command:
# vmstat hdisk0 hdisk1 1 8
kthr memory page faults cpu disk xfer
---- ---------- ----------------------- ------------ ----------- ------
r b avm fre re pi po fr sr cy in sy cs us sy id wa 1 2 3 4
0 0 3456 27743 0 0 0 0 0 0 131 149 28 0 1 99 0 0 0
0 0 3456 27743 0 0 0 0 0 0 131 77 30 0 1 99 0 0 0
1 0 3498 27152 0 0 0 0 0 0 153 1088 35 1 10 87 2 0 11
0 1 3499 26543 0 0 0 0 0 0 199 1530 38 1 19 0 80 0 59
0 1 3499 25406 0 0 0 0 0 0 187 2472 38 2 26 0 72 0 53
0 0 3456 24329 0 0 0 0 0 0 178 1301 37 2 12 20 66 0 42
0 0 3456 24329 0 0 0 0 0 0 124 58 19 0 0 99 0 0 0
0 0 3456 24329 0 0 0 0 0 0 123 58 23 0 0 99 0 0 0
The disk xfer part provides the number of transfers per second to the specified physical volumes that occurred in the sample interval. One to four physical volume names can be specified. Transfer statistics are given for each specified drive in the order specified. This count represents requests to the physical device. It does not imply an amount of data that was read or written. Several logical requests can be combined into one physical request.
- The in column of the vmstat output
This column shows the number of hardware or device interrupts (per second) observed over the measurement interval. Examples of interrupts are disk request completions and the 10 millisecond clock interrupt. Since the latter occurs 100 times per second, the in field is always greater than 100. But the vmstat command also provides a more detailed output about the system interrupts.
- The vmstat -i output
The -i parameter displays the number of interrupts taken by each device since system startup. But, by adding the interval and, optionally, the count parameter, the statistic since startup is only displayed in the first stanza; every trailing stanza is a statistic about the scanned interval.
# vmstat -i 1 2
priority level type count module(handler)
0 0 hardware 0 i_misc_pwr(a868c)
0 1 hardware 0 i_scu(a8680)
0 2 hardware 0 i_epow(954e0)
0 2 hardware 0 /etc/drivers/ascsiddpin(189acd4)
1 2 hardware 194 /etc/drivers/rsdd(1941354)
3 10 hardware 10589024 /etc/drivers/mpsdd(1977a88)
3 14 hardware 101947 /etc/drivers/ascsiddpin(189ab8c)
5 62 hardware 61336129 clock(952c4)
10 63 hardware 13769 i_softoff(9527c)
priority level type count module(handler)
0 0 hardware 0 i_misc_pwr(a868c)
0 1 hardware 0 i_scu(a8680)
0 2 hardware 0 i_epow(954e0)
0 2 hardware 0 /etc/drivers/ascsiddpin(189acd4)
1 2 hardware 0 /etc/drivers/rsdd(1941354)
3 10 hardware 25 /etc/drivers/mpsdd(1977a88)
3 14 hardware 0 /etc/drivers/ascsiddpin(189ab8c)
5 62 hardware 105 clock(952c4)
10 63 hardware 0 i_softoff(9527c)
Note: The output will differ from system to system, depending on hardware and software configurations (for example, the clock interrupts may not be displayed in the vmstat -i output although they will be accounted for under the in column in the normal vmstat output). Check for high numbers in the count column and investigate why this module has to execute so many interrupts.
Assessing Disk Performance with the sar Command
The sar command is a standard UNIX command used to gather statistical data about the system. With its numerous options, the sar command provides queuing, paging, TTY, and many other statistics. With AIX 4.3.3, the sar -d option generates real-time disk I/O statistics.
# sar -d 3 3
AIX konark 3 4 0002506F4C00 08/26/99
12:09:50 device %busy avque r+w/s blks/s avwait avserv
12:09:53 hdisk0 1 0.0 0 5 0.0 0.0
hdisk1 0 0.0 0 1 0.0 0.0
cd0 0 0.0 0 0 0.0 0.0
12:09:56 hdisk0 0 0.0 0 0 0.0 0.0
hdisk1 0 0.0 0 1 0.0 0.0
cd0 0 0.0 0 0 0.0 0.0
12:09:59 hdisk0 1 0.0 1 4 0.0 0.0
hdisk1 0 0.0 0 1 0.0 0.0
cd0 0 0.0 0 0 0.0 0.0
Average hdisk0 0 0.0 0 3 0.0 0.0
hdisk1 0 0.0 0 1 0.0 0.0
cd0 0 0.0 0 0 0.0 0.0
The fields listed by the sar -d command are as follows:
%busy
Portion of time device was busy servicing a transfer request. This is the same as the %tm_act column in the iostat command report.
avque
Average number of requests outstanding during that time. This number is a good indicator if an I/O bottleneck exists.
r+w/s
Number of read/write transfers from or to device. This is the same as tps in the iostat command report.
blks/s
Number of bytes transferred in 512-byte units
avwait
Average number of transactions waiting for service (queue length). Average time (in milliseconds) that transfer requests waited idly on queue for the device. This number is currently not reported and shows 0.0 by default.
avserv
Number of milliseconds per average seek. Average time (in milliseconds) to service each transfer request (includes seek, rotational latency, and data transfer times) for the device. This number is currently not reported and shows 0.0 by default.
Assessing Logical Volume Fragmentation with the lslv Command
The lslv command shows, among other information, the logical volume fragmentation. To check logical volume fragmentation, use the command lslv -l lvname, as follows:
# lslv -l hd2
hd2:/usr
PV COPIES IN BAND DISTRIBUTION
hdisk0 114:000:000 22% 000:042:026:000:046
The output of COPIES shows the logical volume hd2 has only one copy. The IN BAND shows how well the intrapolicy, an attribute of logical volumes, is followed. The higher the percentage, the better the allocation efficiency. Each logical volume has its own intrapolicy. If the operating system cannot meet this requirement, it chooses the best way to meet the requirements. In our example, there are a total of 114 logical partitions (LP); 42 LPs are located on middle, 26 LPs on center, and 46 LPs on inner-edge. Since the logical volume intrapolicy is center, the in-band is 22 percent (26 / (42+26+46). The DISTRIBUTION shows how the physical partitions are placed in each part of the intrapolicy; that is:
edge : middle : center : inner-middle : inner-edge
See
Position on Physical Volume
for additional information about physical partitions placement.
Assessing Physical Placement of Data with the lslv Command
If the workload shows a significant degree of I/O dependency, you can investigate the physical placement of the files on the disk to determine if reorganization at some level would yield an improvement. To see the placement of the partitions of logical volume hd11 within physical volume hdisk0, use the following:
# lslv -p hdisk0 hd11
hdisk0:hd11:/home/op
USED USED USED USED USED USED USED USED USED USED 1-10
USED USED USED USED USED USED USED 11-17
USED USED USED USED USED USED USED USED USED USED 18-27
USED USED USED USED USED USED USED 28-34
USED USED USED USED USED USED USED USED USED USED 35-44
USED USED USED USED USED USED 45-50
USED USED USED USED USED USED USED USED USED USED 51-60
0052 0053 0054 0055 0056 0057 0058 61-67
0059 0060 0061 0062 0063 0064 0065 0066 0067 0068 68-77
0069 0070 0071 0072 0073 0074 0075 78-84
Look for the rest of hd11 on hdisk1 with the following:
# lslv -p hdisk1 hd11
hdisk1:hd11:/home/op
0035 0036 0037 0038 0039 0040 0041 0042 0043 0044 1-10
0045 0046 0047 0048 0049 0050 0051 11-17
USED USED USED USED USED USED USED USED USED USED 18-27
USED USED USED USED USED USED USED 28-34
USED USED USED USED USED USED USED USED USED USED 35-44
USED USED USED USED USED USED 45-50
0001 0002 0003 0004 0005 0006 0007 0008 0009 0010 51-60
0011 0012 0013 0014 0015 0016 0017 61-67
0018 0019 0020 0021 0022 0023 0024 0025 0026 0027 68-77
0028 0029 0030 0031 0032 0033 0034 78-84
From top to bottom, five blocks represent edge, middle, center, inner-middle, and inner-edge, respectively.
- A USED indicates that the physical partition at this location is used by a logical volume other than the one specified. A number indicates the logical partition number of the logical volume specified with the lslv -p command.
- A FREE indicates that this physical partition is not used by any logical volume. Logical volume fragmentation occurs if logical partitions are not contiguous across the disk.
- A STALE physical partition is a physical partition that contains data you cannot use. You can also see the STALE physical partitions with the lspv -m command. Physical partitions marked as STALE must be updated to contain the same information as valid physical partitions. This process, called resynchronization with the syncvg command, can be done at vary-on time, or can be started anytime the system is running. Until the STALE partitions have been rewritten with valid data, they are not used to satisfy read requests, nor are they written to on write requests.
In the previous example, logical volume hd11 is fragmented within physical volume hdisk1, with its first logical partitions in the inner-middle and inner regions of hdisk1, while logical partitions 35-51 are in the outer region. A workload that accessed hd11 randomly would experience unnecessary I/O wait time as longer seeks might be needed on logical volume hd11. These reports also indicate that there are no free physical partitions in either hdisk0 or hdisk1.
Assessing File Placement with the fileplace Command
To see how the file copied earlier, big1, is stored on the disk, we can use the fileplace command. The fileplace command displays the placement of a file's blocks within a logical volume or within one or more physical volumes.
To determine whether the fileplace command is installed and available, run the following command:
# lslpp -lI perfagent.tools
Use the following command:
# fileplace -pv big1
File: big1 Size: 3554273 bytes Vol: /dev/hd10
Blk Size: 4096 Frag Size: 4096 Nfrags: 868 Compress: no
Inode: 19 Mode: -rwxr-xr-x Owner: hoetzel Group: system
Physical Addresses (mirror copy 1) Logical Fragment
---------------------------------- ----------------
0001584-0001591 hdisk0 8 frags 32768 Bytes, 0.9% 0001040-0001047
0001624-0001671 hdisk0 48 frags 196608 Bytes, 5.5% 0001080-0001127
0001728-0002539 hdisk0 812 frags 3325952 Bytes, 93.5% 0001184-0001995
868 frags over space of 956 frags: space efficiency = 90.8%
3 fragments out of 868 possible: sequentiality = 99.8%
This example shows that there is very little fragmentation within the file, and those are small gaps. We can therefore infer that the disk arrangement of big1 is not significantly affecting its sequential read-time. Further, given that a (recently created) 3.5 MB file encounters this little fragmentation, it appears that the file system in general has not become particularly fragmented.
Occasionally, portions of a file may not be mapped to any blocks in the volume. These areas are implicitly filled with zeroes by the file system. These areas show as unallocated logical blocks. A file that has these holes will show the file size to be a larger number of bytes than it actually occupies (that is, the ls -l command will show a large size, whereas the du command will show a smaller size or the number of blocks the file really occupies on disk).
The fileplace command reads the file's list of blocks from the logical volume. If the file is new, the information may not be on disk yet. Use the sync command to flush the information. Also, the fileplace command will not display NFS remote files (unless the command runs on the server).
Note: If a file has been created by seeking to various locations and writing widely dispersed records, only the pages that contain records will take up space on disk and appear on a fileplace report. The file system does not fill in the intervening pages automatically when the file is created. However, if such a file is read sequentially (by the cp or tar commands, for example) the space between records is read as binary zeroes. Thus, the output of such a cp command can be much larger than the input file, although the data is the same.
Space Efficiency and Sequentiality
Higher space efficiency means files are less fragmented and probably provide better sequential file access. A higher sequentiality indicates that the files are more contiguously allocated, and this will probably be better for sequential file access.
Space efficiency =
Total number of fragments used for file storage /
(Largest fragment physical address -
Smallest fragment physical address + 1)
Sequentiality =
(Total number of fragments -
Number of grouped fragments +1) /
Total number of fragments
If you find that your sequentiality or space efficiency values become low, you can use the reorgvg command to improve logical volume utilization and efficiency (see
Reorganizing Logical Volumes
). To improve file system utilization and efficiency, see
Reorganizing File Systems
.
In this example, the Largest fragment physical address - Smallest fragment physical address + 1 is: 0002539 - 0001584 + 1 = 956 fragments; total used fragments is: 8 + 48 + 812 = 868; the space efficiency is 868 / 956 (90.8 percent); the sequentiality is (868 - 3 + 1) / 868 = 99.8 percent.
Because the total number of fragments used for file storage does not include the indirect blocks location, but the physical address does, the space efficiency can never be 100 percent for files larger than 32 KB, even if the file is located on contiguous fragments.
Assessing Paging Space I/O with the vmstat Command
I/O to and from paging spaces is random, mostly one page at a time. The vmstat reports indicate the amount of paging-space I/O taking place. Both of the following examples show the paging activity that occurs during a C compilation in a machine that has been artificially shrunk using the rmss command. The pi and po (paging-space page-ins and paging-space page-outs) columns show the amount of paging-space I/O (in terms of 4096-byte pages) during each 5-second interval. The first report (summary since system reboot) has been removed. Notice that the paging activity occurs in bursts.
# vmstat 5 8
kthr memory page faults cpu
----- ----------- ------------------------ ------------ -----------
r b avm fre re pi po fr sr cy in sy cs us sy id wa
0 1 72379 434 0 0 0 0 2 0 376 192 478 9 3 87 1
0 1 72379 391 0 8 0 0 0 0 631 2967 775 10 1 83 6
0 1 72379 391 0 0 0 0 0 0 625 2672 790 5 3 92 0
0 1 72379 175 0 7 0 0 0 0 721 3215 868 8 4 72 16
2 1 71384 877 0 12 13 44 150 0 662 3049 853 7 12 40 41
0 2 71929 127 0 35 30 182 666 0 709 2838 977 15 13 0 71
0 1 71938 122 0 0 8 32 122 0 608 3332 787 10 4 75 11
0 1 71938 122 0 0 0 3 12 0 611 2834 733 5 3 75 17
The following "before and after" vmstat -s reports show the accumulation of paging activity. Remember that it is the paging space page ins and paging space page outs that represent true paging-space I/O. The (unqualified) page ins and page outs report total I/O, that is both paging-space I/O and the ordinary file I/O, performed by the paging mechanism. The reports have been edited to remove lines that are irrelevant to this discussion.
# vmstat -s # before
# vmstat -s # after
6602 page ins
3948 page outs
544 paging space page ins
1923 paging space page outs
0 total reclaims
7022 page ins
4146 page outs
689 paging space page ins
2032 paging space page outs
0 total reclaims
The fact that more paging-space page-ins than page-outs occurred during the compilation suggests that we had shrunk the system to the point that thrashing begins. Some pages were being repaged because their frames were stolen before their use was complete.
Assessing Overall Disk I/O with the vmstat Command
The technique just discussed can also be used to assess the disk I/O load generated by a program. If the system is otherwise idle, the following sequence:
# vmstat -s >statout
# testpgm
# sync
# vmstat -s >> statout
# egrep "ins|outs" statout
yields a before and after picture of the cumulative disk activity counts, such as:
5698 page ins
5012 page outs
0 paging space page ins
32 paging space page outs
6671 page ins
5268 page outs
8 paging space page ins
225 paging space page outs
During the period when this command (a large C compile) was running, the system read a total of 981 pages (8 from paging space) and wrote a total of 449 pages (193 to paging space).
Detailed I/O Analysis with the filemon Command
The filemon command uses the trace facility to obtain a detailed picture of I/O activity during a time interval on the various layers of file system utilization, including the logical file system, virtual memory segments, LVM, and physical disk layers. Data can be collected on all the layers, or layers can be specified with the -O layer option. The default is to collect data on the VM, LVM, and physical layers. Both summary and detailed reports are generated. Since it uses the trace facility, the filemon command can be run only by the root user or by a member of the system group.
To determine whether the filemon command is installed and available, run the following command:
# lslpp -lI perfagent.tools
Tracing is started by the filemon command, optionally suspended with the trcoff subcommand and resumed with the trcon subcomand, and terminated with the trcstop subcommand (you may want to issue the command nice -n -20 trcstop to stop the filemon command since the filemon command is currently running at priority 40). As soon as tracing is terminated, the filemon command writes its report to stdout.
Note: Only data for those files opened after the filemon command was started will be collected, unless you specify the -u flag.
The filemon command can read the I/O trace data from a specified file, instead of from the real-time trace process. In this case, the filemon report summarizes the I/O activity for the system and period represented by the trace file. This offline processing method is useful when it is necessary to postprocess a trace file from a remote machine or perform the trace data collection at one time and postprocess it at another time.
The trcrpt -r command must be executed on the trace logfile and redirected to another file, as follows:
# gennames > gennames.out
# trcrpt -r trace.out > trace.rpt
At this point an adjusted trace logfile is fed into the filemon command to report on I/O activity captured by a previously recorded trace session as follows:
# filemon -i trace.rpt -n gennames.out | pg
In this example, the filemon command reads file system trace events from the input file trace.rpt. Because the trace data is already captured on a file, the filemon command does not put itself in the background to allow application programs to be run. After the entire file is read, an I/O activity report for the virtual memory, logical volume, and physical volume levels is displayed on standard output (which, in this example, is piped to the pg command).
If the trace command was run with the -C all flag, then run the trcrpt command also with the -C all flag (see
Formatting a Report from trace -C Output
).
The following sequence of commands gives an example of the filemon command usage:
# filemon -o fm.out -O all; cp /smit.log /dev/null ; trcstop
The report produced by this sequence, in an otherwise-idle system, is as follows:
Thu Aug 19 11:30:49 1999
System: AIX texmex Node: 4 Machine: 000691854C00
0.369 secs in measured interval
Cpu utilization: 9.0%
Most Active Files
------------------------------------------------------------------------
#MBs #opns #rds #wrs file volume:inode
------------------------------------------------------------------------
0.1 1 14 0 smit.log /dev/hd4:858
0.0 1 0 13 null
0.0 2 4 0 ksh.cat /dev/hd2:16872
0.0 1 2 0 cmdtrace.cat /dev/hd2:16739
Most Active Segments
------------------------------------------------------------------------
#MBs #rpgs #wpgs segid segtype volume:inode
------------------------------------------------------------------------
0.1 13 0 5e93 ???
0.0 2 0 22ed ???
0.0 1 0 5c77 persistent
Most Active Logical Volumes
------------------------------------------------------------------------
util #rblk #wblk KB/s volume description
------------------------------------------------------------------------
0.06 112 0 151.9 /dev/hd4 /
0.04 16 0 21.7 /dev/hd2 /usr
Most Active Physical Volumes
------------------------------------------------------------------------
util #rblk #wblk KB/s volume description
------------------------------------------------------------------------
0.10 128 0 173.6 /dev/hdisk0 N/A
------------------------------------------------------------------------
Detailed File Stats
------------------------------------------------------------------------
FILE: /smit.log volume: /dev/hd4 (/) inode: 858
opens: 1
total bytes xfrd: 57344
reads: 14 (0 errs)
read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0
read times (msec): avg 1.709 min 0.002 max 19.996 sdev 5.092
FILE: /dev/null
opens: 1
total bytes xfrd: 50600
writes: 13 (0 errs)
write sizes (bytes): avg 3892.3 min 1448 max 4096 sdev 705.6
write times (msec): avg 0.007 min 0.003 max 0.022 sdev 0.006
FILE: /usr/lib/nls/msg/en_US/ksh.cat volume: /dev/hd2 (/usr) inode: 16872
opens: 2
total bytes xfrd: 16384
reads: 4 (0 errs)
read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0
read times (msec): avg 0.042 min 0.015 max 0.070 sdev 0.025
lseeks: 10
FILE: /usr/lib/nls/msg/en_US/cmdtrace.cat volume: /dev/hd2 (/usr) inode: 16739
opens: 1
total bytes xfrd: 8192
reads: 2 (0 errs)
read sizes (bytes): avg 4096.0 min 4096 max 4096 sdev 0.0
read times (msec): avg 0.062 min 0.049 max 0.075 sdev 0.013
lseeks: 8
------------------------------------------------------------------------
Detailed VM Segment Stats (4096 byte pages)
------------------------------------------------------------------------
SEGMENT: 5e93 segtype: ???
segment flags:
reads: 13 (0 errs)
read times (msec): avg 1.979 min 0.957 max 5.970 sdev 1.310
read sequences: 1
read seq. lengths: avg 13.0 min 13 max 13 sdev 0.0
SEGMENT: 22ed segtype: ???
segment flags: inode
reads: 2 (0 errs)
read times (msec): avg 8.102 min 7.786 max 8.418 sdev 0.316
read sequences: 2
read seq. lengths: avg 1.0 min 1 max 1 sdev 0.0
SEGMENT: 5c77 segtype: persistent
segment flags: pers defer
reads: 1 (0 errs)
read times (msec): avg 13.810 min 13.810 max 13.810 sdev 0.000
read sequences: 1
read seq. lengths: avg 1.0 min 1 max 1 sdev 0.0
------------------------------------------------------------------------
Detailed Logical Volume Stats (512 byte blocks)
------------------------------------------------------------------------
VOLUME: /dev/hd4 description: /
reads: 5 (0 errs)
read sizes (blks): avg 22.4 min 8 max 40 sdev 12.8
read times (msec): avg 4.847 min 0.938 max 13.792 sdev 4.819
read sequences: 3
read seq. lengths: avg 37.3 min 8 max 64 sdev 22.9
seeks: 3 (60.0%)
seek dist (blks): init 6344,
avg 40.0 min 8 max 72 sdev 32.0
time to next req(msec): avg 70.473 min 0.224 max 331.020 sdev 130.364
throughput: 151.9 KB/sec
utilization: 0.06
VOLUME: /dev/hd2 description: /usr
reads: 2 (0 errs)
read sizes (blks): avg 8.0 min 8 max 8 sdev 0.0
read times (msec): avg 8.078 min 7.769 max 8.387 sdev 0.309
read sequences: 2
read seq. lengths: avg 8.0 min 8 max 8 sdev 0.0
seeks: 2 (100.0%)
seek dist (blks): init 608672,
avg 16.0 min 16 max 16 sdev 0.0
time to next req(msec): avg 162.160 min 8.497 max 315.823 sdev 153.663
throughput: 21.7 KB/sec
utilization: 0.04
------------------------------------------------------------------------
Detailed Physical Volume Stats (512 byte blocks)
------------------------------------------------------------------------
VOLUME: /dev/hdisk0 description: N/A
reads: 7 (0 errs)
read sizes (blks): avg 18.3 min 8 max 40 sdev 12.6
read times (msec): avg 5.723 min 0.905 max 20.448 sdev 6.567
read sequences: 5
read seq. lengths: avg 25.6 min 8 max 64 sdev 22.9
seeks: 5 (71.4%)
seek dist (blks): init 4233888,
avg 171086.0 min 8 max 684248 sdev 296274.2
seek dist (%tot blks):init 48.03665,
avg 1.94110 min 0.00009 max 7.76331 sdev 3.36145
time to next req(msec): avg 50.340 min 0.226 max 315.865 sdev 108.483
throughput: 173.6 KB/sec
utilization: 0.10
Using the filemon command in systems with real workloads would result in much larger reports and might require more trace buffer space. Space and CPU time consumption for the filemon command can degrade system performance to some extent. Use a nonproduction system to experiment with the filemon command before starting it in a production environment. Also, use offline processing and on systems with many CPUs use the -C all flag with the trace command.
Note: Although the filemon command reports average, minimum, maximum, and standard deviation in its detailed-statistics sections, the results should not be used to develop confidence intervals or other formal statistical inferences. In general, the distribution of data points is neither random nor symmetrical.
Global Reports of the filemon Command
The global reports list the most active files, segments, logical volumes, and physical volumes during the measured interval. They are shown at the beginning of the filemon report. By default, the logical file and virtual memory reports are limited to the 20 most active files and segments, respectively, as measured by the total amount of data transferred. If the -v flag has been specified, activity for all files and segments is reported. All information in the reports is listed from top to bottom as most active to least active.
Most Active Files
#MBs
Total number of MBs transferred over measured interval for this file. The rows are sorted by this field in decreasing order.
#opns
Number of opens for files during measurement period.
#rds
Number of read calls to file.
#wrs
Number of write calls to file.
file
File name (full path name is in detailed report).
volume:inode
The logical volume that the file resides in and the i-node number of the file in the associated file system. This field can be used to associate a file with its corresponding persistent segment shown in the detailed VM segment reports. This field may be blank for temporary files created and deleted during execution.
The most active files are smit.log on logical volume hd4 and file null. The application utilizes the terminfo database for screen management; so the ksh.cat and cmdtrace.cat are also busy. Anytime the shell needs to post a message to the screen, it uses the catalogs for the source of the data.
To identify unknown files, you can translate the logical volume name, /dev/hd1, to the mount point of the file system, /home, and use the find or the ncheck command:
# find / -inum 858 -print
/smit.log
or
# ncheck -i 858 /
/:
858 /smit.log
Most Active Segments
#MBs
Total number of MBs transferred over measured interval for this segment. The rows are sorted by this field in decreasing order.
#rpgs
Number of 4-KB pages read into segment from disk.
#wpgs
Number of 4-KB pages written from segment to disk (page out).
#segid
VMM ID of memory segment.
segtype
Type of segment: working segment, persistent segment (local file), client segment (remote file), page table segment, system segment, or special persistent segments containing file system data (log, root directory, .inode, .inodemap, .inodex, .inodexmap, .indirect, .diskmap).
volume:inode
For persistent segments, name of logical volume that contains the associated file and the file's i-node number. This field can be used to associate a persistent segment with its corresponding file, shown in the Detailed File Stats reports. This field is blank for nonpersistent segments.
If the command is still active, the virtual memory analysis tool svmon can be used to display more information about a segment, given its segment ID (segid), as follows: svmon -D segid. See
The svmon Command
for a detailed discussion.
In our example, the segtype ??? means that the system cannot identify the segment type, and you must use the svmon command to get more information.
Most Active Logical Volumes
util
Utilization of logical volume.
#rblk
Number of 512-byte blocks read from logical volume.
#wblk
Number of 512-byte blocks written to logical volume.
KB/s
Average transfer data rate in KB per second.
volume
Logical volume name.
description
Either the file system mount point or the logical volume type (paging, jfslog, boot, or sysdump). For example, the logical volume /dev/hd2 is /usr; /dev/hd6 is paging, and /dev/hd8 is jfslog. There may also be the word compressed. This means all data is compressed automatically using Lempel-Zev (LZ) compression before being written to disk, and all data is uncompressed automatically when read from disk (see
Compression
for details).
The utilization is presented in percentage, 0.06 indicates 6 percent busy during measured interval.
Most Active Physical Volumes
util
Utilization of physical volume.
Note: Logical volume I/O requests start before and end after physical volume I/O requests. Total logical volume utilization will appear therefore to be higher than total physical volume utilization.
#rblk
Number of 512-byte blocks read from physical volume.
#wblk
Number of 512-byte blocks written to physical volume.
KB/s
Average transfer data rate in KB per second.
volume
Physical volume name.
description
Simple description of the physical volume type, for example, SCSI Multimedia CD-ROM Drive or 16 Bit SCSI Disk Drive.
The utilization is presented in percentage, 0.10 indicates 10 percent busy during measured interval.
Detailed Reports of the filemon Command
The detailed reports give additional information for the global reports. There is one entry for each reported file, segment, or volume in the detailed reports. The fields in each entry are described below for the four detailed reports. Some of the fields report a single value; others report statistics that characterize a distribution of many values. For example, response-time statistics are kept for all read or write requests that were monitored. The average, minimum, and maximum response times are reported, as well as the standard deviation of the response times. The standard deviation is used to show how much the individual response times deviated from the average. Approximately two-thirds of the sampled response times are between average minus standard deviation (avg - sdev) and average plus standard deviation (avg + sdev). If the distribution of response times is scattered over a large range, the standard deviation will be large compared to the average response time.
Detailed File Stats
Detailed file statistics are provided for each file listed in the Most Active Files report. These stanzas can be used to determine what access has been made to the file. In addition to the number of total bytes transferred, opens, reads, writes, and lseeks, the user can also determine the read/write size and times.
FILE
Name of the file. The full path name is given, if possible.
volume
Name of the logical volume/file system containing the file.
inode
I-node number for the file within its file system.
opens
Number of times the file was opened while monitored.
total bytes xfrd
Total number of bytes read/written from/to the file.
reads
Number of read calls against the file.
read sizes (bytes)
Read transfer-size statistics (avg/min/max/sdev), in bytes.
read times (msec)
Read response-time statistics (avg/min/max/sdev), in milliseconds.
writes
Number of write calls against the file.
write sizes (bytes)
Write transfer-size statistics.
write times (msec)
Write response-time statistics.
lseeks
Number of lseek() subroutine calls.
The read sizes and write sizes will give you an idea of how efficiently your application is reading and writing information. Use a multiple of 4 KB pages for best results.
Detailed VM Segment Stats
Each element listed in the Most Active Segments report has a corresponding stanza that shows detailed information about real I/O to and from memory.
SEGMENT
Internal operating system's segment ID.
segtype
Type of segment contents.
segment flags
Various segment attributes.
volume
For persistent segments, the name of the logical volume containing the corresponding file.
inode
For persistent segments, the i-node number for the corresponding file.
reads
Number of 4096-byte pages read into the segment (that is, paged in).
read times (msec)
Read response-time statistics (avg/min/max/sdev), in milliseconds.
read sequences
Number of read sequences. A sequence is a string of pages that are read (paged in) consecutively. The number of read sequences is an indicator of the amount of sequential access.
read seq. lengths
Statistics describing the lengths of the read sequences, in pages.
writes
Number of pages written from the segment to disk (that is, paged out).
write times (msec)
Write response-time statistics.
write sequences
Number of write sequences. A sequence is a string of pages that are written (paged out) consecutively.
write seq. lengths
Statistics describing the lengths of the write sequences, in pages.
By examining the reads and read-sequence counts, you can determine if the access is sequential or random. For example, if the read-sequence count approaches the reads count, the file access is more random. On the other hand, if the read-sequence count is significantly smaller than the read count and the read-sequence length is a high value, the file access is more sequential. The same logic applies for the writes and write sequence.
Detailed Logical/Physical Volume Stats
Each element listed in the Most Active Logical Volumes / Most Active Physical Volumes reports will have a corresponding stanza that shows detailed information about the logical/physical volume. In addition to the number of reads and writes, the user can also determine read and write times and sizes, as well as the initial and average seek distances for the logical / physical volume.
VOLUME
Name of the volume.
description
Description of the volume. (Describes contents, if dealing with a logical volume; describes type, if dealing with a physical volume.)
reads
Number of read requests made against the volume.
read sizes (blks)
Read transfer-size statistics (avg/min/max/sdev), in units of 512-byte blocks.
read times (msec)
Read response-time statistics (avg/min/max/sdev), in milliseconds.
read sequences
Number of read sequences. A sequence is a string of 512-byte blocks that are read consecutively. It indicates the amount of sequential access.
read seq. lengths
Statistics describing the lengths of the read sequences, in blocks.
writes
Number of write requests made against the volume.
write sizes (blks)
Write transfer-size statistics.
write times (msec)
Write-response time statistics.
write sequences
Number of write sequences. A sequence is a string of 512-byte blocks that are written consecutively.
write seq. lengths
Statistics describing the lengths of the write sequences, in blocks.
seeks
Number of seeks that preceded a read or write request; also expressed as a percentage of the total reads and writes that required seeks.
seek dist (blks)
Seek-distance statistics in units of 512-byte blocks. In addition to the usual statistics (avg/min/max/sdev), the distance of the initial seek operation (assuming block 0 was the starting position) is reported separately. This seek distance is sometimes very large; it is reported separately to avoid skewing the other statistics.
seek dist (cyls)
(Physical volume only) Seek-distance statistics in units of disk cylinders.
time to next req
Statistics (avg/min/max/sdev) describing the length of time, in milliseconds, between consecutive read or write requests to the volume. This column indicates the rate at which the volume is being accessed.
throughput
Total volume throughput in KB per second.
utilization
Fraction of time the volume was busy. The entries in this report are sorted by this field in decreasing order.
A long seek time can increase I/O response time and result in decreased application performance. By examining the reads and read sequence counts, you can determine if the access is sequential or random. The same logic applies to the writes and write sequence.
Guidelines for Using the filemon Command
Following are some guidelines for using the filemon command:
- The /etc/inittab file is always very active. Daemons specified in /etc/inittab are checked regularly to determine whether they are required to be respawned.
- The /etc/passwd file is also always very active. Because files and directories access permissions are checked.
- A long seek time increases I/O response time and decreases performance.
- If the majority of the reads and writes require seeks, you might have fragmented files and overly active file systems on the same physical disk. However, for online transaction processing (OLTP) or database systems this behavior might be normal.
- If the number of reads and writes approaches the number of sequences, physical disk access is more random than sequential. Sequences are strings of pages that are read (paged in) or written (paged out) consecutively. The seq. lengths is the length, in pages, of the sequences. A random file access can also involve many seeks. In this case, you cannot distinguish from the filemon output if the file access is random or if the file is fragmented. Use the fileplace command to investigate further.
- Remote files are listed in the volume:inode column with the remote system name.
Because the filemon command can potentially consume some CPU power, use this tool with discretion, and analyze the system performance while taking into consideration the overhead involved in running the tool. Tests have shown that in a CPU-saturated environment:
- With little I/O, the filemon command slowed a large compile by about one percent.
- With a high disk-output rate, the filemon command slowed the writing program by about five percent.
Summary for Monitoring Disk I/O
In general, a high % iowait indicates that the system has an application problem, a memory shortage, or an inefficient I/O subsystem configuration. For example, the application problem might be due to requesting a lot of I/O, but not doing much with the data. Understanding the I/O bottleneck and improving the efficiency of the I/O subsystem is the key in solving this bottleneck. Disk sensitivity can come in a number of forms, with different resolutions. Some typical solutions might include:
- Limiting number of active logical volumes and file systems placed on a particular physical disk. The idea is to balance file I/O evenly across all physical disk drives.
- Spreading a logical volume across multiple physical disks. This is particularly useful when a number of different files are being accessed.
- Creating multiple Journaled File Systems (JFS) logs for a volume group and assigning them to specific file systems (preferably on fast write cache devices). This is beneficial for applications that create, delete, or modify a large number of files, particularly temporary files.
- If the iostat output indicates that your workload I/O activity is not evenly distributed among the system disk drives, and the utilization of one or more disk drives is often 70-80 percent or more, consider reorganizing file systems, such as backing up and restoring file systems to reduce fragmentation. Fragmentation causes the drive to seek excessively and can be a large portion of overall response time.
- If large, I/O-intensive background jobs are interfering with interactive response time, you may want to activate I/O pacing.
- If it appears that a small number of files are being read over and over again, consider whether additional real memory would allow those files to be buffered more effectively.
- If the workload's access pattern is predominantly random, you might consider adding disks and distributing the randomly accessed files across more drives.
- If the workload's access pattern is predominantly sequential and involves multiple disk drives, you might consider adding one or more disk adapters. It may also be appropriate to consider building a striped logical volume to accommodate large, performance-critical sequential files.
- Using fast write cache devices.
- Using synchronous writes.
Each technique is discussed later in this chapter.
本文来自ChinaUnix博客,如果查看原文请点:http://blog.chinaunix.net/u/2984/showart_102787.html |
|