Interpreting iostat statistics

While working on a recent case involving storage performance issues, I found that although I was familiar with the basics of iostat, I was not confident enough to do more in-depth troubleshooting of a real-world I/O problem. Consequently, this article was born.

Introduction

iostat is frequently used to diagnose performance issues with both local and remote storage devices. When invoking iostat from the command line, the important thing to note is that the first report provides statistics for the time since system startup, while any subsequent reports cover the time since the previous report. This is why it is important to run iostat with a specified interval over a period of time when troubleshooting a real-time I/O problem.
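
For example, to watch a live problem you might run something like this, which prints extended device statistics every 5 seconds, 12 times (a one-minute window), letting you discard the first, since-boot report:

iostat -kx -d 5 12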

What we are usually interested in is throughput, IOPS and latency.

I/O stack

It is important to keep in mind that applications do not perform I/O to the disks directly; rather, they do so via a file system. File systems work hard to shield applications from disk I/O latency, for example by using RAM to buffer writes and to cache and prefetch reads.

On the write side, the application may “dirty” buffers in the file system cache and consider the I/O completed; the file system, however, doesn’t perform the disk I/O until much later (seconds), batching dirty data together and writing it out in bulk.
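
On Linux you can watch this in action: the amount of dirty (not-yet-written-back) data is exposed in /proc/meminfo, and the writeback expiry interval is a tunable. The paths below are standard on modern kernels, though defaults vary by distribution:

grep -E '^(Dirty|Writeback):' /proc/meminfo   # data waiting to be flushed to disk
cat /proc/sys/vm/dirty_expire_centisecs       # age (in 1/100 s) after which dirty data must be written out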

iostat example

Let’s take the following example, where I am using extended statistics (-x) and output in kilobytes per second (-k) to produce a device-only report (-d):

iostat -kx -d
Linux 2.6.32-279.1.1.el6.x86_64 (pawwa.in.rs)     19/02/13     _x86_64_    (1 CPU)

Device:         rrqm/s   wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz   await  svctm  %util
xvdf              0.00     0.00    0.00    0.00     0.00     0.00     8.00     0.00    0.84   0.83   0.00
xvdj              0.00     0.07    0.01    2.75     0.23    11.29     8.34     0.01    4.11   0.53   0.15
xvde              0.00     0.08    0.02    0.54     0.28     2.47     9.94     0.01   12.59   1.96   0.11

Let’s break down the fields:

  • rrqm/s, wrqm/s — number of read/write requests merged per second that were queued to the device. Requests can be merged if they are contiguous; this is not particularly relevant to diagnosing performance issues
  • r/s, w/s — number of read/write requests completed per second (IOPS)
  • rkB/s, wkB/s — number of kilobytes read from/written to the device per second (throughput)
  • avgrq-sz — average request size, in sectors
  • avgqu-sz — average queue length of the requests issued to the device
  • await — average waiting time for requests (time in queue + service time), in milliseconds
  • svctm — average service time for I/O requests by the device, in milliseconds
  • %util — approximate percentage of device utilization
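
All of these fields are computed from the raw per-device counters the kernel exposes in /proc/diskstats (documented in the iostats.txt reference listed under Sources). If a number looks suspicious, you can peek at the counters directly:

grep xvde /proc/diskstats   # reads completed, reads merged, sectors read, ms reading, writes completed, ...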

svctm

svctm is the time from when an I/O is submitted to the device until it is completed.

A high value in the svctm field suggests a lack of overall throughput, indicating that the system is overloaded with I/O operations. An important note here is that svctm combines both reads and writes.

It is also advisable to disregard high values in the svctm field for disks with very low rates of activity (less than 5%), because the fsflush process can force up the average service time when synchronizing on-disk data with what is in memory.

For example, a good measure of the performance of an underlying Amazon EBS volume is svctm. In general, this number should be below 100 ms, and it’s usually much lower. For read-dominated workloads, it is expected to be in the 10-20 ms range; for write-dominated workloads, it can be as low as single digits. If you are trying to get more IOPS, it is suggested to provision a couple of EBS volumes and run them in a RAID 0 stripe set (or an LVM stripe set), as sketched below.
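
As a rough sketch of the striping approach (device names here are hypothetical placeholders; adapt them to your instance):

mdadm --create /dev/md0 --level=0 --raid-devices=2 /dev/xvdf /dev/xvdg   # stripe two EBS volumes into one RAID 0 set
mkfs.ext4 /dev/md0                                                       # then put a file system on the stripe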

By the way, the sysstat package web site states the following:

svctm

The average service time (in milliseconds) for I/O requests that were issued to the device. Warning! Do not trust this field any more. This field will be removed in a future sysstat version.

Here’s some info about that: http://www.xaprb.com/blog/2010/09/06/beware-of-svctm-in-linuxs-iostat/

await

await is the average time from when a request is put in the queue to when it completes. It is a combination of the queue length and the average service time, so it’s usually more revealing to look at them separately. Obviously, this value depends heavily on how many items are in the queue.

As soon as the service time jumps up, more requests come in each second than are completed, the queues grow, and await shoots up.

In general, if you find that await shows a dramatic amplification of increases in service time, it usually means that the device is only barely able to keep up with the input rate, so any slowdown causes massive queuing.
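
You can separate queueing from service directly in the output above: time spent waiting in the queue is roughly await minus svctm. For xvde in the example report, that is 12.59 - 1.96 ≈ 10.6 ms of queueing. A throwaway one-liner to compute it (column positions assume the -kx layout shown above and differ between sysstat versions):

iostat -kx -d 5 2 | awk '/^xvd/ { printf "%s queue wait: %.2f ms\n", $1, $10 - $11 }'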

Another question to ask is: what latencies are acceptable? Is there a best-practice number that fits all applications? No. Why? Because each application is different. You should not look at I/O stats and decide that you have a problem with your storage. You must first understand whether your application suffers from I/O wait, and only then look at the storage performance.

r/s and w/s

These numbers tell you the rate at which the device is servicing I/O requests, but a drop on its own doesn’t mean much. They could drop because the disk subsystem is having problems, or because the application is simply submitting fewer requests.

rkB/s and wkB/s

High values in rkB/s and wkB/s suggest an I/O bottleneck. Dividing these numbers by the reads/writes per second gives you the average request size.

avgrq-sz

Computed as described above, this gives an idea of how random your I/O is. In general, if this number is below 16 (16 * 512 bytes = 8 KB), you are doing extremely random I/O. If this number is low (<50), you are going to be IOPS-limited. If it’s high (>100), you are likely to be bandwidth-limited.
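
A quick way to do that calculation in kilobytes from the -kx output (same column-layout caveat as above):

iostat -kx -d 5 2 | awk '/^xvd/ { io = $4 + $5; if (io > 0) printf "%s avg request size: %.1f KB\n", $1, ($6 + $7) / io }'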

avgqu-sz

This indicates how many requests are queued waiting to be serviced. The maximum this can reach is found in /sys/block/<device>/queue/nr_requests; by default, the maximum is 128. If you are seeing numbers approaching this level, it means that your application is issuing requests faster than the disk subsystem can service them.
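
For example, to check the limit for one of the devices from the report above:

cat /sys/block/xvdf/queue/nr_requests   # defaults to 128 on most systems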

If svctm looks good but your application is still running slower than you expect, the other numbers can help diagnose why. If avgqu-sz gets big (>30), your application is submitting more requests per second than the volume can handle. The solution for EBS, for example, is to stripe across multiple EBS volumes using LVM or RAID 0.

Caching

Because file systems use a RAM-based cache and perform much of their disk I/O asynchronously, what the application experiences can differ vastly from what the disks are doing.

Since svctm and await combine reads and writes, on a server with RAID and a battery-backed write cache, reads and writes will behave very differently: writes should complete (to the RAID write cache) in close to zero milliseconds, while reads should take approximately the theoretical average time possible on the underlying disk subsystem.

Databases

The svctm statistic is most interesting for any write-heavy transactional database server, as it translates directly into transaction commit time. Average queue size (avgqu-sz) is another popular metric in DBA circles. Ideally, the queue size for a single disk should be in the single digits, which means that the underlying device is well matched to the I/O load generated by the application.

I/O latency is also one of the most important topics when discussing the storage performance of many RDBMS applications.

If the database system suffers from a storage I/O bottleneck, you will see a high number of IOPS and/or high throughput.

Calculations

What can you calculate from iostat? For example, how a volume is performing: IOPS @ N ms average latency, with an X:Y read:write mix.
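
A minimal sketch of such a summary for a single device (xvde from the example above; column positions assume the -kx layout shown earlier and differ between sysstat versions):

iostat -kx -d xvde 5 2 | awk '
  /^xvde/ { r = $4; w = $5; await = $10 }   # the last match is the interval sample
  END {
      if (r + w > 0)
          printf "%.1f IOPS @ %.1f ms average latency, %.0f:%.0f read:write mix\n",
                 r + w, await, 100 * r / (r + w), 100 * w / (r + w)
  }'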

For details on how iostat calculates its values, see the kernel’s iostats documentation and the posts linked in the Sources below.

Sources

  • http://blog.jcole.us/2007/05/08/on-iostat-disk-latency-iohist-onward/
  • http://dtrace.org/blogs/brendan/2011/05/11/file-system-latency-part-1/
  • https://forums.aws.amazon.com/thread.jspa?threadID=33228
  • http://www.mjmwired.net/kernel/Documentation/iostats.txt
  • http://www.theiostorm.com/whats-an-acceptable-io-latency/
  • https://forums.aws.amazon.com/message.jspa?messageID=154670
  • http://sebastien.godard.pagesperso-orange.fr/man_iostat.html
  • http://sysadmin1138.net/mt/blog/2010/03/know-your-io.shtml
