NAME

diskstats - Munin multigraph plugin to monitor various values provided via /proc/diskstats or /sys/block/*/stat

APPLICABLE SYSTEMS

Linux 2.6 systems with extended block device statistics enabled.
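
If you are unsure whether extended statistics are available, one quick check (an illustrative Python sketch, not part of the plugin) is to count the columns per line in /proc/diskstats: entries with the full set of statistics carry 11 counters after the device name (14 whitespace-separated fields or more), while reduced entries, such as partitions on some 2.6 kernels, carry only 4.

  # Sketch: report which /proc/diskstats entries expose the full
  # (extended) set of 11 statistics fields.
  with open('/proc/diskstats') as f:
      for line in f:
          fields = line.split()
          print('%-12s extended=%s' % (fields[2], len(fields) >= 14))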

CONFIGURATION

None needed.

device-mapper names

This plugin displays nicer device-mapper device names if it is run as root, but it works as intended without root privileges. To configure it to run as root, add the following to a plugin configuration file:

  [diskstats]
    user root

Monitor specific devices

You can specify which devices the plugin should monitor via environment variables. The variables are mutually exclusive and take a comma-separated list of device names. Partial names (e.g. 'sd' or 'dm-') are accepted.

  [diskstats]
    env.include_only sda,sdb,cciss/c0d0

or

  [diskstats]
    env.exclude sdc,VGroot/LVswap

LVM volumes can be filtered either by their canonical names or their internal device-mapper based names (e.g. 'dm-3', see dmsetup(8) for further information).

Graph width and labels

This plugin will increase the graph_width dynamically to accommodate longer-than-normal device names. You can disable this behavior by setting the trim_labels environment variable. Additionally, you can specify a fixed graph_width for the graphs.

  [diskstats]
    # Set graph_width to 450, device names which are longer get trimmed
    env.trim_labels yes
    env.graph_width 450

INTERPRETATION

In addition to the more self-explanatory or well-known values, such as throughput (bytes per second), there are a few which might need further introduction.

Device Utilization

Linux provides a counter which increments in millisecond steps for as long as there are outstanding I/O requests. If this counter is close to 1000 ms within a given 1-second window, the device is nearly 100% saturated. This plugin provides values averaged over a 5-minute time frame by default, so it can't catch short-lived saturation, but it gives a good trend for the semi-uniform load patterns expected in most server or multi-user environments.
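
As an illustration (a minimal Python sketch, not the plugin's own code; the device name 'sda' is only an example), the utilization can be derived by sampling field 10 of /proc/diskstats ("milliseconds spent doing I/Os") twice and dividing the delta by the elapsed wall-clock time:

  import time

  def io_ticks(device):
      # Field 10 of the per-device statistics: milliseconds spent doing I/Os.
      # It is the 13th whitespace-separated column of a /proc/diskstats line.
      with open('/proc/diskstats') as f:
          for line in f:
              fields = line.split()
              if fields[2] == device:
                  return int(fields[12])
      raise KeyError(device)

  def utilization(device, interval=1.0):
      # Fraction of the interval during which at least one request was outstanding.
      before = io_ticks(device)
      time.sleep(interval)
      after = io_ticks(device)
      return (after - before) / (interval * 1000.0)

  print('%.1f%% busy' % (100 * utilization('sda')))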

Device IO Time

The Device IO Time takes the counter described under Device Utilization and divides it by the number of I/Os that happened in the given time frame, resulting in an average time per I/O on the block-device level.

This value gives you a good baseline for comparing different controllers, storage subsystems and disks under similar workloads.
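
A rough sketch of the same calculation (again illustrative Python, not the plugin's implementation; field offsets follow Documentation/iostats.txt):

  import time

  def snapshot(device):
      with open('/proc/diskstats') as f:
          for line in f:
              fields = line.split()
              if fields[2] == device:
                  ios = int(fields[3]) + int(fields[7])  # reads + writes completed
                  busy_ms = int(fields[12])              # milliseconds spent doing I/Os
                  return ios, busy_ms
      raise KeyError(device)

  def avg_io_time_ms(device, interval=1.0):
      ios_before, busy_before = snapshot(device)
      time.sleep(interval)
      ios_after, busy_after = snapshot(device)
      completed = ios_after - ios_before
      return (busy_after - busy_before) / completed if completed else 0.0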

Syscall Wait Time

These values describe the average time from the moment an application issues a syscall that results in a hit to a block device until that syscall returns to the application.

The values are bound to be higher (at least for read requests) than the time it takes the device itself to fulfill the requests, since calling overhead, queuing times and probably a dozen other things are included in those times.

These are the values to watch out for when a user complains that the disks are too slow!
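
One way to approximate these per-request wait times from the same statistics (an assumption about the derivation, not necessarily what the plugin itself does) is to divide fields 4 and 8 of /proc/diskstats ("milliseconds spent reading/writing", which include time spent waiting in the queue) by the number of reads and writes completed in the same interval. An illustrative Python sketch:

  import time

  def wait_counters(device):
      with open('/proc/diskstats') as f:
          for line in f:
              fields = line.split()
              if fields[2] == device:
                  reads, read_ms = int(fields[3]), int(fields[6])
                  writes, write_ms = int(fields[7]), int(fields[10])
                  return reads, read_ms, writes, write_ms
      raise KeyError(device)

  def avg_wait_ms(device, interval=1.0):
      # Average wait per read and per write completed during the interval.
      r0, rms0, w0, wms0 = wait_counters(device)
      time.sleep(interval)
      r1, rms1, w1, wms1 = wait_counters(device)
      read_wait = (rms1 - rms0) / (r1 - r0) if r1 > r0 else 0.0
      write_wait = (wms1 - wms0) / (w1 - w0) if w1 > w0 else 0.0
      return read_wait, write_wait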

What causes a block device hit?

A non-exhaustive list:

ACKNOWLEDGEMENTS

The core logic of this script is based on the iostat tool of the sysstat package written and maintained by Sebastien Godard.

SEE ALSO

See Documentation/iostats.txt in your Linux source tree for further information about the numbers involved in this module.

http://www.westnet.com/~gsmith/content/linux-pdflush.htm has a nice writeup about the pdflush daemon.

VERSION

  $Id$

MAGIC MARKERS

  #%# family=auto
  #%# capabilities=autoconf

AUTHOR

Michael Renner <michael.renner@amd.co.at>

LICENSE

GPLv2