NAME

diskstat_ - Munin wildcard plugin to monitor various values provided via /proc/diskstats

APPLICABLE SYSTEMS

Linux 2.6 systems with extended block device statistics enabled.

CONFIGURATION

None needed.

This plugin displays nicer device-mapper device names if it is run as root, but it also works without root privileges. To run it as root, add this to a plugin configuration file:

  [diskstat_*]
    user root

INTERPRETATION

Most values, such as throughput (bytes per second), are self-describing or well known. A few might need further introduction.

Device Utilization

Linux provides a counter which increments once per millisecond for as long as there are outstanding I/O requests. If this counter advances by close to 1000 ms within a given 1 second time frame, the device is nearly 100% saturated. By default this plugin provides values averaged over a 5 minute time frame, so it can't catch short-lived saturation, but it gives a useful trend for the semi-uniform load patterns expected in most server or multi-user environments.
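As a rough illustration of the arithmetic described above, the following Python sketch turns two samples of the millisecond counter into a utilization percentage. The function and parameter names are made up for this example, and the plugin's own code may differ in detail.

```python
def utilization_percent(io_ticks_prev, io_ticks_now, interval_seconds):
    """Percentage of the interval during which I/O was outstanding.

    io_ticks_prev / io_ticks_now are two samples of the kernel's
    millisecond counter (hypothetical names for this sketch).
    """
    if interval_seconds <= 0:
        raise ValueError("interval must be positive")
    busy_ms = io_ticks_now - io_ticks_prev
    # 1000 ms of busy time per second of wall time equals 100%.
    return 100.0 * busy_ms / (interval_seconds * 1000.0)


# Device busy for 150000 ms during a 300 second (5 minute) interval.
print(utilization_percent(1_000_000, 1_150_000, 300))  # 50.0

# A 10 second burst of full saturation is diluted by the 5 minute average.
print(round(utilization_percent(0, 10_000, 300), 2))
```

The second call shows why short-lived saturation is hard to spot: ten fully saturated seconds show up as only a few percent once averaged over five minutes.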

Device IO Time

The Device IO Time takes the counter described under Device Utilization and divides it by the number of I/Os that happened in the given time frame, resulting in an average time per I/O on the block-device level.

This value gives you a good basis for comparing different controllers, storage subsystems and disks under similar workloads.
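The Device IO Time calculation can be sketched in Python as follows. This is a minimal example under the assumption that the busy-time delta and the number of completed I/Os for the interval are already known; the names are hypothetical, and returning 0 for an idle interval is a choice made for this sketch.

```python
def avg_io_time_ms(busy_ms_delta, ios_delta):
    """Average device time per I/O, in milliseconds.

    busy_ms_delta: increase of the "outstanding I/O" millisecond counter
    ios_delta:     number of I/Os completed in the same interval
    """
    if ios_delta == 0:
        # No I/O happened, so there is nothing to average.
        return 0.0
    return busy_ms_delta / ios_delta


# 150000 ms of busy time spread over 30000 I/Os.
print(avg_io_time_ms(150_000, 30_000))  # 5.0
```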

Syscall Wait Time

These values describe the average time between an application issuing a syscall that results in a hit to a block device and that syscall returning to the application.

The values are bound to be higher (at least for read requests) than the time it takes the device itself to fulfill the requests, since calling overhead, queuing times and probably a dozen other things are included in those times.

These are the values to watch out for when a user complains that the disks are too slow!
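The sketch below shows one way such wait times can be derived from two snapshots of a /proc/diskstats line. It assumes the field order documented in Documentation/iostats.txt and an iostat-style calculation (time spent on reads or writes divided by the number of completed reads or writes); whether the plugin computes its values exactly this way is an assumption. The sample lines and all names are invented for illustration.

```python
FIELDS = ("reads", "reads_merged", "sectors_read", "ms_reading",
          "writes", "writes_merged", "sectors_written", "ms_writing",
          "in_flight", "ms_doing_io", "weighted_ms")


def parse_diskstats_line(line):
    """Return (device name, dict of the eleven classic counters)."""
    parts = line.split()
    # parts[0] and parts[1] are the major and minor device numbers.
    values = [int(v) for v in parts[3:14]]
    return parts[2], dict(zip(FIELDS, values))


def avg_wait_ms(prev, cur, kind):
    """Average wait per request; kind is "read" or "write"."""
    count_key = "reads" if kind == "read" else "writes"
    time_key = "ms_reading" if kind == "read" else "ms_writing"
    requests = cur[count_key] - prev[count_key]
    if requests == 0:
        return 0.0
    return (cur[time_key] - prev[time_key]) / requests


# Two made-up snapshots of the same device.
_, before = parse_diskstats_line(
    "   8  0 sda 1000 10 80000 4000 2000 20 160000 30000 0 9000 34000")
_, after = parse_diskstats_line(
    "   8  0 sda 1500 10 120000 7000 2400 20 192000 38000 0 11000 45000")

print(avg_wait_ms(before, after, "read"))   # 6.0
print(avg_wait_ms(before, after, "write"))  # 20.0
```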

What causes a block device hit?

A non-exhaustive list:

ACKNOWLEDGEMENTS

The core logic of this script is based on the iostat tool of the sysstat package written and maintained by Sebastien Godard.

SEE ALSO

See Documentation/iostats.txt in your Linux source tree for further information about the numbers involved in this module.

http://www.westnet.com/~gsmith/content/linux-pdflush.htm has a nice writeup about the pdflush daemon.

VERSION

  $Id$

MAGIC MARKERS

  #%# family=manual
  #%# capabilities=autoconf suggest

BUGS

Does not work correctly with multiple Munin masters, because it calculates averages over the time between consecutive runs. With several masters it can be run twice in the same second, which causes "division by zero" errors. If it is run two seconds apart, the average it reports covers 2 seconds, not 5 minutes.
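The failure mode can be illustrated with a small Python sketch. It is not the plugin's code and not a fix for it; it only shows how the elapsed time between two runs enters the calculation, and one hypothetical way to guard against a zero interval.

```python
def per_second_rate(prev_value, cur_value, prev_time, cur_time):
    """Counter increase per second between two runs.

    Returns None when no time has passed between the two runs, which
    is the case that would otherwise trigger a division by zero.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        return None
    return (cur_value - prev_value) / elapsed


# Normal case: runs are 300 seconds apart.
print(per_second_rate(100, 700, 1000, 1300))  # 2.0

# Two masters polling within the same second.
print(per_second_rate(700, 700, 1300, 1300))  # None

# Runs 2 seconds apart: the result only describes those 2 seconds.
print(per_second_rate(700, 704, 1300, 1302))  # 2.0
```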

AUTHOR

Michael Renner <michael.renner@amd.co.at>

LICENSE

GPLv2