diskstat_ - Munin wildcard plugin to monitor various values provided by
Linux 2.6 systems with extended block device statistics enabled.
This plugin displays nicer device-mapper device names if it is run as root, but it works without root privileges as well. To run it as root, add this to a plugin configuration file:

  [diskstat_*]
  user root
Besides self-explanatory or well-known values such as
throughput (bytes per second), there are a few which might need further introduction.
Device Utilization is based on a counter that Linux increments once per millisecond for as long as there are outstanding I/O requests. If this counter advances by close to 1000 ms within a given 1 second time frame, the device is nearly 100% saturated. By default this plugin reports values averaged over a 5 minute time frame, so it cannot catch short-lived saturation, but it gives a good trend for the semi-uniform load patterns expected in most server or multi-user environments.
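The arithmetic behind this value can be sketched as follows. This is a minimal illustration, not the plugin's actual code: the function name and parameters are hypothetical, and it assumes the caller has already read the "milliseconds spent doing I/O" counter at the start and end of the interval.

```python
def utilization_percent(busy_ms_prev, busy_ms_now, interval_seconds):
    """Percentage of the interval during which I/O was outstanding.

    busy_ms_prev / busy_ms_now are two readings of the kernel's
    "time spent doing I/O" counter in milliseconds (hypothetical inputs).
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    busy_ms = busy_ms_now - busy_ms_prev
    return 100.0 * busy_ms / (interval_seconds * 1000.0)


# Busy for 150000 ms out of a 300 s interval.
print(utilization_percent(1_000_000, 1_150_000, 300))  # 50.0

# A device that is fully saturated for 10 s and idle for the rest of a
# 5 minute interval shows up as only a few percent on average.
print(round(utilization_percent(0, 10_000, 300), 2))
```

The second call shows why short-lived saturation disappears in a 5 minute average.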
Device IO Time takes the counter described under
Device Utilization and divides it by the number of I/Os that happened in the given time frame, resulting in an average time per I/O on the block-device level.
This value gives you a good basis for comparing different controllers, storage subsystems and disks under similar workloads.
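As a rough sketch of that division (the function name and its inputs are hypothetical, and the plugin's own handling of idle intervals may differ), the average time per I/O is the busy time accumulated in the interval divided by the number of I/Os completed in it:

```python
def device_io_time_ms(busy_ms, ios_completed):
    """Average milliseconds per I/O on the block-device level.

    busy_ms:       growth of the busy-time counter during the interval
    ios_completed: reads plus writes completed during the same interval
    """
    if ios_completed == 0:
        # No I/O happened, so there is no meaningful average.
        # Returning 0.0 here is an assumption made for this sketch.
        return 0.0
    return busy_ms / ios_completed


# 150000 ms of busy time spread over 30000 I/Os.
print(device_io_time_ms(150_000, 30_000))  # 5.0
```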
These values describe the average time between an application issuing a syscall that results in a block device access and that syscall returning to the application.
The values are bound to be higher (at least for read requests) than the time the device itself needs to fulfill the requests, since calling overhead, queuing times and probably a dozen other things are included in them.
These are the values to watch out for when a user complains that
the disks are too slow.
A non-exhaustive list:
The core logic of this script is based on the iostat tool of the sysstat package written and maintained by Sebastien Godard.
See Documentation/iostats.txt in your Linux source tree for further information about the
numbers involved in this module.
http://www.westnet.com/~gsmith/content/linux-pdflush.htm has a nice writeup about the pdflush daemon.
#%# family=manual
#%# capabilities=autoconf suggest
Does not work correctly with multiple Munin masters, because it calculates averages over the time between consecutive runs. With several masters it can be run twice in the same second, which causes "division by zero" errors. If it is run two seconds apart, the average it reports covers 2 seconds, not 5 minutes.
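The failure mode can be illustrated with a small sketch. This is not the plugin's code and not a fix for it; the function name and arguments are hypothetical. It only shows how a per-second average depends on the time elapsed since the previous run, and how a zero interval could be guarded against:

```python
def per_second_rate(prev_value, prev_time, cur_value, cur_time):
    """Average rate of a counter between two runs.

    Times are in whole seconds. Returns None when no time has passed,
    which is the case that would otherwise divide by zero.
    """
    elapsed = cur_time - prev_time
    if elapsed <= 0:
        return None  # second run within the same second: nothing to average
    return (cur_value - prev_value) / elapsed


# Normal case: one run every 300 seconds.
print(per_second_rate(100, 1000, 400, 1300))  # 1.0

# Two masters polling in the same second.
print(per_second_rate(100, 1000, 100, 1000))  # None

# Two masters polling 2 seconds apart: the average covers only 2 seconds.
print(per_second_rate(100, 1000, 120, 1002))  # 10.0
```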
Michael Renner <email@example.com>