diskstats - Munin multigraph plugin to monitor various values provided by
Linux 2.6 systems with extended block device statistics enabled.
This plugin displays nicer device-mapper device names if it is run as root, but it works without root privileges. To run it as root, add this to a plugin configuration file:
  [diskstats]
  user root
You can specify which devices should get monitored by the plugin via environment variables. The variables are mutually exclusive and should contain a comma-separated list of device names. Partial names (e.g. 'sd' or 'dm-') are okay.
  [diskstats]
  env.include_only sda,sdb,cciss/c0d0

or

  [diskstats]
  env.exclude sdc,VGroot/LVswap
LVM volumes can be filtered either by their canonical names or their internal device-mapper based names (e.g. 'dm-3', see dmsetup(8) for further information).
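The filtering rules above can be sketched as follows. This is a minimal illustration in Python, not the plugin's own code: the function name select_devices is hypothetical, and treating partial names as prefixes is an assumption made for the example.

```python
def select_devices(devices, include_only="", exclude=""):
    """Filter device names using comma-separated include/exclude lists.

    Assumption: a partial name such as 'sd' matches every device whose
    name starts with it. The real plugin may match differently.
    """
    include = [p.strip() for p in include_only.split(",") if p.strip()]
    skip = [p.strip() for p in exclude.split(",") if p.strip()]

    # The two variables are documented as mutually exclusive.
    if include and skip:
        raise ValueError("include_only and exclude are mutually exclusive")

    if include:
        return [d for d in devices if d.startswith(tuple(include))]
    # With an empty exclude list nothing is skipped.
    return [d for d in devices if not d.startswith(tuple(skip))]
```

For example, calling select_devices(["sda", "sdb", "dm-0"], include_only="sd") keeps only the two sd devices, while passing both lists raises an error.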
This plugin will increase the graph_width dynamically to accommodate longer-than-normal device names. You can disable this behavior by setting the trim_labels environment variable. Additionally, you can specify a fixed graph_width for the graphs.
  [diskstats]
  # Set graph_width to 450, device names which are longer get trimmed
  env.trim_labels yes
  env.graph_width 450
Among the more self-explanatory or well-known values like
throughput (bytes per second) there are a few which might need further introduction.
Device Utilization: Linux provides a counter which increments in millisecond intervals for as long as there are outstanding I/O requests. If this counter advances by close to 1000 ms within a given 1 second time frame, the device is nearly 100% saturated. By default this plugin provides values averaged over a 5 minute time frame, so it can't catch short-lived saturation, but it gives a good trend for the semi-uniform load patterns expected in most server or multi-user environments.
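As a rough sketch of the arithmetic (not the plugin's actual code), utilization is the growth of that counter divided by the length of the sampling interval. The function and parameter names below are made up for illustration.

```python
def utilization_percent(busy_ms_prev, busy_ms_now, interval_seconds):
    """Share of the interval during which I/O requests were outstanding.

    busy_ms_prev / busy_ms_now: two readings of the kernel's millisecond
    counter (hypothetical parameter names), taken interval_seconds apart.
    """
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    busy_ms = busy_ms_now - busy_ms_prev
    return 100.0 * busy_ms / (interval_seconds * 1000.0)
```

With a 5 minute (300 second) interval, a counter that grew by 150000 ms means the device was busy half of the time. A burst that saturates the device for a few seconds barely moves this average, which is why short-lived saturation is not visible.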
Device IO Time takes the counter described under
Device Utilization and divides it by the number of I/Os that happened in the given time frame, resulting in an average time per I/O on the block-device level.
This value can give you a good basis for comparison among different controllers, storage subsystems and disks for similar workloads.
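The division described above can be sketched like this. It is an illustration only; the function and parameter names are hypothetical, and returning zero for an idle interval is a choice made for the example.

```python
def avg_io_time_ms(busy_ms_delta, reads_delta, writes_delta):
    """Average time per I/O at the block device level, in milliseconds.

    busy_ms_delta: growth of the busy-time counter over the interval.
    reads_delta / writes_delta: I/Os completed in the same interval.
    """
    ios = reads_delta + writes_delta
    if ios == 0:
        # No I/O happened, so there is no meaningful average.
        return 0.0
    return busy_ms_delta / ios
```

For instance, 1500 ms of busy time spread over 300 completed I/Os gives an average of 5 ms per I/O.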
These values describe the average time between an application issuing a syscall that results in a hit to a block device and the syscall returning to the application.
The values are bound to be higher (at least for read requests) than the time it takes the device itself to fulfill the requests, since calling overhead, queuing times and probably a dozen other things are included in those times.
These are the values to watch out for when a user complains that
"the disks are too slow!".
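One way to derive such averages is sketched below. It assumes they are computed iostat-style from per-direction request counts and cumulative request times, which the text above does not spell out. The dictionary keys are hypothetical names, not identifiers from the plugin.

```python
def avg_wait_ms(prev, cur):
    """Return (read_wait_ms, write_wait_ms) between two counter samples.

    prev and cur are dicts with the assumed keys read_ios, read_ticks,
    write_ios and write_ticks (request counts and cumulative ms).
    """
    def per_request(ticks_key, ios_key):
        ios = cur[ios_key] - prev[ios_key]
        if ios == 0:
            # No requests of this kind completed in the interval.
            return 0.0
        return (cur[ticks_key] - prev[ticks_key]) / ios

    return (per_request("read_ticks", "read_ios"),
            per_request("write_ticks", "write_ios"))
```

Keeping reads and writes separate matters here, since the two can behave very differently on the same device.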
A non-exhaustive list:
The core logic of this script is based on the iostat tool of the sysstat package written and maintained by Sebastien Godard.
See Documentation/iostats.txt in your Linux source tree for further information about the
numbers involved in this module.
http://www.westnet.com/~gsmith/content/linux-pdflush.htm has a nice writeup about the pdflush daemon.
  #%# family=auto
  #%# capabilities=autoconf
Michael Renner <email@example.com>