smart_ - Munin wildcard-plugin to monitor S.M.A.R.T. attribute values through smartctl
Any node with a Python interpreter and smartmontools (http://smartmontools.sourceforge.net/) installed and working.
To monitor a S.M.A.R.T. device, create a link named smart_<device> in the service directory of the munin-node that points to this file.
ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_hda
...will monitor /dev/hda.
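As a rough illustration of the wildcard mechanism, the device can be derived from the name the plugin was called under. This is only a sketch: the function name device_from_plugin_name is hypothetical, and it assumes a plain smart_<device> name that maps straight to /dev/<device>. Names for controller ports such as smart_twa0-1 are not handled here.

```python
import os


def device_from_plugin_name(argv0, prefix="smart_"):
    """Map a link name such as smart_hda to the device path /dev/hda.

    Hypothetical helper for illustration; the real plugin may resolve
    device names differently.
    """
    name = os.path.basename(argv0)
    if not name.startswith(prefix) or len(name) == len(prefix):
        raise ValueError("expected a link named smart_<device>, got %r" % name)
    return "/dev/" + name[len(prefix):]
```

Called with /etc/munin/plugins/smart_hda, the helper returns /dev/hda; called with the bare name smart_, it raises ValueError.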
The plugin must run as the privileged user root to get access to the raw device.
The following minimal configuration in plugin-conf.d/munin-node is therefore needed:
  [smart_*]
  user root
  group disk
  smartpath     - Specify path to smartctl program (Default: /usr/sbin/smartctl)
  smartargs     - Override '-a' argument passed to smartctl with '-A -i'+smartargs
  ignorestandby - Ignore the standby state of the drive and perform SMART query. Default: False
Parameters can be specified on a per-drive basis, e.g.:
  [smart_hda]
  user root
  env.smartargs -H -c -l error -l selftest -l selective -d ata
  env.smartpath /usr/local/sbin/smartctl
In particular, for SATA drives, with older versions of smartctl:
  [smart_sda]
  user root
  env.smartargs -d ata -a

  [smart_twa0-1]
  user root
  env.smartargs -H -l error -d 3ware,1
  env.ignorestandby True

  [smart_twa0-2]
  user root
  env.smartargs -H -l error -d 3ware,2
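The following sketch shows how the three parameters could be combined into a smartctl command line. It is an assumption about the behaviour, not the plugin's actual code: build_smartctl_command is a hypothetical name, env is the dictionary of environment variables that munin-node fills from the env.* settings, and adding '-n standby' when ignorestandby is not set is a guess at how a sleeping drive is left alone.

```python
import shlex


def build_smartctl_command(device, env):
    """Return the argv list for smartctl, following the documented parameters.

    Hypothetical helper; 'env' would normally be os.environ.
    """
    smartpath = env.get("smartpath", "/usr/sbin/smartctl")
    smartargs = env.get("smartargs")
    if smartargs:
        # Documented behaviour: '-a' is replaced by '-A -i' plus smartargs.
        args = ["-A", "-i"] + shlex.split(smartargs)
    else:
        args = ["-a"]
    ignorestandby = env.get("ignorestandby", "False").strip().lower() == "true"
    if not ignorestandby:
        # Assumption: do not query (and wake up) a drive that is in standby.
        args += ["-n", "standby"]
    return [smartpath] + args + [device]
```

With an empty environment and /dev/hda this yields /usr/sbin/smartctl -a -n standby /dev/hda.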
If a device supports the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.), it offers read access to its attribute table. For each attribute supported by the device, the table lists the raw value, a normalised value and a threshold (set by the vendor).
The meaning and handling of the raw value is a secret of the vendor's embedded S.M.A.R.T. software on the disk. The only relevant information from our external view is the normalised value in comparison with the threshold. If an attribute's value is equal to or below the threshold, the attribute signals failure and the health status of the device switches from passed to failed.
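To make the comparison concrete, here is a minimal sketch that reads an attribute table in the layout printed by 'smartctl -A' and flags attributes whose normalised value is equal to or below the threshold. The names Attribute, parse_attributes and has_failed are hypothetical, and the SAMPLE rows are made-up values for illustration, not output from a real drive.

```python
from typing import List, NamedTuple


class Attribute(NamedTuple):
    ident: int
    name: str
    value: int      # normalised value (VALUE column)
    threshold: int  # vendor threshold (THRESH column)
    kind: str       # Pre-fail or Old_age (TYPE column)


def parse_attributes(text: str) -> List[Attribute]:
    """Collect the table rows; header and other lines are skipped."""
    attributes = []
    for line in text.splitlines():
        f = line.split()
        # A row starts with the numeric attribute ID and has at least 10 columns.
        if len(f) >= 10 and f[0].isdigit() and f[3].isdigit() and f[5].isdigit():
            attributes.append(Attribute(int(f[0]), f[1], int(f[3]), int(f[5]), f[6]))
    return attributes


def has_failed(attr: Attribute) -> bool:
    """An attribute has failed if its value is equal to or below the threshold."""
    return attr.value <= attr.threshold


# Made-up sample rows in the column layout of 'smartctl -A'.
SAMPLE = """\
ID# ATTRIBUTE_NAME          FLAG     VALUE WORST THRESH TYPE      UPDATED  WHEN_FAILED RAW_VALUE
  1 Raw_Read_Error_Rate     0x000f   100   100   051    Pre-fail  Always       -       0
  5 Reallocated_Sector_Ct   0x0033   036   036   036    Pre-fail  Always   FAILING_NOW 1250
194 Temperature_Celsius     0x0022   064   052   000    Old_age   Always       -       36
"""
```

In this sample only Reallocated_Sector_Ct counts as failed, because its value 36 has reached the threshold 36.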
This plugin fetches the normalised values of all SMART attributes and draws a curve for each of them. It uses the vendor's threshold as the critical limit for the munin datafield, so you will see an alarm if the value reaches the vendor's threshold.
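A sketch of how such a configuration could be generated is given here. It assumes the attribute names and thresholds are already known; field_name and config_lines are hypothetical helpers, and the graph title and labels are illustrative rather than the plugin's real output. In Munin a critical range written as 'N:' sets only a lower bound.

```python
import re


def field_name(attribute_name):
    """Turn an attribute name into a valid munin field name."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", attribute_name)
    if not re.match(r"[A-Za-z_]", name):
        name = "_" + name
    return name


def config_lines(device, attributes):
    """Build config output from (attribute_name, threshold) pairs.

    Illustrative only; titles and labels are assumptions.
    """
    lines = [
        "graph_title S.M.A.R.T values for drive " + device,
        "graph_vlabel Attribute S.M.A.R.T value",
        "graph_category disk",
    ]
    for name, threshold in attributes:
        field = field_name(name)
        lines.append("%s.label %s" % (field, name))
        # 'N:' is a lower bound: munin raises a critical alarm below N.
        lines.append("%s.critical %d:" % (field, threshold))
    return lines
```

For example, print("\n".join(config_lines("hda", [("Raw_Read_Error_Rate", 51)]))) emits a label line and the line 'Raw_Read_Error_Rate.critical 51:' for that attribute.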
Looking at the graph: it is a bad sign if a curve starts to curl or to meander. The more horizontally it runs, the better. Of course it is normal for the temperature curve to swing a bit, but the others should stay steady at their level if everything is OK.
S.M.A.R.T. distinguishes between Pre-fail and Old-age attributes. An old disk will have more curling curves because of degradation, especially for the Old-age attributes. You should then back up more often, run more selftests and prepare the disk's replacement.
Act immediately if a Pre-fail attribute reaches or falls below its threshold. Back up your data at once and replace the hard disk drive. A failure may be imminent.
Consult the smartmontools manpages to learn about offline tests and automated selftests with smartd. Only with both activated do the values of the SMART attributes reflect the overall state of the device.
Tutorials and articles about S.M.A.R.T. and smartmontools: http://smartmontools.sourceforge.net/doc.html#tutorials
 #%# family=auto
 #%# capabilities=autoconf suggest
Fetches values if called without arguments:
E.g.: munin-run smart_hda
Prints the plugin's configuration.
E.g.: munin-run smart_hda config
When called with the autoconf argument, tries to find smartctl and outputs 'yes' on success, 'no' otherwise.
It is used by munin-node-configure to see whether autoconfiguration is possible.
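The check can be sketched as follows. This is a minimal illustration under the assumption that "finding" smartctl means testing for an executable file at the configured path; autoconf_answer is a hypothetical name.

```python
import os


def autoconf_answer(smartpath="/usr/sbin/smartctl"):
    """Return 'yes' if smartctl exists at smartpath and is executable, else 'no'."""
    if os.path.isfile(smartpath) and os.access(smartpath, os.X_OK):
        return "yes"
    return "no"
```

Printing the result of autoconf_answer() gives munin-node-configure the single line it expects.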
When called with the suggest argument, outputs the list of device names that it found attached to the system.
munin-node-configure uses this to build the service links for this wildcard plugin.
(C) 2004-2009 Nicolas Stransky <Nico@stransky.cx>
(C) 2008 Gabriele Pohl <firstname.lastname@example.org> Reformatted the existing documentation to POD style, added the section Interpretation to the documentation.