NAME

smart_ - Munin wildcard-plugin to monitor S.M.A.R.T. attribute values through smartctl

APPLICABLE SYSTEMS

Nodes with a Python interpreter and smartmontools (http://smartmontools.sourceforge.net/) installed and working.

CONFIGURATION

Create link in service directory

To monitor a S.M.A.R.T.-capable device, create a symbolic link named smart_<device> in the munin-node service directory, pointing to this file.

E.g.

ln -s /usr/share/munin/plugins/smart_ /etc/munin/plugins/smart_hda

...will monitor /dev/hda.
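The plugin derives the device from the name of the link it was called through. The following is a minimal Python sketch of that mapping. It is not the plugin's own code: the helper device_from_link is hypothetical, and it assumes the link is always named smart_<device> and the device node lives directly under /dev.

```python
import os

PREFIX = "smart_"  # wildcard prefix of the service links


def device_from_link(link_path):
    """Map a service link such as /etc/munin/plugins/smart_hda to /dev/hda.

    Hypothetical helper: assumes a plain device name follows the prefix.
    Munin runs the link itself, so sys.argv[0] would normally be passed in.
    """
    name = os.path.basename(link_path)
    if not name.startswith(PREFIX) or name == PREFIX:
        raise ValueError("expected a link named smart_<device>, got %r" % name)
    return "/dev/" + name[len(PREFIX):]


print(device_from_link("/etc/munin/plugins/smart_hda"))  # /dev/hda
```

Links for drives behind RAID controllers, such as the smart_twa0-1 examples in the configuration section, need extra handling that this sketch does not attempt.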

Grant privileges in munin-node

The plugin must run as the privileged user root to get access to the raw device.

The following minimal configuration in plugin-conf.d/munin-node is therefore needed:

  [smart_*]
  user root
  group disk

Set parameters if needed

  smartpath     - Path to the smartctl program (Default: /usr/sbin/smartctl)
  smartargs     - Replaces the default '-a' argument; smartctl is then called with '-A -i' followed by smartargs
  ignorestandby - Perform the S.M.A.R.T. query even if the drive is in standby (Default: False)
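munin-node passes each env.<name> setting to the plugin as an environment variable. The sketch below shows how the three parameters could be combined into a smartctl command line as described above. It is an illustration, not the plugin's code. The helper name is hypothetical. Mapping a false ignorestandby to smartctl's real '-n standby' option, which skips the query while the drive is spun down, is an assumption about how the standby check works.

```python
import os
import shlex

DEFAULT_SMARTCTL = "/usr/sbin/smartctl"  # documented default for smartpath


def build_smartctl_command(device, env=None):
    """Hypothetical helper: build the smartctl argv for one device.

    env defaults to os.environ, where munin-node places env.* settings.
    """
    env = os.environ if env is None else env
    cmd = [env.get("smartpath", DEFAULT_SMARTCTL)]
    smartargs = env.get("smartargs")
    if smartargs:
        # smartargs replaces '-a'; '-A -i' are always passed along with it.
        cmd += ["-A", "-i"] + shlex.split(smartargs)
    else:
        cmd.append("-a")
    # Assumption: unless ignorestandby is true, use smartctl's '-n standby'
    # so a sleeping drive is not spun up just to be queried.
    if env.get("ignorestandby", "False").strip().lower() not in ("true", "yes", "1"):
        cmd += ["-n", "standby"]
    cmd.append(device)
    return cmd


print(" ".join(build_smartctl_command("/dev/hda", {})))
# /usr/sbin/smartctl -a -n standby /dev/hda
```

With the [smart_twa0-1] settings from the examples, the same helper would produce '-A -i -H -l error -d 3ware,1' and no '-n standby'.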

Parameters can be specified on a per-drive basis, e.g.:

  [smart_hda]
  user root
  env.smartargs -H -c -l error -l selftest -l selective -d ata
  env.smartpath /usr/local/sbin/smartctl

In particular, SATA drives need '-d ata' with older versions of smartctl:

  [smart_sda]
  user root
  env.smartargs -d ata -a

For drives behind a 3ware RAID controller, select the disk with '-d 3ware,N':

  [smart_twa0-1]
  user root
  env.smartargs -H -l error -d 3ware,1
  env.ignorestandby True

  [smart_twa0-2]
  user root
  env.smartargs -H -l error -d 3ware,2

INTERPRETATION

If a device supports the Self-Monitoring, Analysis and Reporting Technology (S.M.A.R.T.), it offers read access to its attribute table. For each attribute the device supports, the table lists a raw value, a normalised value and a threshold set by the vendor.

The meaning and handling of the raw value is known only to the vendor's S.M.A.R.T. software embedded in the disk. From an external view, the only relevant information is the normalised value compared with the threshold. If an attribute's normalised value is equal to or below its threshold, the attribute signals failure and the health status of the device switches from passed to failed.
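This rule can be checked directly against the attribute table printed by 'smartctl -A'. The following is a minimal sketch. It assumes the usual column layout (ID#, ATTRIBUTE_NAME, FLAG, VALUE, WORST, THRESH, TYPE, UPDATED, WHEN_FAILED, RAW_VALUE). The helper names are hypothetical, and the sample rows are illustrative rather than output from a real drive.

```python
def parse_attribute_row(line):
    """Split one row of the 'smartctl -A' table into its columns.

    RAW_VALUE may itself contain spaces, e.g. '34 (Min/Max 20/45)',
    so at most 9 splits are made.
    """
    f = line.split(None, 9)
    return {
        "id": int(f[0]),
        "name": f[1],
        "value": int(f[3]),   # normalised value
        "worst": int(f[4]),
        "thresh": int(f[5]),  # vendor threshold
        "type": f[6],         # 'Pre-fail' or 'Old_age'
        "raw": f[9] if len(f) > 9 else "",
    }


def attribute_failed(attr):
    """Failure rule: normalised value equal to or below the threshold."""
    return attr["value"] <= attr["thresh"]


ok = parse_attribute_row(
    "  5 Reallocated_Sector_Ct   0x0033   100   100   036    Pre-fail  Always       -       0")
bad = parse_attribute_row(
    "  5 Reallocated_Sector_Ct   0x0033   030   030   036    Pre-fail  Always   FAILING_NOW 1500")
print(attribute_failed(ok), attribute_failed(bad))  # False True
```

Only VALUE and THRESH enter the decision; the raw value is kept for display but, as explained above, carries no portable meaning.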

This plugin fetches the normalised values of all S.M.A.R.T. attributes and draws a curve for each of them. It uses the vendor's threshold as the critical limit for the corresponding Munin data field, so you will see an alarm if a value reaches the vendor's threshold.
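The following hedged sketch shows how parsed attributes could be turned into Munin output, with one data field per attribute and the vendor threshold as the critical limit. The graph title, vlabel and category are made up for illustration. Field names are sanitised because Munin only accepts letters, digits and underscores. Munin's 'min:' form of field.critical raises an alarm when a value falls below min. Attributes with a threshold of 0 get no limit in this sketch.

```python
import re


def field_name(attr_name):
    """Munin field names may only contain letters, digits and underscores."""
    name = re.sub(r"[^A-Za-z0-9_]", "_", attr_name)
    return "_" + name if name[:1].isdigit() else name


def munin_config(device, attrs):
    """Output for 'munin-run smart_<device> config' (hypothetical layout)."""
    lines = [
        "graph_title S.M.A.R.T. values for drive %s" % device,
        "graph_vlabel Normalised value",
        "graph_category disk",
    ]
    for a in attrs:
        f = field_name(a["name"])
        lines.append("%s.label %s" % (f, a["name"]))
        if a["thresh"] > 0:
            # Alarm once the value drops below the vendor threshold.
            lines.append("%s.critical %d:" % (f, a["thresh"]))
    return "\n".join(lines)


def munin_values(attrs):
    """Output for a plain 'munin-run smart_<device>' (fetch)."""
    return "\n".join("%s.value %d" % (field_name(a["name"]), a["value"]) for a in attrs)


attrs = [
    {"name": "Reallocated_Sector_Ct", "value": 100, "thresh": 36},
    {"name": "Temperature_Celsius", "value": 34, "thresh": 0},
]
print(munin_config("hda", attrs))
print(munin_values(attrs))
```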

Looking at the graph: it is a bad sign if a curve starts to curl or meander. The more horizontal it runs, the better. It is normal for the temperature curve to swing a bit, but the others should stay steady at their level if everything is OK.

S.M.A.R.T. distinguishes between Pre-fail and Old-age attributes. An old disk will show more curling curves because of degradation, especially for the Old-age attributes. You should then back up more often, run self-tests more often[1] and prepare to replace the disk.

Act immediately if a Pre-fail attribute reaches or falls below its threshold: back up your data at once and replace the hard disk drive. A failure may be imminent.
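The TYPE column of 'smartctl -A' labels each attribute as 'Pre-fail' or 'Old_age'. The small sketch below sorts attributes that are at or below their threshold into two groups. The function name and input dictionaries are illustrative assumptions. Treating a failing Old_age attribute as a cue to back up more often and plan a replacement is this sketch's reading of the advice above, not a rule stated by smartmontools.

```python
def triage(attrs):
    """Split attributes at or below their threshold by S.M.A.R.T. type.

    Returns (replace_now, plan_replacement): failing Pre-fail attributes
    call for an immediate backup and replacement; failing Old_age ones
    indicate wear, so back up more often and prepare a replacement.
    """
    replace_now, plan_replacement = [], []
    for a in attrs:
        if a["value"] > a["thresh"]:
            continue  # still above the vendor threshold
        if a["type"] == "Pre-fail":
            replace_now.append(a["name"])
        else:
            plan_replacement.append(a["name"])
    return replace_now, plan_replacement


attrs = [
    {"name": "Reallocated_Sector_Ct", "type": "Pre-fail", "value": 30, "thresh": 36},
    {"name": "Start_Stop_Count", "type": "Old_age", "value": 15, "thresh": 20},
    {"name": "Spin_Up_Time", "type": "Pre-fail", "value": 97, "thresh": 21},
]
print(triage(attrs))  # (['Reallocated_Sector_Ct'], ['Start_Stop_Count'])
```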

[1] Consult the smartmontools man pages to learn about offline tests and automated self-tests with smartd. Only with both enabled do the values of the S.M.A.R.T. attributes reflect the overall state of the device.

Tutorials and articles about S.M.A.R.T. and smartmontools: http://smartmontools.sourceforge.net/doc.html#tutorials

MAGIC MARKERS

 #%# family=auto
 #%# capabilities=autoconf suggest

CALL OPTIONS

none

Fetches values if called without arguments:

E.g.: munin-run smart_hda

config

Prints the plugin's configuration.

E.g.: munin-run smart_hda config

autoconf

Tries to find smartctl and outputs 'yes' on success, 'no' otherwise.

It is used by munin-node-configure to see whether autoconfiguration is possible.
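The following is a sketch of the autoconf answer. It assumes that "finding smartctl" means checking that the configured smartpath (or its documented default) is an executable file. The real plugin may search other locations; this sketch deliberately does not search PATH.

```python
import os


def autoconf(smartpath=None):
    """Answer munin-node-configure: 'yes' if smartctl is executable, else 'no'.

    Uses the smartpath parameter or its documented default; no PATH search.
    """
    if smartpath is None:
        smartpath = os.environ.get("smartpath", "/usr/sbin/smartctl")
    usable = os.path.isfile(smartpath) and os.access(smartpath, os.X_OK)
    return "yes" if usable else "no"


print(autoconf())
```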

suggest

Outputs the list of device names it found attached to the system.

munin-node-configure uses this to build the service links for this wildcard plugin.
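One way to build such a list is smartctl's own '--scan' option, which prints one detected device per line, for example '/dev/sda -d scsi # /dev/sda, SCSI device'. This document does not say whether the plugin uses --scan. The sketch below only shows how such output could be turned into the <device> suffixes for smart_<device> links. The helper name and the sample text are illustrative.

```python
def suggest_from_scan(scan_output):
    """Turn 'smartctl --scan' output into device names for smart_<device> links.

    Comments after '#' are dropped and duplicates (e.g. several ports of
    one RAID controller) are collapsed; '-d' types are ignored here.
    """
    names = []
    for line in scan_output.splitlines():
        entry = line.split("#", 1)[0].strip()
        if not entry.startswith("/dev/"):
            continue
        name = entry.split()[0][len("/dev/"):]
        if name not in names:
            names.append(name)
    return names


sample = ("/dev/sda -d scsi # /dev/sda, SCSI device\n"
          "/dev/sdb -d sat # /dev/sdb [SAT], ATA device\n")
print("\n".join(suggest_from_scan(sample)))  # prints sda and sdb on separate lines
```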

VERSION

Version 2.1

BUGS

None known

AUTHOR

(C) 2004-2009 Nicolas Stransky <Nico@stransky.cx>

(C) 2008 Gabriele Pohl <contact@dipohl.de> Reformatted the existing documentation to POD style and added the Interpretation section.

LICENSE

GPLv2 (http://www.gnu.org/licenses/gpl-2.0.txt)