I was bored tonight so I wrote a wrapper for hddtemp for Nagios monitoring. I have a bit of a quirky setup for Nagios where I run the local system checks on remote systems via netcat, ipsvd, and a script to handle the query. This allows me to monitor remote drive space, current users, total processes, and current load. Using hddtemp, I can now monitor the temperature of the drives in those machines (which also gives me an idea of how hot/cold the server room itself is).
This may need some tweaking to work with other Nagios setups, but it shouldn’t be too hard to adapt. One of these days I’ll do a writeup on my Nagios configuration. Anyway, the wrapper script is as follows. It could probably be optimized a bit more, but it works well enough. WordPress doesn’t handle indentation very well, so keep that in mind.
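#!/bin/sh
# Nagios wrapper around hddtemp: reports the temperature of each given drive.
usage() {
    echo "${0} -w [warn] -c [crit] [drives]"
}
# "=" is the portable string comparison for [ ]; "==" fails in some /bin/sh.
if [ "${1}" = "-h" ] || [ "${1}" = "--help" ]; then
    usage
    exit 0
fi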
# The warning threshold must come first...
if [ "${1}" = "-w" ]; then
    shift
    warn="${1}"
    shift
else
    usage
    exit 1
fi
# ...followed by the critical threshold.
if [ "${1}" = "-c" ]; then
    shift
    crit="${1}"
    shift
else
    usage
    exit 1
fi
# Everything left on the command line is a drive to check.
while [ "${1}" != "" ]; do
    drives="${drives} ${1}"
    shift
done
if [ "${drives}" = "" ]; then
    usage
    exit 1
fi
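status=0    # worst state seen so far: 0 = OK, 1 = WARN, 2 = CRITICAL
smsg=""     # per-drive summary, e.g. "hda=22C; "
htemp=0     # highest temperature seen, reported as performance data
for drive in ${drives}; do
    stats=$(/usr/local/sbin/hddtemp "${drive}")
    # hddtemp prints "device: model: temperature"; split on the colons.
    # ${stats} is left unquoted so runs of whitespace collapse before cut.
    # model is parsed but not used in the output.
    model=$(echo ${stats} | cut -d ':' -f 2)
    # temp has to end up as a bare integer for the -ge tests below.
    temp=$(echo ${stats} | cut -d ':' -f 3 | cut -d ' ' -f 2)
    dev=$(echo "${drive}" | cut -d '/' -f 3)
    if [ "${temp}" -ge "${warn}" ]; then
        # Never downgrade a drive that already went critical.
        if [ "${status}" != "2" ]; then
            status=1
        fi
    fi
    if [ "${temp}" -ge "${crit}" ]; then
        status=2
    fi
    if [ "${temp}" -gt "${htemp}" ]; then
        htemp="${temp}"
    fi
    smsg="${smsg}${dev}=${temp}C; "
done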
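case "${status}" in
    2)
        wmsg="CRITICAL"
        ;;
    1)
        wmsg="WARN"
        ;;
    0)
        wmsg="OK"
        ;;
esac
# Status text, then performance data after the pipe: value;warn;crit;min
echo "HDDTEMP ${wmsg} - ${smsg}|hddtemp=${htemp};${warn};${crit};0"
# Nagios plugins also report their state through the exit code.
exit "${status}"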
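The output in Nagios’ status view looks like this (Nagios swaps the semicolons in the plugin output for colons and doesn’t show the performance data after the pipe):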
HDDTEMP OK - hda=22C: sda=24C: sdb=24C:
It’s called as “hddtemp-mon -w 30 -c 35 /dev/hda /dev/sda /dev/sdb”.
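If you want to try out the threshold logic without any real drives, here is a minimal sketch of it pulled out into two functions. The names temp_state and worst_state are made up for this example and aren't part of the script above; the state codes (0, 1, 2) are the usual Nagios OK/WARNING/CRITICAL values, and the readings are just the ones from the example output plus a couple of hotter ones.

```shell
#!/bin/sh
# Hypothetical helper (not in the original script): map one temperature
# reading to a Nagios state code: 0 = OK, 1 = WARNING, 2 = CRITICAL.
# Arguments: temperature, warning threshold, critical threshold.
temp_state() {
    if [ "${1}" -ge "${3}" ]; then
        echo 2
    elif [ "${1}" -ge "${2}" ]; then
        echo 1
    else
        echo 0
    fi
}

# Hypothetical helper: worst (highest) state across several readings.
# Arguments: warning threshold, critical threshold, then the readings.
worst_state() {
    w="${1}"
    c="${2}"
    shift 2
    worst=0
    for t in "$@"; do
        s=$(temp_state "${t}" "${w}" "${c}")
        if [ "${s}" -gt "${worst}" ]; then
            worst="${s}"
        fi
    done
    echo "${worst}"
}

worst_state 30 35 22 24 24   # all drives below the warning threshold
worst_state 30 35 22 31 24   # one drive past the warning threshold
worst_state 30 35 22 31 36   # one drive past the critical threshold
```

With -w 30 -c 35 those three calls print 0, 1 and 2 respectively. Thresholds are inclusive, same as in the script, so a drive sitting at exactly 30 already counts as a warning.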