With monit, services are restarted automatically when they fail, and ever since I installed zabbix I have wanted the same functionality. It turns out this is possible, but it takes some configuration.

Also see the zabbix manual, where it gets interesting from page 160 onwards.

In zabbix, go to Configuration -> Actions.

Add a new ‘Action Operation’ that runs a remote command:

  1. Operation type: remote command
  2. Remote command: host:script, in my (test) case: elektron:/home/miekg/bin/zabbix_service {TRIGGER.NAME}: {STATUS}

For now, zabbix_service is just a shell script that echoes its arguments to a file in /tmp.
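The script itself is not shown here; a minimal sketch of such a test version, assuming it appends to the /tmp/zabbix_test file that shows up below, could look like this. The helper name log_args is hypothetical.

```shell
#!/bin/sh
# zabbix_service, test version (sketch): append whatever zabbix passes
# to a log file, one invocation per line.
LOG=/tmp/zabbix_test

log_args() {
    # "$*" joins all arguments with single spaces, so the unquoted
    # "{TRIGGER.NAME}: {STATUS}" ends up as one readable line again.
    printf '%s\n' "$*" >> "$LOG"
}

# Only log when zabbix actually passed something.
if [ "$#" -gt 0 ]; then
    log_args "$@"
fi
```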

Also be sure to set EnableRemoteCommands=1 in zabbix_agentd.conf and restart the zabbix agent.

With this enabled, something does show up in /tmp:
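To check the setting on a host before wondering why nothing happens, a small helper like the one below could be used. The default path /etc/zabbix/zabbix_agentd.conf is an assumption (it differs between installs), and remote_commands_enabled is a hypothetical name.

```shell
#!/bin/sh
# Assumed config location; override with CONF=... if yours lives elsewhere.
CONF=${CONF:-/etc/zabbix/zabbix_agentd.conf}

remote_commands_enabled() {
    # Succeeds only for an uncommented EnableRemoteCommands=1 line;
    # -s silences the error if the file does not exist.
    grep -Eqs '^[[:space:]]*EnableRemoteCommands[[:space:]]*=[[:space:]]*1[[:space:]]*$' "$1"
}

if remote_commands_enabled "$CONF"; then
    echo "remote commands enabled in $CONF"
else
    echo "set EnableRemoteCommands=1 in $CONF and restart the agent"
fi
```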

% ls /tmp/zabbix*
-rw-rw-r-- 1 zabbix zabbix 144 2009-08-20 11:04 /tmp/zabbix_test
% cat /tmp/zabbix*
SSH server is down on elektron: ON
Sshd is not running on elektron: ON

So this is starting to work nicely, but there are a few issues. The script:

  • runs only on the host specified (here: elektron);
  • runs under the user zabbix;
  • needs to parse its arguments.

Host groups

According to the manual, you can use the syntax:

hostgroup#command

instead of

host:command

So in my case, using atoom# (my host group) should let a single action cover all my hosts.

Running privileged commands

On page 162 it says:

One may be interested in using sudo to give access to privileged commands.

So privileged commands, such as restarting a service, have to go through sudo.
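A sketch of what the sudo side could look like, assuming Debian-style init scripts under /etc/init.d (so the SSH daemon is /etc/init.d/ssh); restart_service and DRY_RUN are hypothetical names. The sudoers rule in the comment names the exact command line, so zabbix gets root for that one restart and nothing else.

```shell
#!/bin/sh
# Sketch: restart a service through sudo from a script running as zabbix.
# Assumes a sudoers rule (added with visudo) along the lines of:
#   zabbix ALL=(root) NOPASSWD: /etc/init.d/ssh restart
# Set DRY_RUN=1 to print the command instead of running it.

restart_service() {
    svc=$1
    # -n makes sudo fail instead of prompting for a password,
    # since nobody is around to type one when zabbix runs this.
    ${DRY_RUN:+echo} sudo -n "/etc/init.d/$svc" restart
}
```

Called as restart_service ssh, this runs /etc/init.d/ssh restart as root, provided the sudoers rule allows exactly that command.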

Parsing the arguments

Are there any other macros (page 87 in the manual) that could be of use? Two look promising:

{TRIGGER.ID}    Numeric trigger ID which triggered this action.
{TRIGGER.KEY}   Key of first item of the trigger which caused a notification.

I’ve added these to the arguments of my little test script; let’s see what comes out.

SSH server is down on elektron: ON 13009 net.tcp.service[ssh]
Sshd is not running on elektron: ON 13014 proc.num[sshd]

Indeed, a number (13009) and a net.tcp.service[ssh] string are added. That is somewhat easier to parse, but still…

From the looks of it, the {TRIGGER.ID} differs per host, so you cannot use it to check for a failure of (say) the SSH daemon on all hosts. The {TRIGGER.KEY} looks much more portable and parseable in that respect.

In case you are interested, the trigger IDs can be found by going to the trigger screen in zabbix and clicking on a trigger; the URL then contains a triggerid=xxxxx string.
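To see how parseable the key is, here is a sketch that pulls a service name out of it by taking the first bracketed parameter; service_from_key is a hypothetical helper. Note that the two keys above give sshd and ssh, so some mapping to the actual init script name would still be needed.

```shell
#!/bin/sh
# Sketch: derive a service name from {TRIGGER.KEY}, e.g.
#   proc.num[sshd]        -> sshd
#   net.tcp.service[ssh]  -> ssh
# Keys without a [...] parameter list return failure and print nothing.

service_from_key() {
    key=$1
    case $key in
        *\[*\]) ;;          # has a parameter list
        *) return 1 ;;      # e.g. system.cpu.load: nothing to restart
    esac
    params=${key#*\[}       # drop everything up to and including the first [
    params=${params%\]}     # drop the trailing ]
    printf '%s\n' "${params%%,*}"   # keep only the first comma-separated parameter
}
```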

I finally went with the following macros:

atoom#/home/miekg/bin/zabbix_service "{TRIGGER.KEY}:{STATUS}:{HOSTNAME}"

This gives the following output:

proc.num[sshd]:OFF:elektron

…and we have a string I can parse! :)
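Splitting that string in the script can be done with plain parameter expansion. This is a sketch with a hypothetical parse_event helper; it takes the fields from the right, so a colon inside an item key would not break the split, and it assumes the argument always has all three fields.

```shell
#!/bin/sh
# Sketch: split the "{TRIGGER.KEY}:{STATUS}:{HOSTNAME}" argument
# into the variables key, status and host.

parse_event() {
    arg=$1
    host=${arg##*:}     # text after the last ':'
    rest=${arg%:*}      # everything before the last ':'
    status=${rest##*:}
    key=${rest%:*}
}
```

Inside zabbix_service this would be called as parse_event "$1".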

Todo

The following items are still on my todo list:

  • Configure sudo to give zabbix more powers, so that it is allowed to restart services;
  • Write a proper script, which can restart a service;
  • Flap detection;
  • Failure detection, stop restarting after n tries.
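For the last item, a minimal sketch of a retry limit that keeps one counter file per host and item key. The state directory, the MAX_TRIES default, the helper names, and treating ON as the problem state (and a later OFF as the moment to reset) are my assumptions, not something zabbix prescribes.

```shell
#!/bin/sh
# Sketch of "stop restarting after n tries".
MAX_TRIES=${MAX_TRIES:-3}
STATE_DIR=${STATE_DIR:-/tmp/zabbix_restarts}

# Counter file for a host/key pair; the key is made file-name safe,
# e.g. proc.num[sshd] -> proc.num_sshd_
state_file() {
    printf '%s/%s_%s\n' "$STATE_DIR" "$1" \
        "$(printf '%s' "$2" | tr -c 'A-Za-z0-9._-' '_')"
}

# Succeeds and records the attempt if another restart is allowed;
# fails once MAX_TRIES attempts have been made.
allow_restart() {
    mkdir -p "$STATE_DIR"
    file=$(state_file "$1" "$2")
    tries=$(cat "$file" 2>/dev/null || echo 0)
    if [ "$tries" -ge "$MAX_TRIES" ]; then
        return 1
    fi
    echo $((tries + 1)) > "$file"
}

# Forget earlier attempts once the trigger has recovered.
reset_restarts() {
    rm -f "$(state_file "$1" "$2")"
}
```

The idea would be to call allow_restart before each restart on ON, and reset_restarts when the same host/key comes back with OFF.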