Remote commands with Zabbix actions
With monit, services are restarted automatically. Ever since I installed zabbix I have wanted the same functionality. It turns out this is possible, but it takes some configuration.
Also see the zabbix manual, where it gets interesting from page 160 onwards.
In zabbix go to Configuration->Actions.
Add a new ‘Action Operation’ in which you want to run a remote command.
- Operation type: remote command
- Remote command: host:script, in my (test) case:
elektron:/home/miekg/bin/zabbix_service {TRIGGER.NAME}: {STATUS}
For now, zabbix_service is a shell script that echoes its arguments to a file in /tmp.
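A minimal sketch of what such a test script could look like. The function name log_args and the ZABBIX_TEST_LOG override are assumptions for illustration; only the behaviour (append the arguments to a file in /tmp) comes from the description above.

```sh
#!/bin/sh
# Sketch of a test version of zabbix_service: append the arguments to a log file.
# ZABBIX_TEST_LOG is a hypothetical override; the default is /tmp/zabbix_test.

log_args() {
    # "$*" joins all arguments with single spaces, much like echo prints them
    printf '%s\n' "$*" >> "${ZABBIX_TEST_LOG:-/tmp/zabbix_test}"
}

# Only log when the script is called with arguments
if [ "$#" -gt 0 ]; then
    log_args "$@"
fi
```

Because the macros are passed unquoted, the shell splits them into several words; joining them again with "$*" keeps the logged line readable.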
Also be sure to set EnableRemoteCommands=1 in zabbix_agentd.conf and restart the zabbix agent.
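If nothing shows up, it may help to check that the setting is really active and not commented out. A small sketch; the function name remote_commands_enabled is made up for this example, and the location of zabbix_agentd.conf depends on your installation.

```sh
#!/bin/sh
# remote_commands_enabled FILE
# Succeeds if FILE contains an uncommented EnableRemoteCommands=1 line.
remote_commands_enabled() {
    grep -Eq '^EnableRemoteCommands=1[[:space:]]*$' "$1"
}
```

Usage, assuming the configuration lives in /etc/zabbix: `remote_commands_enabled /etc/zabbix/zabbix_agentd.conf && echo enabled`.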
When enabled I do see something in /tmp:
% ls /tmp/zabbix*
-rw-rw-r-- 1 zabbix zabbix 144 2009-08-20 11:04 /tmp/zabbix_test
% cat /tmp/zabbix*
SSH server is down on elektron: ON
Sshd is not running on elektron: ON
So this is starting to work nicely, however there are a few issues with it. The script:
- runs only on the host specified (here: elektron);
- runs under the user zabbix;
- needs to parse its arguments.
Host groups
According to the manual you can use the syntax hostgroup#command instead of host:command. So (in my case) using atoom# should take care of creating the action for all my hosts.
Running privileged commands
On page 162 it says:
One may be interested in using sudo to give access to privileged commands.
So it must be done with sudo.
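A rough sketch of how that could look from the script's side. Everything specific here is an assumption: the sudoers entry in the comment, the /etc/init.d path, and the function name restart_service. The SUDO variable exists only so the command can be dry-run without touching a real service.

```sh
#!/bin/sh
# Assumed sudoers entry (edit with visudo); the init script path is a guess:
#   zabbix ALL=(root) NOPASSWD: /etc/init.d/ssh restart

# restart_service NAME
# Run the init script for NAME through sudo. "sudo -n" fails instead of
# prompting for a password, which is what you want for an unattended user.
# Set SUDO=echo to only print the command.
restart_service() {
    ${SUDO:-sudo -n} "/etc/init.d/$1" restart
}
```

Listing the exact command in sudoers, rather than giving zabbix blanket root access, keeps the damage limited if the agent is ever abused.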
Parsing the argument
Are there any other macros (page 87 in the manual) which can be of use? Looking at some:
{TRIGGER.ID} Numeric trigger ID which triggered this action.
{TRIGGER.KEY} Key of first item of the trigger which caused a notification.
I’ve added these to my little test script, let’s see what comes out of it.
SSH server is down on elektron: ON 13009 net.tcp.service[ssh]
Sshd is not running on elektron: ON 13014 proc.num[sshd]
Indeed a number is added (13009) and a net.tcp.service[ssh] string.
That is somewhat easier to parse, but still…
From the looks of it, the {TRIGGER.ID}s differ per host, so you cannot use them to check the failure of (say) the SSH daemon for all hosts. The {TRIGGER.KEY} looks much more portable and parseable in that respect.
In case you are interested, the trigger IDs can be found by going to the trigger screen of zabbix and clicking on a trigger. The URL then contains a triggerid=xxxxx string.
I finally went with the following macros:
atoom#/home/miekg/bin/zabbix_service "{TRIGGER.KEY}:{STATUS}:{HOSTNAME}"
Which gives the following output:
proc.num[sshd]:OFF:elektron
…and we have a string I can parse! :)
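As an illustration, splitting that string in plain sh could be sketched like this. The function name parse_event and the variable names are my own for this example; the fields are taken from the right, so a colon inside the key would not confuse it.

```sh
#!/bin/sh
# parse_event STRING
# Split "key:status:host" into the variables key, status and host.
parse_event() {
    event=$1
    host=${event##*:}     # everything after the last colon
    rest=${event%:*}      # drop ":host"
    status=${rest##*:}    # everything after the (now) last colon
    key=${rest%:*}        # whatever remains is the item key
}

# When called with an argument, show the parsed fields
if [ "$#" -gt 0 ]; then
    parse_event "$1"
    printf 'key=%s status=%s host=%s\n' "$key" "$status" "$host"
fi
```

Called with 'proc.num[sshd]:OFF:elektron' this prints key=proc.num[sshd] status=OFF host=elektron.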
Todo
I’ve left the following items on my todo list:
- Configure sudo to give zabbix more powers, so that it is allowed to restart services;
- Write a proper script, which can restart a service;
- Flap detection;
- Failure detection, stop restarting after n tries.
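For the last item, one possible approach is a per-service counter file. This is only a sketch of the idea, not something I have running; STATE_DIR, MAX_TRIES and both function names are invented for the example.

```sh
#!/bin/sh
# Hypothetical settings: where the counters live and how many tries are allowed
STATE_DIR=${STATE_DIR:-/tmp/zabbix_service_state}
MAX_TRIES=${MAX_TRIES:-3}

# should_restart SERVICE
# Returns 0 and records one more attempt while fewer than MAX_TRIES attempts
# have been made; returns 1 once the limit is reached.
should_restart() {
    mkdir -p "$STATE_DIR" || return 1
    counter="$STATE_DIR/$1.tries"
    tries=0
    if [ -f "$counter" ]; then
        tries=$(cat "$counter")
    fi
    if [ "$tries" -ge "$MAX_TRIES" ]; then
        return 1
    fi
    echo $((tries + 1)) > "$counter"
    return 0
}

# reset_tries SERVICE
# Forget earlier attempts, for example when the trigger reports the service is fine again.
reset_tries() {
    rm -f "$STATE_DIR/$1.tries"
}
```

The restart script would call should_restart before each restart and reset_tries when the trigger status changes back.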