To paraphrase a quote from Cecil B. DeMille, the way to make a blog post is to begin with an earthquake and work up to a climax. Let’s start with a video of me stopping a process on one of Ducksboard’s production servers and promptly getting an angry call on my cell from a robot. The rest of this post works through the steps needed to set this up and ends with the true engineer’s climax: a link to the source code repository.
Click here to view the embedded video.
Monitor or die
If you’re taking people’s money, you need to have monitoring. If you rely on things to “just work” and you’re lucky, you’ll soon have angry customers gathering at your gates with pitchforks and torches. If you’re unlucky, you just won’t have customers anymore.
Over here we’re all good little ducklings, so we have a monitoring system set up. After an initial evaluation, we went with Zabbix for several reasons: it provides both alerting and graphing, it can monitor everything from our database size to web frontend response times, and it’s flexible enough to call my mobile phone when something goes astray. Oh, did I mention it’s free and open source as well?
Interlude: a little Zabbix primer
Zabbix operates using the concepts of hosts, items and triggers. A host is a machine you are monitoring. Each host can have several items defined, where an item is a single element of your infrastructure being monitored. What makes Zabbix different from other monitoring systems is that you don’t define checks like “is ntpd running on frontend1”. Instead, you add an item called “number of running ntpd processes” on the “frontend1” host and define a trigger that fires when that item’s value falls below one.
Since all items provide values, you get historical graphs for free: once you define your ntpd item, you can easily see when you last had issues with ntpd by looking at the item’s graph and noticing the places where it dropped to zero.
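To make the item/trigger split concrete, here is a toy model in Python. It is only a sketch of the idea and has nothing to do with Zabbix’s real API: the `Item` and `Trigger` classes and their methods are made up for illustration.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class Item:
    """A single monitored value on a host, plus its recorded samples."""
    host: str
    name: str
    history: List[Tuple[int, float]] = field(default_factory=list)

    def record(self, timestamp: int, value: float) -> None:
        self.history.append((timestamp, value))

    def last(self) -> float:
        return self.history[-1][1]


@dataclass
class Trigger:
    """Fires when a condition on the item's latest value holds."""
    name: str
    item: Item
    condition: Callable[[float], bool]

    def is_firing(self) -> bool:
        return bool(self.item.history) and self.condition(self.item.last())


ntpd = Item(host="frontend1", name="number of running ntpd processes")
trigger = Trigger("ntpd is down on frontend1", ntpd, lambda value: value < 1)

# Three samples, one minute apart; the process disappears in the last one.
for timestamp, count in [(0, 1), (60, 1), (120, 0)]:
    ntpd.record(timestamp, count)

# The same history that feeds the trigger also doubles as graph data.
outages = [timestamp for timestamp, value in ntpd.history if value == 0]
print(trigger.is_firing(), outages)
```

The point of the design is that the item only collects numbers; deciding what counts as a problem is left entirely to the trigger.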
As an aside, items and triggers can be bundled into templates to make managing the configuration easier. Once you have a “frontend” template filled with all the items you want to monitor, deploying another frontend machine is as easy as defining a new host and applying the “frontend” template to it.
Once you have triggers it’s time to add actions. An action can be sending an email, texting through a GSM modem, sending a Jabber message or… calling an arbitrary script. Couple that with a few bucks in a Twilio account and you can start connecting the dots.
How to cheer a sysadmin’s night (with a 3 AM call)
Here’s our battle plan:
- An alarm is triggered
- A custom script is executed by the Zabbix server
- The script gets the sysadmin’s number and initiates a call using Twilio
- A simple application instructs Twilio to read the alert message and hang up
- Cackle maniacally
First, we need to create a new medium. Media are the ways Zabbix communicates with users. A typical medium is email, but in our case it’s going to be a custom script.
Important: the action script needs to be uploaded to the machine where Zabbix is running and placed in /etc/zabbix/alert.d. You can’t use absolute paths; everything is treated as relative to the alert.d directory.
Then, make sure a trigger is defined. We will trigger an action if the number of running ntpd processes falls below one.
Finally, configure the action. The medium script will get the subject and message as parameters; we use the trigger’s name as the subject and it will double as the content of the call. We set it up so that our previously defined medium is used and only one call will be made.
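As a rough sketch of the receiving end, this is how a medium script could pick up its parameters. It assumes Zabbix invokes the script with the recipient, the subject and the message as three positional arguments; check the documentation for your Zabbix version before relying on that. The `Alert` type, the script name and the phone number are placeholders.

```python
from typing import List, NamedTuple


class Alert(NamedTuple):
    recipient: str
    subject: str
    message: str


def parse_alert(argv: List[str]) -> Alert:
    """Turn the script's arguments into an Alert.

    Assumes argv looks like [script, recipient, subject, message].
    """
    if len(argv) < 4:
        raise SystemExit("usage: %s RECIPIENT SUBJECT MESSAGE" % argv[0])
    return Alert(argv[1], argv[2], argv[3])


# In the real script this would be parse_alert(sys.argv).
alert = parse_alert([
    "alert-script",                # hypothetical script name
    "+15555550199",                # placeholder phone number
    "ntpd is down on frontend1",   # the trigger's name, used as the subject
    "ntpd is down on frontend1",
])
print("would call %s and read out: %s" % (alert.recipient, alert.subject))
```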
Don’t call us, we’ll call you
The action script is very straightforward: it just schedules a Twilio call to whatever number is configured and passes it an application URL that includes the message to be read out. Put your Twilio credentials and number in /etc/zabbix/alert.d/twilio-alert-handler.ini.
[gist id=3c2d902d4421a677e755]
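If the gist doesn’t render for you, here is a minimal standard-library sketch of the same idea. It is not the script from the repository. The `[twilio]` section, its key names, the `message` query parameter and all the sample values are assumptions made for this example; the endpoint and the `From`, `To` and `Url` fields are those of Twilio’s REST API for creating calls.

```python
import base64
import configparser
import urllib.parse
import urllib.request

API_URL = "https://api.twilio.com/2010-04-01/Accounts/%s/Calls.json"


def load_config(text: str) -> configparser.SectionProxy:
    """Parse the .ini contents; section and key names are assumptions."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    return parser["twilio"]


def build_call_request(cfg, to: str, message: str) -> urllib.request.Request:
    """Build (but do not send) the POST that asks Twilio to place a call."""
    # The application URL carries the text to read out as a query parameter.
    twiml_url = cfg["twiml_url"] + "?" + urllib.parse.urlencode({"message": message})
    body = urllib.parse.urlencode(
        {"From": cfg["from_number"], "To": to, "Url": twiml_url}
    ).encode("ascii")
    request = urllib.request.Request(API_URL % cfg["account_sid"], data=body)
    # Twilio authenticates with HTTP Basic auth: account SID and auth token.
    credentials = "%s:%s" % (cfg["account_sid"], cfg["auth_token"])
    token = base64.b64encode(credentials.encode("ascii")).decode("ascii")
    request.add_header("Authorization", "Basic " + token)
    return request


# Placeholder values only; none of these are real credentials or hosts.
SAMPLE_INI = """
[twilio]
account_sid = ACxxxxxxxxxxxxxxxx
auth_token = not-a-real-token
from_number = +15555550100
twiml_url = http://zabbix.example.com:8080/
"""

cfg = load_config(SAMPLE_INI)
request = build_call_request(cfg, "+15555550199", "ntpd is down on frontend1")
print(request.get_method(), request.full_url)
# Actually placing the call would be: urllib.request.urlopen(request)
```

Building the request separately from sending it makes it easy to check what would go over the wire without spending any of those few bucks in your Twilio account.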
The missing part is a simple application that will drive the call. For that, we’ll run a simple Twisted daemon on a high port that just serves up appropriate TwiML.
[gist id=92a138a58d0d3f1e4929]
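The TwiML itself is tiny. The sketch below builds it with the standard library only, so it is not the Twisted resource from the gist; in the daemon you would return something like this from the resource’s render method with an XML content type. The function name `build_twiml` is made up for this example, while `Response`, `Say` and `Hangup` are standard TwiML elements.

```python
import xml.etree.ElementTree as ET


def build_twiml(message: str) -> bytes:
    """Return TwiML that reads `message` aloud and then hangs up."""
    response = ET.Element("Response")
    say = ET.SubElement(response, "Say")
    say.text = message  # ElementTree escapes &, < and > when serializing
    ET.SubElement(response, "Hangup")
    declaration = b'<?xml version="1.0" encoding="UTF-8"?>'
    return declaration + ET.tostring(response, encoding="utf-8")


print(build_twiml("ntpd is down on frontend1").decode("utf-8"))
```

Letting an XML library do the serialization matters here because the message comes straight from the alert text, which may contain characters that would otherwise break the document.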
Put it in /etc/zabbix and run it like this:
twistd --logfile=/var/log/zabbix-server/twilio_alert.log --pidfile=/var/run/zabbix-server/twilio_alert.pid web --port=tcp:8080 --resource-script=/etc/zabbix/twilio-alert.rpy
In production environments you would probably want to limit access to this daemon, either by putting it behind a reverse proxy with Basic authentication or by using a non-guessable URL instead of just /.
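For the non-guessable URL variant, one possible approach is to generate a random path once and have the daemon reject everything else. This is only a sketch under that assumption: `SECRET_PATH` and `is_authorized` are hypothetical names, and in practice the secret would be stored in configuration shared by the alert script and the daemon rather than regenerated on every start.

```python
import hmac
import secrets

# Generate once and store it; both the alert script (to build the URL)
# and the TwiML daemon (to check requests) need to know it.
SECRET_PATH = secrets.token_urlsafe(32)


def is_authorized(request_path: str, secret: str = SECRET_PATH) -> bool:
    """Accept only requests whose path is exactly /<secret>."""
    if not request_path.startswith("/"):
        return False
    candidate = request_path[1:]
    # compare_digest avoids leaking how many leading characters matched.
    return hmac.compare_digest(candidate.encode("utf-8"), secret.encode("utf-8"))


print(is_authorized("/" + SECRET_PATH))
print(is_authorized("/"))
```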
And that’s it: you now have voice-based alerts that will wake you if something goes wrong. Enjoy your newly acquired peace of mind. Here’s the promised repository with all the code: https://github.com/ducksboard/twilio-zabbix-handler
By the way…
We’re sponsoring the 2012 TwilioCon hackathon. If you like stitching things together for fun and profit, you should definitely check it out and maybe win awesome prizes, glory and fame. We’re even giving away MK802 Android Mini PCs to everyone on the winning team. Be sure to come by, and if we’re not busy handling critical service errors, we’d certainly like to have a chat!