Escalation in alerts
#1
Hi all,
I'm almost done evaluating Pandora vs Zabbix. So far Pandora is the clear winner, as it has a much better user interface than Zabbix. However, Zabbix has one feature that I can't seem to find in Pandora. I need an escalation scheme so I can work through our on-call list. Is there a way to create escalation lists?

Thanks,
Todd
#2
Can you explain exactly what you mean by "escalation"? If possible, please include some examples.

Thanks !
#3
No worries. Here's an example.

We have 3 groups: On-Call, Admins, and Engineers.
We have a service (on Windows) that we'll call Data Listener.
Our agent is configured to monitor the process DataListener.exe.

We want to configure an alert so that if DataListener isn't running, an alert is sent out. Here is how it would flow.

Initial Alert

1. Service stops
2. Email (which is forwarded to SMS) is sent to group On-Call
3. If the alert is not acked within 10 minutes, the same message is sent to Admins
4. If the alert is still not acked after another 10 minutes, the same message is sent to Engineers

If at any point the message is acked, the alerts stop.

Recovery

Notifies all groups that received an alert email during the initial alert.
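To make the flow concrete, here is a minimal Python sketch of the behaviour I'm describing. It is not Pandora or Zabbix code; the class and method names (`Escalation`, `tick`, `ack`, `recover`) are made up for illustration, and the 10-minute step is taken from the example above.

```python
from dataclasses import dataclass, field

# Order in which groups are contacted, per the example above.
ESCALATION_ORDER = ["On-Call", "Admins", "Engineers"]
STEP_MINUTES = 10  # wait between escalation levels


@dataclass
class Escalation:
    """Hypothetical model of one alert being escalated."""
    notified: list = field(default_factory=list)
    acked: bool = False

    def tick(self, minutes_since_failure):
        """Return the next group to notify at this time, or None."""
        if self.acked or len(self.notified) == len(ESCALATION_ORDER):
            return None
        # The next level becomes due one step after the previous one.
        due_at = len(self.notified) * STEP_MINUTES
        if minutes_since_failure < due_at:
            return None
        group = ESCALATION_ORDER[len(self.notified)]
        self.notified.append(group)
        return group

    def ack(self):
        """Acknowledging the alert stops any further escalation."""
        self.acked = True

    def recover(self):
        """Recovery goes to every group that received the alert."""
        return list(self.notified)
```

For example, if the alert is acked after Admins have been paged, Engineers are never contacted and the recovery message goes only to On-Call and Admins.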

On a completely separate note, my company is sponsoring a project to implement active checks against web resources. We're looking at using something like the Selenium IDE to record complex scripts, then running those scripts in a headless Firefox and integrating the results with Pandora as a plugin. I know version 3 has a basic web checker, but it's not sufficient for our needs. Running an actual browser will allow companies like mine that use AJAX controls to properly check that their site is functioning. I've joined the dev list, but I'm still waiting on authorization. Would it be possible to approve me so I can get some input from the developer list before I publish the specs for the competition?
#4
Hi there

I assume you are using 3.0.

In that case this is fairly simple. Let me see if I can explain it clearly:

You just need to define one template with one default action (SMS).

Then define 2 actions (email to engineers and email to admins).

In that template set:

Min number of alerts: 0
Max: 1

Threshold: 10 minutes

Then assign the default template to the modules (remember that this default template includes the SMS delivery), and mark the two actions you created: email to engineers and email to admins.

Voilà, you get the expected result:

1 SMS and 1 email to everybody.
After 10 minutes, if the alert is still fired, another SMS and another email to engineers and admins.
If you ack the alert, then you're done. (You can also add a recovery email or SMS to notify that the alert has recovered.)

Cheers
Manuel.
#5
Currently I'm on version 2 as I need a stable release for production monitoring. Do you guys have a rough ETA on when 3 will be released?

Also, what is your process for creating a release? I was browsing the Subversion repository, and I noticed that there are no tags for the releases. This is something we need so we can be sure we're getting the same source on all our servers. Once we validate a version as stable, we need to keep using that version, as our systems require 100% uptime. Do you guys plan to start tagging in the next release?
#6
Manuel,

Your reply doesn't address escalation at all; you simply described how to send the notification to everyone all of the time. tnine only wants the notification to go to the On-Call group on the initial alert. If and only if a second alert triggers 10 minutes later, then send it to both the On-Call group and the Admins group. Finally, if and only if a third alert triggers 10 minutes later (20 minutes after the original alert triggered) send it to all three groups (On-Call, Admins and Engineers). This is a very common escalation process in most established server/network/application management & monitoring applications.
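As a rough sketch of the cumulative scheme described above (Python, purely illustrative, not taken from any monitoring product), the recipient list grows with each consecutive unacknowledged alert. The function name `recipients_for` is hypothetical, and I am assuming that any alert after the third keeps going to all three groups.

```python
GROUPS = ["On-Call", "Admins", "Engineers"]


def recipients_for(alert_number):
    """Groups notified on the Nth consecutive unacked alert (1-based)."""
    if alert_number < 1:
        raise ValueError("alert_number starts at 1")
    # Each new alert adds one more group, capped at the full list (assumption).
    return GROUPS[:min(alert_number, len(GROUPS))]
```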
#7
Hi tnine.

You've been granted access to the dev list just now; the approval mail was sitting in my spam folder :(

About the escalation, I think Manu tried to explain it, but a few screenshots will make it easier to understand.

In this screenshot, I've defined an alert that detects whether a host is down. It raises a syslog event the first two times it fires, and if it fires more than 2 times (up to 4) it emails me. I think this is exactly the feature you're looking for. You can let one notification run together with another, or keep them from overlapping. The number of "actions" an alert can trigger is unlimited: you can have 1 or 200.
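Conceptually, each action is attached to a range of consecutive firings. The Python sketch below only illustrates that idea; the `Action` fields and the `actions_for` helper are hypothetical names, not Pandora's actual schema or API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Action:
    """Hypothetical action bound to a range of consecutive firings."""
    name: str
    fires_from: int   # first firing (1-based) that triggers the action
    fires_until: int  # last firing that triggers the action


# The setup described above: syslog on firings 1-2, email on firings 3-4.
ACTIONS = [Action("syslog", 1, 2), Action("email", 3, 4)]


def actions_for(fire_count, actions=ACTIONS):
    """Names of the actions that run on the given consecutive firing."""
    return [a.name for a in actions
            if a.fires_from <= fire_count <= a.fires_until]
```

Ranges that overlap make two notifications run together, while disjoint ranges like the ones above hand over from one action to the next.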

The ack feature has been available since version 2.1 and complements this.

This applies to version 3.0 and later only, as it is a big change in Pandora's alerting system.

About the idea of building a powerful browser-based web checker and integrating it into Pandora: GO AHEAD, we will help you with anything you need! Please explain your idea on the developer list and let's try to work together.

(c) 2006-2018 Artica Soluciones Tecnológicas. Contents of this wiki are under Create Common Attribution v3 licence. | pandorafms.com | pandorafms.org