Unreachability and notifications to many
#9
Quote: Posted By: alchemyx
The idea of multiple recipients in one action sounds fine. If it doesn't work, I can always make it work with some kind of wrapper around mailx. So let's say it is solved.

We're working on this now; you're not the first to ask for it, and you convinced me.

Quote: About unreachability: I am using the SVN version of Pandora and can't find cascade protection. Where is it?

Check it out in a few hours; Ramon is finishing the server code. Be aware that you need to update your DB (check the latest lines in extras/pandoradb_migrate_v2.x_to_v3.0.sql, August 2009).

Quote: BTW: how do I make this kind of correlated alarm: fire the alarm only when host A is up and host B is down? I really can't think of a logical operator that could make it work (0 = down/alarm, 1 = up/no alarm).
Code:
A / B / alarm
0 / 0 / 1
0 / 1 / 1
1 / 0 / 0
1 / 1 / 1
About parents in other software, I can speak only for Nagios. There it is simply achieved:
- I set up one or more parents for every host (the parent is the host's immediate parent)
- I set up my contact to receive notifications about up and down, but not unreachability

And that is all :)

In your example, let's suppose that host A is a router and host B is a server.

If you set A as the parent agent of B and enable "Cascade protection" on B, none of B's alerts will be fired while A is down. Of course, you need to define an alert on A that determines when A counts as down for you.

You may have lots of servers like B, and you only need to define the parent for each of them. So you have one alert for each host (B type) and one alert for router A. In a 1000-agent setup you have 1001 alerts, most of them probably identical and very easy to assign using the alert template system and the massive configuration tool (even easier in the Enterprise version with policy management).
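
To make the parent logic concrete, here is a minimal Python sketch of the decision described above. It is not Pandora FMS code: the function names `b_alert_fires` and `truth_table` are made up for illustration, and the sketch assumes B has a single "down" alert with A as its parent and cascade protection enabled. It reproduces the A / B / alarm table from the question, using the same convention (0 = down/alarm, 1 = up/no alarm).

```python
def b_alert_fires(a_up: int, b_up: int, cascade_protection: bool = True) -> bool:
    """Decide whether B's 'down' alert is sent (hypothetical helper).

    a_up / b_up use the thread's convention: 1 = up, 0 = down.
    """
    if b_up:
        return False  # B is fine, nothing to report
    if cascade_protection and not a_up:
        return False  # B is down, but its parent A is down too: suppressed
    return True       # B is down and A is up: this is the case worth a mail


def truth_table():
    """Return rows (A, B, alarm) with alarm encoded as 0 = alarm, 1 = no alarm."""
    rows = []
    for a_up in (0, 1):
        for b_up in (0, 1):
            alarm = 0 if b_alert_fires(a_up, b_up) else 1
            rows.append((a_up, b_up, alarm))
    return rows


if __name__ == "__main__":
    print("A / B / alarm")
    for a_up, b_up, alarm in truth_table():
        print(f"{a_up} / {b_up} / {alarm}")
```

In other words, the operator asked for is simply "A up AND B down", and the parent relationship gives you that without writing any logic yourself.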

The second option is more fun: use correlation. A simple two-host example is not enough to show where this feature is most useful, so imagine instead that you have a router, a firewall, and a server with a web server and a database. You want to know when your service is in bad shape (having problems), when it is definitely down, and why.

You will have the following monitors:

- ROUTER: an ICMP check and an SNMP check using a standard OID to get the ATM port status. It may also have a latency check against your parent/provider router.
- WEB SERVER: several internal checks run by the Pandora FMS agent: CPU usage, MEM usage, and a process check for Apache. There is also a latency check for a 4-step HTTP navigation check.
- DATABASE SERVER: several internal checks run by the Pandora FMS agent: CPU usage, MEM usage, and a process check for the database, plus a few database integrity checks. You also check remote connectivity to the database with a plugin-defined test that logs in, runs a query, and exits, timing the response.

Now you define several SINGLE alerts:

-ROUTER:
ICMP Check / CRITICAL -> Action, send MAIL.
SNMP Check / CRITICAL -> Action, send MAIL.
Latency > 200ms / WARNING -> Action, none, just compound.

-WEB SERVER:
CPU / WARNING -> Action, none, just compound.
MEM / WARNING -> Action, none, just compound.
PROCESS / CRITICAL -> Action, send MAIL.
HTTP LATENCY / WARNING -> Action, none, just compound.

-DATABASE SERVER:
CPU / WARNING -> Action, none, just compound.
MEM / WARNING -> Action, none, just compound.
PROCESS / CRITICAL -> Action, send MAIL.
SQL LATENCY / WARNING -> Action, send MAIL.

You define ROUTER as parent for DATABASE and WEB servers. You enable the Cascade Protection in both agents (Database and Web).
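
As a rough illustration, the setup so far can be written down as plain data. This is only a sketch: `SINGLE_ALERTS`, `PARENT`, and the two helper functions are hypothetical names, not part of Pandora FMS, and an action of `None` stands for "none, just compound".

```python
# Each entry: agent -> {check name: (severity, action)}.
# action None means the alert exists only to feed compound (correlation) alerts.
SINGLE_ALERTS = {
    "ROUTER": {
        "ICMP": ("CRITICAL", "mail"),
        "SNMP": ("CRITICAL", "mail"),
        "LATENCY": ("WARNING", None),
    },
    "WEB SERVER": {
        "CPU": ("WARNING", None),
        "MEM": ("WARNING", None),
        "PROCESS": ("CRITICAL", "mail"),
        "HTTP LATENCY": ("WARNING", None),
    },
    "DATABASE SERVER": {
        "CPU": ("WARNING", None),
        "MEM": ("WARNING", None),
        "PROCESS": ("CRITICAL", "mail"),
        "SQL LATENCY": ("WARNING", "mail"),
    },
}

# Parent relationships; cascade protection is enabled on the child agents.
PARENT = {"WEB SERVER": "ROUTER", "DATABASE SERVER": "ROUTER"}


def mail_alerts():
    """List (agent, check) pairs that send a mail on their own."""
    return [(agent, check)
            for agent, checks in SINGLE_ALERTS.items()
            for check, (_severity, action) in checks.items()
            if action == "mail"]


def compound_only_alerts():
    """List (agent, check) pairs that only serve as inputs to correlation."""
    return [(agent, check)
            for agent, checks in SINGLE_ALERTS.items()
            for check, (_severity, action) in checks.items()
            if action is None]
```

Splitting the alerts this way shows which ones notify directly and which ones are only building blocks for the correlation alerts.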

You now define one correlation alert assigned to DATABASE:

Router ICMP Check NOT Fired
AND
Router SNMP Check NOT Fired
AND
WEB Server Process NOT Fired
AND
Database Server Process Fired
THEN
Send MAIL: "Service DOWN: Database Failure"


You then define a second correlation alert assigned to DATABASE:

Router ICMP Check NOT Fired
AND
Router SNMP Check NOT Fired
AND
WEB Server Process Fired
AND
Database Server Process NOT Fired
THEN
Send MAIL: "Service DOWN: WebServer Failure"

And more complex alerts like:

Router ICMP Check NOT Fired
AND
Router SNMP Check NOT Fired
AND
WEB Server HTTP Latency NOT Fired
AND
DATABASE Server SQL Latency Fired
AND
DATABASE Server CPU NOT Fired
AND
DATABASE Server MEM Fired
THEN
Send MAIL: "Database is getting exhausted. Please check it ASAP."
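
The three correlation alerts above boil down to "every listed single alert must be in the stated Fired / NOT Fired state". A minimal Python sketch of that evaluation could look like the following. It is an illustration only, not how the Pandora FMS server implements it; the alert keys such as `router_icmp` and the names `RULES` and `evaluate` are invented for the example.

```python
from typing import Dict, List, Tuple

# A condition is (single alert name, required fired state).
Condition = Tuple[str, bool]

RULES: List[Tuple[str, List[Condition]]] = [
    ("Service DOWN: Database Failure", [
        ("router_icmp", False), ("router_snmp", False),
        ("web_process", False), ("db_process", True),
    ]),
    ("Service DOWN: WebServer Failure", [
        ("router_icmp", False), ("router_snmp", False),
        ("web_process", True), ("db_process", False),
    ]),
    ("Database is getting exhausted. Please check it ASAP.", [
        ("router_icmp", False), ("router_snmp", False),
        ("web_http_latency", False), ("db_sql_latency", True),
        ("db_cpu", False), ("db_mem", True),
    ]),
]


def evaluate(fired: Dict[str, bool]) -> List[str]:
    """Return the mail messages whose conditions all match.

    `fired` maps single alert names to their current state; alerts that are
    missing from the mapping are treated as NOT fired.
    """
    messages = []
    for message, conditions in RULES:
        if all(fired.get(name, False) == expected for name, expected in conditions):
            messages.append(message)
    return messages


if __name__ == "__main__":
    # Only the database process alert is fired: the first rule matches.
    print(evaluate({"db_process": True}))
```

Note that when a router alert is fired, none of these rules match, so the only mail you get is the one from the router's own single alert.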
Messages In This Thread
Unreachability and notifications to many - by slerena - 09-01-2009, 11:18 PM




(c) 2006-2018 Artica Soluciones Tecnológicas. Contents of this wiki are under Create Common Attribution v3 licence. | pandorafms.com | pandorafms.org
