“Monitor everything” approach: did you miss anything?

Common pitfalls in setting up network monitoring

Monitoring stand

In many cases, when a significant new service, device, or application is brought online, it is not added to monitoring immediately. Resources often begin to be monitored only after they “suddenly” stop working — sometimes in the least convenient way possible, such as learning that your online shop is unavailable from customer messages.

Management or communication issues may be the cause, but the result is the same: an important resource becomes unavailable and nobody is notified in time. The following quick checklist shows how to maintain a monitoring setup so that you avoid, or at least reduce, the need to fix problems around the clock.

The list below isn’t complete; there can be situations it doesn’t cover. If you think you can add to it, please let us know!

Scan your network periodically

Apart from the initial discovery of network devices, IPNetwork has a “rediscovery” feature: in the “Settings > Rediscovery” section you can set up periodic attempts to find new devices:

Rediscovery menu

The settings in the rediscovery section are similar to those in the network discovery wizard. Enable rediscovery to stay aware of changes in the network around the system running IPNetwork Monitor. In larger environments, this is especially useful together with application templates and Remote Network Agents, so newly found systems can be reviewed and covered more systematically. You might still prefer to create only basic PING monitors during automatic discovery, and then add the remaining monitors or templates after review.
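
The idea behind rediscovery can also be illustrated outside the product. The following is a minimal Python sketch, not IPNetwork Monitor’s own logic: it compares the addresses found by some scan with the addresses that are already monitored and reports the difference. The function name `find_unmonitored` and the sample addresses are hypothetical.

```python
import ipaddress

def find_unmonitored(discovered, monitored):
    """Return discovered addresses missing from the monitored set, in numeric order.

    Assumes all addresses belong to one family (all IPv4 or all IPv6).
    """
    known = {ipaddress.ip_address(a) for a in monitored}
    found = {ipaddress.ip_address(a) for a in discovered}
    return [str(a) for a in sorted(found - known)]

# Hypothetical data: what a scan found vs. what is already under monitoring.
discovered = ["192.168.1.10", "192.168.1.2", "192.168.1.30"]
monitored = ["192.168.1.2", "192.168.1.10"]
print(find_unmonitored(discovered, monitored))
```

Any address in the returned list is a candidate for review: it responds on the network but nothing watches it yet.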

Add a monitor for every resource

Every system resource can be exhausted. In practice, the quickest way to avoid gaps is often to start from an application template and then review which monitors require manual configuration or additional tuning. When you add a new device (say, a computer), make sure you put each of the following under monitoring:

every physical resource: CPU, RAM, available disk space on every file system, network speed for every installed adapter, available file handles, and so on
for every important process, set up a monitor that notifies you when the process is missing (or running in too many instances)
if the system listens on publicly available network ports, monitor that those ports are open
if Web (or other) applications are available from the device, create at least basic HTTP(S) monitors and, where appropriate, validate returned content instead of checking reachability alone
depending on the operating system type, set up either an Event Log or a Syslog monitor to stay aware of the most important system-wide events
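
To make two of the checks above more concrete, here is a minimal Python sketch of what a disk-space check and an open-port check boil down to. It only illustrates the principle and is not how IPNetwork Monitor implements its monitors; the function names, the path, and the host and port in the usage lines are assumptions.

```python
import shutil
import socket

def disk_free_percent(path):
    """Percentage of free space on the file system that contains `path`."""
    usage = shutil.disk_usage(path)
    if usage.total == 0:  # guard against pseudo file systems reporting zero size
        return 0.0
    return 100.0 * usage.free / usage.total

def port_is_open(host, port, timeout=2.0):
    """True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:  # refused, unreachable, timed out, name not resolved
        return False

# Hypothetical usage: check the current file system and a local web port.
print(f"free disk space: {disk_free_percent('.'):.1f}%")
print("port 80 open:", port_is_open("127.0.0.1", 80, timeout=1.0))
```

A real monitor would run such checks on a schedule and compare the results against warning and down thresholds; the sketch only shows a single measurement.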

The above is only a shortlist. For every system added to the monitored network, all important services, and the resources those services depend on, should be monitored.

It helps to document every device added to the network you monitor. The sooner all the details are recorded and the corresponding monitors added, the less likely it is that an important resource silently drops off the radar.
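
If the documentation is kept in a machine-readable form, comparing it with the actual monitoring setup becomes mechanical. The sketch below assumes two hypothetical dictionaries: one listing the monitor types each documented device should have, the other listing what is actually configured. The device names and monitor type labels are made up for illustration.

```python
def coverage_gaps(required, configured):
    """Map each device to the sorted list of monitor types it is still missing."""
    gaps = {}
    for device, needed in required.items():
        missing = set(needed) - set(configured.get(device, ()))
        if missing:
            gaps[device] = sorted(missing)
    return gaps

# Hypothetical inventory (what the documentation says) and current setup.
required = {
    "web01": ["ping", "http", "disk", "cpu"],
    "db01": ["ping", "disk", "process"],
}
configured = {
    "web01": ["ping", "http"],
}
print(coverage_gaps(required, configured))
```

A device that is documented but has no monitors at all, like `db01` here, shows up with its full list of required monitors.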

Test everything

Human errors, even small ones such as typos, add to the overall probability of something important going down without proper notice.

When you change settings, make sure you check them in action. For example, after setting up or changing email notification settings (“Settings > Email Settings”), click “Send test email alert” and make sure the email is actually delivered.
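
If a test message does not arrive, it can be useful to check the mail path independently of the monitoring system. The following Python sketch builds a plain test message and hands it to an SMTP server using the standard library. The addresses, the server name in the commented usage line, and the assumption that the server accepts unauthenticated mail on port 25 are all hypothetical; adjust them to your environment.

```python
import smtplib
from email.message import EmailMessage

def build_test_message(sender, recipient):
    """Create a minimal plain-text test message."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Monitoring test message"
    msg.set_content("If you can read this, the mail path works.")
    return msg

def send_test_message(msg, host, port=25, timeout=10):
    """Hand the message to the SMTP server; raises an exception on failure."""
    with smtplib.SMTP(host, port, timeout=timeout) as smtp:
        smtp.send_message(msg)

msg = build_test_message("monitor@example.com", "admin@example.com")
print(msg["Subject"])
# Hypothetical server name; uncomment to actually send:
# send_test_message(msg, "mail.example.com")
```

A successful hand-off to the server still does not prove delivery, so check the recipient’s mailbox as well.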

The same goes for database backups: you can create a database backup at any time and restore any existing backup to make sure it is valid. Note that if restoring a backup fails, the previous state of the database is kept as the nms.fdb.old file, so it can be safely recovered.

When constructing alerts — perhaps the most important part of a monitoring setup — make sure you use the “Testing” tab in the “Alerting” section to verify how alerts will behave. The Testing tab shows which simple actions will run for each state transition, allows you to execute them immediately regardless of schedule, and displays log output that helps diagnose delivery or configuration problems. This is especially useful for alerts that would otherwise be triggered rarely.

Whatever change you apply to the monitoring setup, make sure you test it as soon as possible. This is also important when adding monitors from templates, because some of them may require manual parameters before they can work correctly.

Enable more regular reports

By default, IPNetwork sends daily reports on the monitoring setup. We recommend getting familiar with these reports so that you quickly notice important changes and trends.

You can change the reporting parameters at any time under “Settings > Reporting”:

Reporting menu

Note that you can include reports on newly discovered devices and provide an auxiliary (Cc:) email address to send reports to.

By default, reports are sent to the $AdminMail address defined in “Settings > Email Settings”. Make sure that address actually exists and accepts incoming mail.

Use scheduled alerts

By default, every simple action in an alert uses the “Always” schedule. However, you can define any number of other schedules in “Settings > Schedules”:

Schedules menu

After that, if you add several simple actions to the same alert, each with its own schedule, you can ensure that some notifications are sent only during business hours while others are used after hours. For example, you can send email during the day, escalate to a different recipient list in the evening, or post an HTTP(S) request to an external service such as Slack or Microsoft Teams when an incident needs wider visibility. Push notifications can also be used for urgent after-hours alerts.
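
The effect of combining actions with different schedules can be sketched in a few lines of Python. This is only a model of the idea, not the product’s alerting engine: the business hours (Monday to Friday, 09:00 to 18:00), the action names, and the function names are assumptions chosen for illustration.

```python
from datetime import datetime, time

def in_business_hours(moment, start=time(9, 0), end=time(18, 0)):
    """True on Monday-Friday between `start` (inclusive) and `end` (exclusive)."""
    return moment.weekday() < 5 and start <= moment.time() < end

# Hypothetical actions of one alert, each paired with its schedule predicate.
ACTIONS = [
    ("email to day team", in_business_hours),
    ("push notification to on-call", lambda m: not in_business_hours(m)),
    ("HTTP(S) post to chat", lambda m: True),  # comparable to an "Always" schedule
]

def actions_for(moment):
    """Names of the actions whose schedule is active at `moment`."""
    return [name for name, active in ACTIONS if active(moment)]

print(actions_for(datetime(2024, 1, 3, 10, 0)))  # a Wednesday morning
print(actions_for(datetime(2024, 1, 6, 10, 0)))  # a Saturday morning
```

Listing the active actions for a few sample moments, including edge cases such as the exact end of the working day, is a cheap way to spot hours that no schedule covers.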

Conclusion

The list above covers the most typical pitfalls you can run into while setting up monitoring. If you know another typical example and would like to share your experience with others, feel free to let us know!