“Most of us spend too much time on what is urgent and not enough time on what is important.” ― Stephen R. Covey

There’s a difference between an alert that feels urgent and one that is truly important, and Jessica’s team can’t tell the difference

When the alert sounded, a quick glance at her screen told Jessica that database storage was critically low. Yet after a few minutes of double-checking various settings and parameters, she found that over 20% free space remained. As she turned to face me, Jessica rolled her eyes and said, “I get over 200 alerts like that every day, and 98% of them are just noise. All my co-workers say the same thing: ‘We spend too much time chasing false alarms.’ If we could isolate the four alarms that really matter each day, it would save me, and my company, a lot of time and a bundle of money.”

For IT departments and managed service providers (MSPs), false positives (false alarms) are the bane of system administration because they divert precious resources to fixing phantom problems. False positives are like a recurring headache: you know something is wrong, but you don’t know whether it’s serious, and you can’t find out without more data. The same is true for information systems, where every alarm should be investigated to determine whether it signals a serious problem. Unfortunately, the time wasted chasing phantom problems leads even the best system administrators to ignore certain alarms, and they will keep doing so until they get burned by a loss of service (unscheduled downtime).

Receiving a system alert always feels urgent, but there’s a difference between an alert that feels urgent and one that is truly important, and Jessica’s team can’t tell the difference. She may be curious about what caused the storage alert, but what Jessica really needs to know is whether the alert is important. The answer depends on the nature of the storage system. For the database, an apparent sudden drop in storage capacity is cause for concern because it could affect business services like billing and accounts receivable. But what if the storage alert isn’t going to affect the business at all and is simply the temporary result of routine load-balancing processes? At the time of the alert, the database thinks it’s running out of space; it isn’t aware that the larger storage system has been set up to prevent critical problems. As Jessica discovered, there was no urgency: the condition was temporary, and other systems were in place to ensure stability.
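One common way to keep a temporary condition like this from paging anyone is a persistence check: the alert escalates only if the low reading holds for several consecutive samples. The sketch below is a minimal illustration under stated assumptions, not how any particular product does it. The class name `PersistentLowSpaceAlert`, the 10% threshold and the three-sample window are all hypothetical values chosen for the example.

```python
from collections import deque

# Hypothetical values, chosen for illustration only.
LOW_SPACE_PCT = 10.0     # free space below this percentage counts as "low"
REQUIRED_SAMPLES = 3     # consecutive low readings needed before escalating


class PersistentLowSpaceAlert:
    """Escalates a low-storage alert only when the condition persists."""

    def __init__(self, threshold_pct=LOW_SPACE_PCT, required=REQUIRED_SAMPLES):
        self.threshold_pct = threshold_pct
        self.required = required
        # Sliding window holding one low/not-low flag per recent sample.
        self.recent = deque(maxlen=required)

    def observe(self, free_pct):
        """Record one free-space sample; return True if the alert should escalate."""
        self.recent.append(free_pct < self.threshold_pct)
        # Escalate only when the window is full and every reading in it was low,
        # so a brief dip (for example, during rebalancing) is ignored.
        return len(self.recent) == self.required and all(self.recent)


alert = PersistentLowSpaceAlert()
# A single transient dip never escalates.
print([alert.observe(pct) for pct in [22.0, 4.0, 21.5, 23.0]])
# → [False, False, False, False]
```

The trade-off is latency: a real, sustained shortage is reported only after the full window has elapsed, so the window length has to be short compared with how quickly the storage could actually fill.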

Trying to implement custom logic regarding performance and service levels into each layer of the IT system is a fool’s game

Jessica’s dilemma of too many false positives is one that most system administrators face. Trying to implement custom logic regarding performance and service levels into each layer of the IT system is a fool’s game. For one thing, the logic can be incredibly complex, and it changes with every new device or software upgrade (of both applications and operating systems). For another, it doesn’t make sense to filter out low-level alerts at each layer, because they are needed for fault isolation and root cause analysis. Finally, properly tuning custom alerts is time-consuming, and administrators lose patience: they either take alerts at face value (with all the false positives) or ignore everything and suffer the consequences. Instead, companies should consider cross-domain correlation, which evaluates, compares and weighs critical metrics from each layer of the system (hardware and software) against required service levels. The result is a top-level dashboard on a single pane of glass that reveals the overall health of the business information systems.
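To make "weighing metrics from each layer against a service level" concrete, here is a minimal sketch. It assumes each layer reports a normalized health value between 0 and 1 and is assigned a weight reflecting its impact on the business service. The layer names, weights, the 0.9 service-level target and the 0.8 degradation floor are all hypothetical, and the weighted average is one simple scoring choice rather than Centerity’s actual algorithm. Note that the per-layer readings are kept, not filtered out, so they remain available for fault isolation.

```python
from dataclasses import dataclass


@dataclass
class LayerMetric:
    layer: str      # e.g. "storage", "database", "application" (illustrative)
    health: float   # normalized: 1.0 = fully healthy, 0.0 = failed
    weight: float   # relative impact of this layer on the business service


def service_health(metrics):
    """Weighted average of per-layer health values."""
    total_weight = sum(m.weight for m in metrics)
    if total_weight <= 0:
        raise ValueError("at least one layer needs a positive weight")
    return sum(m.health * m.weight for m in metrics) / total_weight


def meets_service_level(metrics, target=0.9):
    """True if the weighted service health is at or above the target."""
    return service_health(metrics) >= target


def degraded_layers(metrics, floor=0.8):
    """Layers below the floor, kept for root cause analysis rather than discarded."""
    return [m.layer for m in metrics if m.health < floor]


metrics = [
    LayerMetric("storage", health=0.95, weight=3),
    LayerMetric("database", health=0.6, weight=1),   # low-space warning
    LayerMetric("application", health=1.0, weight=2),
]
print(meets_service_level(metrics))  # → True
print(degraded_layers(metrics))      # → ['database']
```

A weighted average lets a healthy storage layer offset a noisy database warning, which matches the storage scenario above. The catch is that it can also mask a hard failure in one critical layer, so a production rule would likely combine it with per-layer minimums for dependencies the service cannot run without.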

[Figure: Single pane of glass executive dashboard, comparing and weighing critical metrics from each layer of the system]

With cross-domain correlation, Jessica could set aside the alerts that merely seem urgent and focus instead on what’s truly important: delivering guaranteed service levels. Centerity’s solution is uniquely designed as a unified software platform that detects and extracts critical metrics from each layer of the information system and integrates with all existing devices, applications and monitoring tools. It provides cross-domain correlation to system administrators, engineers and customer support reps (CSRs), helping to ensure higher uptime, lower costs, better reliability and higher customer satisfaction (CSAT) scores.