Alert Fatigue – “Paralysis by Analysis”

POSTED BY RYAN TROST

I was recently chatting with a friend who runs a SOC in the UK and our conversation turned to one of his constant challenges – “alert fatigue.”

For those not familiar with the term, alert fatigue is when an analyst is so overwhelmed by alerts that they become desensitized, reaching a point where they simply dismiss alerts during triage with little to no investigation.

Unfortunately for most security analysts, alert fatigue is widespread, yet it is treated as part of a sick and twisted industry “rite of passage” as analysts coming up the ranks earn real-world experience and hone their senses.

There are a few different types of alert fatigue including:

1) The analyst is overwhelmed by alerts with no end in sight!
2) The same alert is produced over and over again, and the analyst blindly closes it without proper investigation.
3) During the investigation, the analyst’s query against the log repository takes so long to return results that the analyst reaches a boiling point and dismisses the alert altogether.

Each one has its own set of plausible solutions.

In today’s ramble, I’m going to focus on #1 as it is the most common and onerous. To put it into perspective, in one of my previous shops we were running ArcSight ESM averaging 150B events per day which, after correlation, boiled down to 1.2M alerts per week. This forced my team of 12 analysts across 3 shifts to EACH triage 100,000 alerts per week! And those metrics don’t even include any “budget-friendly logs” like endpoint or DNS logs. You can immediately understand why an analyst looking at alerts all day, every day, for months can easily slip into alert fatigue – even subconsciously!
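To make that workload concrete, here is a quick back-of-the-envelope sketch of the arithmetic. The alert and analyst counts come from the numbers above; the 40-hour working week is purely my assumption for illustration, so treat the per-hour and per-alert figures as rough.

```python
# Figures from the post: 1.2M alerts per week shared by 12 analysts.
ALERTS_PER_WEEK = 1_200_000
ANALYSTS = 12
# Assumption (not from the post): each analyst works a 40-hour week.
HOURS_PER_ANALYST_WEEK = 40


def triage_load(alerts_per_week, analysts, hours_per_week):
    """Return (alerts per analyst per week, alerts per hour, seconds per alert)."""
    per_analyst = alerts_per_week / analysts
    per_hour = per_analyst / hours_per_week
    seconds_per_alert = 3600 / per_hour
    return per_analyst, per_hour, seconds_per_alert


if __name__ == "__main__":
    per_analyst, per_hour, seconds = triage_load(
        ALERTS_PER_WEEK, ANALYSTS, HOURS_PER_ANALYST_WEEK
    )
    print(f"{per_analyst:,.0f} alerts per analyst per week")
    print(f"{per_hour:,.0f} alerts per analyst per hour")
    print(f"{seconds:.2f} seconds available per alert")
```

Under that assumption, each analyst has well under two seconds per alert, which is why working the queue strictly in order was never going to work.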

There are a few ways to mitigate alert fatigue including: 8-month role rotations across job functions, an 80/20 approach to allow analysts to surface for air when they hit the wall, and tighter prioritization during alert triage.

Job rotation is something that’s always talked about but rarely executed well.

Typically, alert triage is done by junior analysts. Because they make up the majority of most teams, pivoting a fraction (10-15%) of your analyst staff won’t cause civil unrest, whereas rotating senior folks usually creates a bigger operational impact and responsibility bleed. There’s also an added benefit of job rotation: it exposes junior analysts to the different workflows and challenges faced in a range of jobs – security analyst, malware engineer, intelligence analyst, signature engineer, or even vulnerability assessment engineer. This gives them the opportunity to understand and gravitate toward their strengths and interests. It is similar to freshman year at college: most students haven’t the slightest idea what they want to focus on, so they are exposed to a wide range of courses…even underwater basket weaving.

Another approach to minimizing alert fatigue caused by analyst burnout is an “80/20” rule: 80% of the time the analyst focuses on tickets and investigations, and in the remaining 20% the analyst can do something they are interested in (malware analysis, intelligence collection, hunting, etc.). This provides a bit of a buffer where analysts can replenish motivation and challenge themselves while staying within the cyber arena. At a time when security talent is hard to find and difficult to retain, this can also help keep analysts engaged and happy to be part of your team. The additional expertise they gain during their “down time” can also help strengthen the company’s security posture.

The previous solutions focus on the human aspect of alert fatigue, but the third suggestion centers on the SIEM technology as the primary source of endless alerts.

SIEM dashboards are overloaded with visual stimuli that violate nearly every principle of data graphics – including blinking lights, data overload, and useless pew-pew maps.

One practice is common, but worth noting because it’s been implemented in almost every SOC I’ve worked for or managed: filtering alerts within the main SIEM dashboard beyond the out-of-the-box FIFO prioritization. So instead of having 1.2M alerts in a single dashboard with no end in sight, alerts are categorized into isolated dashboard containers. In ArcSight ESM, analysts live and breathe in the alert dashboard called an Active Channel, and they use multiple Active Channels to help prioritize their efforts.

Active Channels are organized by alert score or confidence: alerts with a threat score of 70-100 require instant visibility, alerts scoring 40-69 are reviewed when time allows, and alerts scoring 1-39 are noise that the team will never have time to look at.
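The score bands above can be sketched in a few lines of plain Python. To be clear, this is not ArcSight configuration or its API; it only illustrates the bucketing logic, and the channel names and the (alert_id, score) tuple shape are made up for the example.

```python
from collections import defaultdict

# Score bands taken from the post; the channel names are hypothetical.
BANDS = [
    (70, "high"),    # 70-100: requires instant visibility
    (40, "medium"),  # 40-69: reviewed when time allows
    (1, "noise"),    # 1-39: unlikely to ever be reviewed
]


def channel_for(score):
    """Map a threat score (1-100) to the name of its channel."""
    if not 1 <= score <= 100:
        raise ValueError(f"score out of range: {score}")
    for floor, name in BANDS:
        if score >= floor:
            return name


def route(alerts):
    """Group (alert_id, score) pairs into per-channel lists."""
    channels = defaultdict(list)
    for alert_id, score in alerts:
        channels[channel_for(score)].append(alert_id)
    return dict(channels)
```

Calling `route([("a", 95), ("b", 40), ("c", 39)])` puts each alert in a separate bucket, so the analyst’s default view can be the "high" list rather than the full queue.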

This is where threat intelligence is really starting to gain traction!

Many shops are augmenting their alert data by funneling threat intelligence from their threat intelligence platform into the SIEM. Teams can now keep the high-fidelity Active Channel described earlier while also maintaining purpose-built filters based on customer-specific needs: for instance, an Active Channel dedicated to command-and-control (C2) indicators and/or an Active Channel dedicated to adversary TTPs. This keeps alert counts smaller so analysts don’t burn out, and it strengthens a company’s posture by allowing the team to react to alerts along multiple dimensions – alert prioritization, possible C2, or adversary alerts.
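Here is a rough sketch of what those purpose-built filters amount to, again in plain Python rather than any real SIEM or ThreatQ API. The `Alert` fields, the channel names, and the sample indicators are all hypothetical; the IP address comes from a documentation-only range, and the technique ID is just an example tag.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Alert:
    # Hypothetical alert shape; a real SIEM record has many more fields.
    name: str
    score: int
    dest: str
    techniques: frozenset = frozenset()


def build_channels(alerts, c2_indicators, watched_ttps):
    """Sort alerts into a score-based channel plus two intel-driven channels.

    An alert may land in more than one channel, since the filters are
    independent views over the same alert stream.
    """
    channels = {"high_fidelity": [], "c2": [], "adversary_ttp": []}
    for alert in alerts:
        if alert.score >= 70:  # the high band described earlier
            channels["high_fidelity"].append(alert.name)
        if alert.dest in c2_indicators:  # destination matches a known C2 indicator
            channels["c2"].append(alert.name)
        if alert.techniques & watched_ttps:  # shares a tracked technique
            channels["adversary_ttp"].append(alert.name)
    return channels


if __name__ == "__main__":
    alerts = [
        Alert("beacon", 55, "203.0.113.10"),
        Alert("ps-encoded", 80, "10.0.0.5", frozenset({"T1059"})),
        Alert("port-scan", 20, "10.0.0.9"),
    ]
    print(build_channels(alerts, {"203.0.113.10"}, {"T1059"}))
```

Note that the "beacon" alert only scores 55, so it would sit in the "when time allows" pile on score alone, yet it still surfaces in the C2 channel. That is the extra dimension threat intelligence adds.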

Alert fatigue is a danger for every operational team. The goal is to keep it at bay for as long as possible during alert triage. A threat intelligence platform like ThreatQ offers a way to use threat intelligence to curb the pandemic. But regardless of which solution works for your team, keep moving forward!
