Risk analysis doesn’t get the attention it deserves. It’s the part of the risk assessment process where you feed all your important results (where the credit card numbers are stored, the access rights for the folder, the results of phishing tests, the threat environment) into a *risk model* that produces a measurement. It’s this risk measurement, say VaR or average loss over a period, that upper-level management is really focused on.

The underlying metrics *are* important: IT needs to know the basic security state of the system to take the right actions to reduce risk levels. But C-levels are more interested in knowing dollar values of potential security incidents.

CISOs may be thinking they don’t have enough information to conduct even a basic risk calculation. In this post, we’ll take the *opposite* view: you have more than enough information to perform a good back-of-the-envelope risk calculation!

## Good Enough — 90% Confidence — Risk Data from IT

Have I mentioned recently that you can learn a lot from Hubbard and Seiersen’s book and website, *How to Measure Anything in Cybersecurity Risk* (and also from the FAIR gang)? These two authors explain how to tap into the hidden expert wisdom in the IT department. Guided by some in-house stats, the IT staff can generate basic inputs into a risk model: the rate or probability of a threat, and the cost range of that threat.

They suggest you interview the experts in the IT department and *calibrate* their answers. I should add that the book assumes that the IT admins and managers don’t have *detailed* *security analytics* at their fingertips. As we know, there is software, ahem, that can provide this information. In a future post, I’ll show how to use Varonis DatAlert to get even better answers for risk analytics purposes.

Here’s what Hubbard proposes. You *challenge* the IT staff to answer a few questions about the types of threats and their costs. Let’s say we’re interested in developing a risk model covering denial-of-service (DoS) attacks for an e-retailer. In this scenario, a risk analyst (or someone who’s mastered the *How to Measure Anything in Cybersecurity Risk* book) engages the IT admins in a dialogue, similar to the following:

Q: How many incidents were there in the last 2-3 years where we’ve lost customer service due to a cyber attack?

A: *Let me think. There’s been, maybe, 10 incidents out of all the potential attacks where we’ve had real service outages. From an hour all the way to 2 days!*

Q: Can you break it down?

A: *Some of them have been network-based denial of service, there have been a few ransomware, and then some I can’t classify …*

Q: Can you put a range from hours to days for each of the categories?

A: *Let me think. I’m pretty sure the worst ransomware attacks took us down for a full day or day and a half…*

You continue this way, forcing the admin to drill down into each threat sub-category until she can come up with lower and upper bounds, *with 90% confidence*, for each type of outage. For an online merchant, time is money, so the outage information can be converted to lost sales in dollars.

| Event | Probability of Occurrence | Lower Bound of Costs ($) | Upper Bound of Costs ($) |
| --- | --- | --- | --- |
| Network | 0.15 | 30,000 | 100,000 |
| Ransomware | 0.05 | 500,000 | 1,000,000 |
| Other | 0.25 | 10,000 | 50,000 |

Why does this method result in a good guesstimate of cost ranges?

According to Hubbard, you can train experts to produce reliable estimates covering 90% of the data by forcing them to bet on their guesses! If the experts are convinced that outages fall within their range nine out of ten times, would they be willing to bet $1,000 on some game of chance with the equivalent odds? Losing money, even in an imaginary game, tends to concentrate the mind!

Hubbard shows that this calibration test (and others as well) is effective in developing good estimates. Shameless plug: the book is worth it just for the insights into improving human judgments.

Let me emphasize again: if you have internal *security analytics* data, then you can make better estimates than this “wisdom of the experts” method. However, as we’ll see below, just knowing the lower and upper cost limits where most of the data falls (excluding the tail) is powerful.

Those who’ve read my previous series of posts on risk analysis may have noticed I’m taking a different, simpler approach this time. The older series focused on the *tail* of the loss curves, and that’s just mathier, with no easy way to get around it. The other difference is that in the more complex model I was relying on external data, HIPAA breach data in my case, which is great … if your company is part of the healthcare sector.

Hubbard’s great idea is that you can instead leverage internal resources (data sets and IT experts) to work out an exceedance loss for the *middle* part of the curve. The math is more straightforward and is based on the *lognormal* distribution, a heavy-tailed curve that’s easier to deal with than the power laws that I’ve been relying on. Don’t panic, I’ll explain.

## Normalizing Abnormal Lognormal Data

We’re all familiar (from Math for Poets 101) with the bell-shaped or normal curve. As I’ve been pointing out, many datasets, including internet-related stats, breach data, and file system sizes, can *not* be modeled by the friendlier normal curve but instead by heavier-tailed or power law curves. Well, lognormal is one of those heavy-tailed curves. You can think of it as the bell-shaped curve’s grumpier and less friendly brother.

What exactly is it?

If you’re frightened by high-school algebra, then skip ahead. A data set that is distributed as lognormal implies that when you take the natural log (or *ln*) of the data points, the resulting distribution *is a normal curve*. In a way, it’s one step removed from normal, with just enough weirdness in it to put it into the heavy-tailed territory.
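To make that definition concrete, here is a minimal Python sketch using only the standard library. The parameters (`mu=10.0`, `sigma=1.0`), the sample size, and the helper name `log_of_lognormal_samples` are my own illustrative assumptions, not values from Hubbard's book. It draws lognormal samples, takes the *ln* of each, and compares summary statistics.

```python
import math
import random
import statistics


def log_of_lognormal_samples(mu, sigma, n, seed=0):
    """Draw n lognormal samples and return them along with their natural logs.

    mu and sigma are the mean and standard deviation of the underlying
    normal curve (illustrative values, chosen only for this demo).
    """
    rng = random.Random(seed)
    samples = [rng.lognormvariate(mu, sigma) for _ in range(n)]
    logs = [math.log(x) for x in samples]
    return samples, logs


samples, logs = log_of_lognormal_samples(mu=10.0, sigma=1.0, n=20_000)

# Raw data: the long right tail drags the mean well above the median.
print("raw  mean / median:", statistics.mean(samples), statistics.median(samples))

# Logged data: mean and median nearly coincide, as for a bell curve,
# and the standard deviation lands close to sigma.
print("ln   mean / median:", statistics.mean(logs), statistics.median(logs))
print("ln   stdev        :", statistics.stdev(logs))
```

The raw numbers should look lopsided while the logged numbers look like an ordinary bell curve centered near `mu`, which is all the lognormal definition is saying.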

As it turns out the aforementioned data sets, particularly breach stats, can be modeled very nicely by the lognormal. Curve fitting software, like what I used here, often can’t tell the difference between lognormal and power law data. So the distribution you choose, lognormal or power, often just depends on what’s more convenient.

For our situation, it’s easier to work with lognormal. Hubbard shows that if we have just the lower and upper limits (which cover 90% of the data), we can fit a lognormal curve *without* using fancy curve-fitting software.

We just need two parameters, the mean and the standard deviation. Here are two tricks that your beloved professors may not have taught you.

You obtain a good estimate of the mean of a sample from a normal curve by adding together just the smallest and largest numbers in the set and dividing by two. Easy peasy.

To estimate the standard deviation, assuming that the smallest and largest numbers bracket 90% of the data, subtract the smallest number from the largest and divide by 3.29. This very cool trick is a variant of the range rule for standard deviation. (Mathies will recognize 3.29 as twice 1.645, the z-score that bounds a two-sided 90% interval.)

To fit our lognormal curve, we take the *ln* of the upper and lower points and then work out the parameters using the two tricks above. Curve fitting to lognormal accomplished with minimal work!
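The two tricks applied to the logged bounds can be sketched in a few lines of Python. The function name `fit_lognormal_from_90ci` is hypothetical, and the example bounds are the Network row from the table earlier in the post; treat this as a sketch of the shortcut described here, not as the exact code behind the book or my spreadsheet.

```python
import math

# Width of a two-sided 90% interval in standard deviations: 2 * 1.645.
Z_SPAN_90 = 3.29


def fit_lognormal_from_90ci(lower, upper):
    """Estimate lognormal parameters from bounds covering 90% of the data.

    Returns (mu, sigma): the mean and standard deviation of ln(loss).
    """
    if not 0 < lower < upper:
        raise ValueError("bounds must satisfy 0 < lower < upper")
    ln_lower, ln_upper = math.log(lower), math.log(upper)
    mu = (ln_lower + ln_upper) / 2             # midpoint of the logged bounds
    sigma = (ln_upper - ln_lower) / Z_SPAN_90  # logged range divided by 3.29
    return mu, sigma


# Network outage row: 90% of losses between $30,000 and $100,000.
mu, sigma = fit_lognormal_from_90ci(30_000, 100_000)
print(f"mu = {mu:.4f}, sigma = {sigma:.4f}")
print(f"median loss = ${math.exp(mu):,.0f}")
```

Note that `exp(mu)` is the median of the fitted curve, and it equals the geometric mean of the two bounds rather than their simple average. That is one way the heavy tail shows up.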

## Home Grown Loss Exceedance Curves

Confused?

I’ve put together another wondrous spreadsheet that lets you experiment with Hubbard’s approach to risk analysis. There’s a worksheet to show that the simple lognormal curve fitting tricks above work out nicely (if the upper limit is not too large).

I’ve also included a simple Monte Carlo simulation that generates 1000 trials for denial of service threats based on three different subcategories, each with different probabilities of occurrence.
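For readers who prefer code to spreadsheets, here is a rough Python equivalent of that kind of simulation. It is a sketch under my own simplifying assumptions, and the spreadsheet may differ in its details: each trial stands for one period, each subcategory occurs at most once per trial and independently of the others, and the cost of an event is drawn from a lognormal curve fitted to the 90% bounds in the table above. The names `THREATS` and `simulate_losses` are hypothetical.

```python
import math
import random

# (probability of occurrence, lower bound $, upper bound $) from the table above.
THREATS = {
    "Network": (0.15, 30_000, 100_000),
    "Ransomware": (0.05, 500_000, 1_000_000),
    "Other": (0.25, 10_000, 50_000),
}


def simulate_losses(threats, trials=1000, seed=42):
    """Return a list with the total simulated loss for each trial."""
    rng = random.Random(seed)
    totals = []
    for _ in range(trials):
        total = 0.0
        for prob, lower, upper in threats.values():
            # Does this subcategory strike during this trial?
            if rng.random() < prob:
                # Lognormal fit from the 90% bounds (midpoint and range / 3.29).
                mu = (math.log(lower) + math.log(upper)) / 2
                sigma = (math.log(upper) - math.log(lower)) / 3.29
                total += rng.lognormvariate(mu, sigma)
        totals.append(total)
    return totals


losses = simulate_losses(THREATS)
hit_count = sum(1 for x in losses if x > 0)
print(f"trials with at least one outage: {hit_count} of {len(losses)}")
print(f"average loss per trial: ${sum(losses) / len(losses):,.0f}")
```

Fixing the seed makes a run repeatable. Change it, or raise `trials`, to see how much the averages move from one simulation to the next.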

You can gaze upon the loss curve I generated from one of my simulations.
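A loss exceedance curve answers one question per point: in what fraction of trials did the total loss reach at least this amount? Here is a minimal sketch of that calculation. The function name `exceedance_curve` is hypothetical, and the ten trial losses are made-up numbers chosen to keep the arithmetic easy to check; they are not output from my spreadsheet.

```python
def exceedance_curve(losses, thresholds):
    """For each threshold, return the fraction of trials with loss >= threshold."""
    n = len(losses)
    if n == 0:
        raise ValueError("need at least one simulated loss")
    return [sum(1 for loss in losses if loss >= t) / n for t in thresholds]


# Made-up per-trial losses in dollars (illustration only, not real results).
trial_losses = [0, 0, 0, 0, 0, 0, 40_000, 55_000, 20_000, 700_000]
thresholds = [10_000, 50_000, 500_000]

for threshold, prob in zip(thresholds, exceedance_curve(trial_losses, thresholds)):
    print(f"P(loss >= ${threshold:,}) = {prob:.0%}")
```

Plotting the thresholds on the x-axis against these fractions on the y-axis gives the familiar downward-sloping curve. With real simulation output you would use many more trials and a finer grid of thresholds.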

Experiment with the spreadsheet, and use it to conduct your own cyber threat risk analysis.

I’ll take up these ideas again in another post, and delve more deeply into security analytics data. In the meantime, keep on Monte Carlo-ing!