As organizations become more data driven, they also store more data in more places and access it in more ways — with phones, tablets and laptops. These ever-connected endpoints serve as gateways to large, centralized troves of sensitive information stored in your data center and in the cloud.
This digital transformation has flipped traditional security — which focused on perimeters and endpoints — on its head. With cloud and work-from-anywhere infrastructure, a perimeter is hard to define and harder to watch. Endpoints are fungible. Very little data “lives” only on your phone or laptop these days.
This shift has led some organizations to start thinking more about data protection and what data-centric security looks like. I’d like to share what I’ve learned about data protection as CEO of a data security company that started because of how poorly most data is protected.
The Data Protection Paradox
Data protection is intuitively simple but immensely complex.
Why is data protection intuitively simple?
I’d argue that if you can answer “yes” to these three questions, and you can answer “yes” continually, then your data is safe:
- Do you know where your important data is stored?
- Do you know that only the right people have access to it?
- Do you know that they’re using data correctly?
It may surprise you that most organizations can’t answer yes to any of these questions for any of their data, much less all of it. These questions frame the three dimensions of data protection — importance, accessibility and usage — and why the whole isn’t just greater than the sum of the parts. The parts are ineffective without the whole.
If you only know what data is important, you won’t know where it’s exposed without understanding who has access to it. You won’t know who needs access and how to safely fix any exposures without monitoring usage. Without usage, you’ll also never be able to see if important data is being stolen or encrypted in a ransomware attack.
If you start with usage, you may be able to see what data was stolen after a breach, or even alert on unusual access patterns (assuming you can measure and track against a baseline). But you won’t know whether the data was important or who else can access and steal it today or tomorrow.
No matter which dimension you start with, you’ll quickly find you need the other two.
In a common one-dimensional approach, some organizations try to identify important data by asking employees to tag files manually or by using automation to identify important regulated or sensitive data — or a combination of both. Sometimes I ask, “Let’s say you had a list of all the important, sensitive files and records you have — what would you do with it?”
Most organizations are surprised by the sheer number of important files and records they find, and without the other two dimensions, accessibility and usage, there's no clear plan of action. Going one record at a time is just as infeasible as making sweeping decisions about all of them at once.
To make meaningful decisions or improve risk posture, you must see where important data is concentrated and exposed (at risk) and who is using it or not (stale). That’s why data protection doesn’t end with classification; it can only start there.
Why is answering these questions immensely complex?
Now that we see what we need to be able to answer all three questions, I’ll discuss why each one is difficult to answer, especially across data stores and applications.
Identifying your important data might seem straightforward — if you know how to find addresses and phone numbers in Salesforce, for example, you can probably identify the same things in spreadsheets in Microsoft 365 or Google Drive. Just because you’re looking for the same things, however, doesn’t mean it’s easy — it takes a lot of sophistication and development to make classification accurate, assuming you can get to the data.
To get to the data, you’ll need automation to connect to the right places, “read” and accurately analyze what’s likely millions of records and files and then keep reading the new and updated ones each day, preferably without killing performance or running up your cloud computing bill.
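A minimal sketch of that incremental approach might look like the following. The patterns, file representation, and timestamp handling here are illustrative assumptions, not how any particular product works; real classifiers are far more sophisticated than a pair of regexes.

```python
import re

# Toy patterns for two sensitive data types. These regexes are
# deliberately simplistic stand-ins for production classification.
PATTERNS = {
    "us_phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}

def classify(text):
    """Return the set of sensitive data types found in a blob of text."""
    return {name for name, rx in PATTERNS.items() if rx.search(text)}

def incremental_scan(files, last_scan_time):
    """Classify only files created or modified since the last scan.

    `files` is a list of (path, mtime, text) tuples. Skipping unchanged
    files is what keeps a daily scan from re-reading millions of records
    and running up the compute bill.
    """
    results = {}
    for path, mtime, text in files:
        if mtime > last_scan_time:
            hits = classify(text)
            if hits:
                results[path] = hits
    return results
```

The key design point is the high-water mark: after each run, you record the scan time and only re-read objects modified after it on the next pass.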
When it comes to analyzing accessibility, most don't realize how many folders, files and records they need to analyze. A single terabyte of data routinely holds tens of thousands of these objects with specific, unique permissions that determine which users and groups can access them, and organizations now store thousands of terabytes. All the relationships between users and groups need to be analyzed, too. To make matters worse, each application implements its permissions mechanisms differently.
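The user-and-group relationship problem above is essentially a graph traversal: a group can contain other groups, so answering "who can actually reach this object?" means recursively flattening nested memberships. Here is a hedged sketch under the simplifying assumption that principals are plain strings and anything appearing as a key in the membership map is a group:

```python
def expand_group(group, memberships, seen=None):
    """Recursively resolve a group into the set of users it contains.

    `memberships` maps a group name to its direct members, which may be
    users or other (nested) groups. `seen` guards against cycles.
    """
    if seen is None:
        seen = set()
    users = set()
    for member in memberships.get(group, ()):
        if member in memberships:  # member is itself a group
            if member not in seen:
                seen.add(member)
                users |= expand_group(member, memberships, seen)
        else:  # member is a user
            users.add(member)
    return users

def effective_access(acl, memberships):
    """Flatten an object's ACL (a mix of users and groups) into the
    set of users who can actually reach the object."""
    users = set()
    for principal in acl:
        if principal in memberships:
            users |= expand_group(principal, memberships)
        else:
            users.add(principal)
    return users
```

Multiply this traversal by tens of thousands of objects per terabyte, each with its own ACL, and the scale of the accessibility question becomes clear; and in practice each application's permission model (shares, inheritance, deny rules) changes the traversal rules.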
Understanding usage isn't any simpler. Some applications and systems don't even track data usage by default. Many that do produce logs that are noisy or incomplete. The logs are voluminous, they all differ in format, and none of them carry much, if any, context about how important the data is or who's accessing it. Without understanding normal usage, spotting abnormal usage is a non-starter.
These complexities hit home when you’re in the heat of battle with ransomware or you’re worried about the damage an insider might do. If you can’t see instantly what a compromised user could have taken — or did take — across applications and files on-prem and in the cloud, you’re already dangerously behind.
As your organization hones its data protection practices, make sure you can answer all three questions wherever you store your data. With an understanding of importance, accessibility and usage, you’ll be able to transform your security to thrive in a digitally transformed world.
This article first appeared on Forbes.
Co-Founder and CEO of Varonis, responsible for leading the management, strategic direction, and execution of the company.