More often than not, customers have far more GDPR-eligible data than they thought they had – or even knew existed. A recent GDPR Readiness Assessment for a mid-sized insurance company revealed some eye-opening results. In the example below, we focused on a single data store with 12 TB of data in 20+ million files across 1.36 million folders.
On that single data store, we found over 15,000 files with GDPR-sensitive data. 90% of the files that held German data – ranging from DE passport numbers to Personalausweisnummer (German identity card number) – were open to the entire company…and the German data was in the best shape. Classification hits for France, Spain, and Sweden were 100% exposed!
How Can I Identify My GDPR Data?
It can be difficult to discover and classify what data falls under the GDPR – so difficult, in fact, that we built GDPR-specific patterns on top of our classification engine to do just that.
The Varonis Data Security Platform maps your data stores, so that you can monitor and analyze data that falls under the GDPR. This map contains the folders and permissions for all storage volumes where GDPR sensitive data can exist, from a NetApp server to EMC Isilon to Windows to Office 365 (and beyond).
Once you have that map of data, you can begin scanning those files for GDPR data. We see GDPR data in Word documents, spreadsheets, Notepad files, even XML files. Our Data Classification Engine is file-type agnostic, so we will find the data even if it’s zipped.
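To make the "even if it’s zipped" idea concrete, here is a minimal sketch of looking inside an archive with Python's standard zipfile module. It is not how the Data Classification Engine works internally; the function name scan_zip and the idea of passing in a single regular expression are assumptions made purely for illustration.

```python
import io
import re
import zipfile


def scan_zip(data: bytes, pattern: re.Pattern) -> dict[str, list[str]]:
    """Return {member name: matches} for every file in a zip archive.

    Hypothetical helper: treats each member as UTF-8 text and ignores
    bytes that cannot be decoded, which is a simplification.
    """
    hits: dict[str, list[str]] = {}
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # folders carry no content to scan
            text = archive.read(info).decode("utf-8", errors="ignore")
            found = pattern.findall(text)
            if found:
                hits[info.filename] = found
    return hits
```

A real scanner would also need format-specific text extraction for binary formats such as Office documents; this sketch only handles plain text members.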
Varonis GDPR Patterns has over 250 patterns and regexes for GDPR data, covering all 28 EU countries. It identifies and flags data that looks like an IBAN number, social security number, passport number, personal ID card, VAT number, mobile phone number, license plate number, tax registry number, and much more. You’ll be able to review the results in the DatAdvantage console with a GDPR category tag.
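As a rough illustration of what a pattern for one of these data types can look like, the sketch below finds IBAN-like strings with a regular expression and then confirms them with the standard mod-97 checksum. This is not one of the Varonis GDPR Patterns; the names IBAN_CANDIDATE, iban_checksum_ok, and find_ibans are hypothetical, and the regex is deliberately loose.

```python
import re

# Loose candidate shape: country code, two check digits, then 11 to 30
# alphanumerics that may be separated by single spaces.
IBAN_CANDIDATE = re.compile(r"\b[A-Z]{2}\d{2}(?: ?[A-Z0-9]){11,30}\b")


def iban_checksum_ok(candidate: str) -> bool:
    """Apply the mod-97 check used by IBANs.

    Move the first four characters to the end, map letters to 10..35,
    and require the resulting number to leave remainder 1 modulo 97.
    """
    compact = candidate.replace(" ", "").upper()
    if not 15 <= len(compact) <= 34 or not compact.isalnum():
        return False
    rearranged = compact[4:] + compact[:4]
    digits = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(digits) % 97 == 1


def find_ibans(text: str) -> list[str]:
    """Return regex candidates that also pass the checksum."""
    return [
        m.group(0)
        for m in IBAN_CANDIDATE.finditer(text)
        if iban_checksum_ok(m.group(0))
    ]
```

Pairing the regex with a checksum is one way to cut down on false positives: a string that merely looks like an IBAN is dropped unless its check digits are consistent. The sketch does not check country-specific lengths, and a candidate followed directly by uppercase text may be over-matched and rejected.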
It can take a few weeks to scan all of your unstructured data stores if you run the system 24×7. This is one task where throwing processor power at the problem does make it go faster. You can also distribute the work across several Varonis Collector Servers to multiply the number of CPUs doing the work. The more, the merrier! And don’t worry – the Collector caps the amount of CPU the Data Classification Engine can use, so there’s minimal performance impact, and plenty of headroom left for the rest of the OS to do work.
On an 8-CPU system, Data Classification Engine can scan around 100 GB per hour per Varonis Collector Server. In a day, that comes to 2.4 TB of data per Collector.
Disclaimer: These numbers are based on internal testing; your mileage may vary.
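For back-of-the-envelope planning, the arithmetic above can be wrapped in a small helper. This is only a sketch: the function name scan_days is hypothetical, it assumes decimal units (1 TB = 1000 GB), and it assumes throughput scales linearly with the number of Collectors, which real deployments may not achieve.

```python
def scan_days(
    total_tb: float,
    collectors: int = 1,
    gb_per_hour: float = 100.0,   # per-Collector rate quoted above
    hours_per_day: float = 24.0,  # assumes scanning runs 24x7
) -> float:
    """Estimate days needed for an initial scan.

    Assumption: Collectors scale linearly and sustain the quoted rate.
    """
    if collectors < 1:
        raise ValueError("at least one Collector is required")
    tb_per_day_per_collector = gb_per_hour * hours_per_day / 1000.0
    return total_tb / (tb_per_day_per_collector * collectors)


# Example: the 12 TB data store from the assessment, one vs. two Collectors.
one_collector = scan_days(12)
two_collectors = scan_days(12, collectors=2)
```

Under these assumptions, the 12 TB data store described earlier would take roughly five days on a single Collector and about half that on two.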
How Can I Find New GDPR Data?
Data Classification Engine continues to scan your data after the initial scan is complete, since users will update and add data faster than you can lock it down. Varonis updates the previously mentioned folder and permissions map daily (or on whatever schedule you configure) and then adds modified folders back into the queue to get scanned again. Data Classification Engine does not stop, it doesn’t feel pity or remorse, it will find all the GDPR data, and then it still won’t stop.
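The "add modified folders back into the queue" step can be pictured as a comparison between two snapshots of the folder map. The sketch below is an illustration only, not the product's implementation; the function name folders_to_rescan and the representation of the map as a path-to-modification-time dictionary are assumptions.

```python
from collections import deque


def folders_to_rescan(
    previous: dict[str, float],
    current: dict[str, float],
) -> deque[str]:
    """Queue folders that are new or changed since the last snapshot.

    Both arguments map a folder path to its last-modified timestamp
    (a hypothetical simplification of the folder and permissions map).
    """
    queue: deque[str] = deque()
    for path, mtime in sorted(current.items()):
        if path not in previous or mtime > previous[path]:
            queue.append(path)
    return queue


# Example: /hr changed and /legal is new, so both are queued again.
yesterday = {"/finance": 100.0, "/hr": 200.0}
today = {"/finance": 100.0, "/hr": 250.0, "/legal": 10.0}
pending = folders_to_rescan(yesterday, today)
```

Only the folders that changed go back into the queue, which is what keeps the ongoing scans far cheaper than the initial full pass.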
Once you discover your GDPR data, you need to figure out what to do with it – how to manage, process, and report on it – which I’ll cover in the next few parts of this series.
If you already know you need to prepare for GDPR, see how you’re doing with a free GDPR Readiness Assessment. We’ll do an assessment of your current state and present a report highlighting GDPR data, potential vulnerabilities, and strategies to protect that data.