A Guide to the Data Lifecycle: Identifying Where Your Data Is Vulnerable

Data is a company’s most valuable asset. To maintain data’s value, it’s vital to identify where that data is vulnerable. According to data and ethics expert Dr. Gemma Galdon Clavell, there are five major moments where data is most vulnerable: collection, storage, sharing, analysis, and deletion. These vulnerability points increase the risk of a data breach – and we’ve all heard about the costs of having one.

Many of these vulnerability points fall along a cycle known as the data lifecycle. The data lifecycle describes where data lives at each stage: on premises, in the cloud, with third-party vendors, and more. Understanding where data lives within your systems is what lets you take consistent steps to protect both its security and its privacy.


What is the Data Lifecycle?

The data lifecycle is a high-level, general process that describes how data can flow through an organization. Because the lifecycle presented here is a generic, garden-variety version, it can be adapted to many different scenarios. It is an important guide for security and privacy pros to consider when protecting data.

Data Collection – Where the Data Lifecycle Begins

To understand why the lifecycle starts at collection, let’s take a closer look at what happens at this stage. Without data collection, there would be nothing to analyze, no patterns to discover, and no data-driven business plans to implement. Fortunately, the ability to collect data, and the acknowledgement of how important it is to collect data, is no longer an issue. In fact, there are overwhelming incentives to design technologies in a way that maximizes the collection of personal information for potential business use.

Data scientist and statistician Kaiser Fung warns against collecting data without a particular business problem in mind: “You have a lot of expensive data, but there’s no measurement of that thing that you want to impact. And then in order to do that, you have to actually merge in a lot of data or try to collect data from other sources. And you probably often times cannot find appropriate data so you’re kind of stuck in this loop of not having any ability to do anything.”

Regulators have also caught on that the “collect first, ask questions later” mentality incentivizes design choices that may marginalize users’ privacy interests in how their data is collected and used.

Under Article 25 of the General Data Protection Regulation (GDPR), many companies are now legally required to implement data protection by design and by default. This means companies need to integrate data protection principles into business practices right from the start, and throughout the data lifecycle. Privacy by Design (PbD), which is based on a 20-year-old concept, consists of seven core principles meant to ensure strong privacy and personal control over a user’s personal information while giving organizations a sustainable competitive advantage: a positive-sum (win/win) paradigm.
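One way to picture “by default” is that a brand-new account gets the most privacy-protective settings without the user lifting a finger. The minimal sketch below assumes a hypothetical PrivacySettings class and flag names; it does not describe any particular product. Every optional processing flag starts switched off, so anything beyond the core service has to be turned on explicitly.

```python
from dataclasses import dataclass, fields


@dataclass
class PrivacySettings:
    """Hypothetical per-user settings: every optional processing flag starts switched off."""

    share_with_partners: bool = False
    personalized_ads: bool = False
    location_tracking: bool = False
    analytics: bool = False

    def enabled(self) -> list[str]:
        # Report only the processing the user has explicitly turned on.
        return [f.name for f in fields(self) if getattr(self, f.name)]


new_user = PrivacySettings()
print(new_user.enabled())  # []
opted_in = PrivacySettings(analytics=True)
print(opted_in.enabled())  # ['analytics']
```

The design choice is simply that the dataclass defaults encode the privacy posture, so forgetting to configure something fails safe.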

Get Consent for Active and Passive Data Collection

Before collecting data, best practice is for data collectors to obtain consent for both active and passive data collection. This preventative measure helps avoid miscommunication and lets users decide whether to opt in or out.

User information is collected either actively or passively. Collection is active when the user is aware of it. It is passive when collection and analysis take place behind the scenes, where the user’s activity or movements could reveal a behavioral pattern.

A user filling out a web form is an example of active data collection: the user knows they will reveal their identity with a name, address, phone number, and other types of personal information. If the user has location services turned on, that location information is an example of passive data collection. In that case, consent is implied and assumed from the user’s active use of and engagement with the platform, service, or product.
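The sketch below shows a consent gate that refuses to collect data in either mode until the user has opted in. It follows the best practice of obtaining consent for both active and passive collection rather than relying on implied consent. ConsentRegistry, CollectionMode, and collect are hypothetical names. A real system would also need to persist consent records instead of holding them in memory.

```python
from datetime import datetime, timezone
from enum import Enum
from typing import Optional


class CollectionMode(Enum):
    ACTIVE = "active"    # e.g. the user fills out a web form
    PASSIVE = "passive"  # e.g. location captured in the background


class ConsentRegistry:
    """Hypothetical in-memory record of which user consented to which collection mode."""

    def __init__(self) -> None:
        self._grants: dict[tuple[str, CollectionMode], datetime] = {}

    def grant(self, user_id: str, mode: CollectionMode) -> None:
        # Timestamp the opt-in so it can be audited later.
        self._grants[(user_id, mode)] = datetime.now(timezone.utc)

    def revoke(self, user_id: str, mode: CollectionMode) -> None:
        # Opting out removes the grant; the default makes repeated calls harmless.
        self._grants.pop((user_id, mode), None)

    def has_consent(self, user_id: str, mode: CollectionMode) -> bool:
        return (user_id, mode) in self._grants


def collect(registry: ConsentRegistry, user_id: str, mode: CollectionMode, data: dict) -> Optional[dict]:
    # Refuse to collect anything the user has not opted in to for this mode.
    if not registry.has_consent(user_id, mode):
        return None
    return data


registry = ConsentRegistry()
registry.grant("user-1", CollectionMode.ACTIVE)
print(collect(registry, "user-1", CollectionMode.ACTIVE, {"name": "Ada"}))   # {'name': 'Ada'}
print(collect(registry, "user-1", CollectionMode.PASSIVE, {"lat": 52.5}))    # None
```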

Types of Data Collection

Besides getting consent, another core PbD principle is to minimize the collection of data. In other words, limit personal data collection and collect only what’s necessary to carry out the purpose to which the user consented. This limits vulnerability and risk for both the data collector and the data subject.
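In practice, data minimization often takes the form of an allow-list: for each purpose, decide up front which fields are needed, and drop everything else at the point of collection. The sketch below is minimal and assumes a hypothetical REQUIRED_FIELDS mapping and purpose names. An unknown purpose results in nothing being kept.

```python
# Hypothetical mapping from a consented purpose to the fields it actually needs.
REQUIRED_FIELDS = {
    "order_shipping": {"name", "street_address", "postal_code"},
    "newsletter": {"email"},
}


def minimize(record: dict, purpose: str) -> dict:
    """Keep only the fields needed for `purpose`; drop everything else."""
    allowed = REQUIRED_FIELDS.get(purpose, set())  # unknown purpose -> keep nothing
    return {key: value for key, value in record.items() if key in allowed}


submitted = {
    "email": "ada@example.com",
    "name": "Ada",
    "phone": "555-0100",
    "birthdate": "1990-01-01",
}
print(minimize(submitted, "newsletter"))  # {'email': 'ada@example.com'}
```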

Be mindful of data minimization when collecting these types of data:

First-party data collection: when the user provides their personal data directly to the data collector.
Surveillance: when the collector observes data produced by the user without interfering with the user’s experience.
Repurposing data: when reusing previously collected data for a different purpose. Best practice is to obtain additional consent from the data subject. Depending on the industry, reusing data collected for one purpose for a completely different purpose can violate privacy rights and may be illegal under some regulatory requirements.
Third-party collection: when collected information is transferred to a third party for further analysis.

Making Sense of the Data: Processing, Analyzing and/or Sharing

After collecting data, there’s often a strong desire for everyone in an organization to have access to it. With data widely available, creative connections can be made more quickly, potentially advancing business initiatives and opportunities. Being first in an untapped marketplace is everything.

That’s a fair point of view, but security and privacy advocates recommend a different approach. They aren’t calling for no access at all. After all, without access to data, it’s a challenge to process and analyze it.

Security advocates and NIST suggest that the balance lies in limiting access to only those who need to use the data. The guiding principles are a least-privilege model and role-based access controls, where data owners grant appropriate users the access they need, and only the access they need, to do their jobs.

PbD author Ann Cavoukian says “You’ve gotta have restricted access to those who have a right to know – meaning there is a business purpose for which they’re accessing the data.”

Using a hospital as an example, she continues, “You want to enable easy access for those who have a right to know because they’re treating patients. And then the walls should go up for those who are not treating in any manner.”
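The hospital example combines two checks. The first is role-based least privilege: does this role ever need this permission? The second is right to know: is this person actually treating this patient? The minimal sketch below assumes hypothetical role names, permissions, and a User type. Anything not explicitly granted is denied.

```python
from dataclasses import dataclass, field

# Hypothetical role -> permission mapping: each role gets only what its job requires.
ROLE_PERMISSIONS = {
    "physician": {"read_chart", "write_chart"},
    "nurse": {"read_chart"},
    "billing": {"read_invoice"},
}
CHART_PERMISSIONS = {"read_chart", "write_chart"}


@dataclass
class User:
    user_id: str
    role: str
    patients: set[str] = field(default_factory=set)  # patients this user is treating


def can_access(user: User, permission: str, patient_id: str) -> bool:
    # Least privilege: an unknown role or an ungranted permission is denied.
    if permission not in ROLE_PERMISSIONS.get(user.role, set()):
        return False
    # Right to know: chart access only for patients the user is treating.
    if permission in CHART_PERMISSIONS:
        return patient_id in user.patients
    return True


doctor = User("d1", "physician", {"p1"})
print(can_access(doctor, "read_chart", "p1"))  # True
print(can_access(doctor, "read_chart", "p2"))  # False: not treating this patient
```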

Privacy advocates add that it’s vital that data collected is only used for the original purpose and intent. If you’re planning to use that data for another purpose, seek additional consent. Also, let your data subjects know if their data will be shared with third parties. It’s a user’s right to know.
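Purpose limitation can be enforced in code by tagging each record with the purposes its subject agreed to and refusing any other use. The minimal sketch below assumes a hypothetical PersonalRecord type and free-form purpose strings. Sharing with a third party is modeled as just another purpose (for example "share:analytics_vendor"), which is an illustrative convention, not a standard.

```python
from dataclasses import dataclass, field


@dataclass
class PersonalRecord:
    subject_id: str
    data: dict
    consented_purposes: set[str] = field(default_factory=set)  # purposes the subject agreed to


class PurposeViolation(Exception):
    """Raised when data is about to be used for a purpose the subject never agreed to."""


def use_for(record: PersonalRecord, purpose: str) -> dict:
    # A new purpose (including third-party sharing) needs fresh consent first.
    if purpose not in record.consented_purposes:
        raise PurposeViolation(f"{record.subject_id}: no consent for {purpose!r}")
    return record.data


def add_consent(record: PersonalRecord, purpose: str) -> None:
    # Call only after the data subject has actually agreed to the new purpose.
    record.consented_purposes.add(purpose)


record = PersonalRecord("s1", {"email": "ada@example.com"}, {"order_fulfillment"})
use_for(record, "order_fulfillment")            # allowed: original purpose
try:
    use_for(record, "share:analytics_vendor")  # blocked until consent is added
except PurposeViolation as err:
    print(err)
```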

Poorly protected folders with permissions that are more generous than they need to be often lure attackers. That’s where Varonis comes in handy: DatAdvantage identifies who can access data and who does access data, shows where users have too much access, and helps safely automate proper changes to access control lists and security groups. Meanwhile, the Automation Engine helps put it all on autopilot, automatically repairing inconsistent ACLs and remediating global group access.
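The underlying idea of comparing who can access data with who does access it can be sketched generically. This is not how any Varonis product is implemented. The sketch assumes you already have two hypothetical inputs: a folder-to-principals map derived from ACLs, and the principals seen in audit logs over a review window. It compares principals by name only and does not expand group membership, which is a deliberate simplification.

```python
# Broad Windows groups whose presence on an ACL usually means "open to everyone".
GLOBAL_GROUPS = {"Everyone", "Authenticated Users", "Domain Users"}


def excess_access(granted: dict[str, set[str]], used: dict[str, set[str]]) -> dict[str, set[str]]:
    """Per folder, return principals that hold access but never used it in the window."""
    excess = {}
    for folder, principals in granted.items():
        unused = principals - used.get(folder, set())
        if unused:
            excess[folder] = unused
    return excess


def globally_exposed(granted: dict[str, set[str]]) -> list[str]:
    """Folders whose ACL includes a broad, organization-wide group."""
    return sorted(folder for folder, principals in granted.items() if principals & GLOBAL_GROUPS)


granted = {"/shares/hr": {"alice", "bob", "Everyone"}, "/shares/eng": {"carol"}}
used = {"/shares/hr": {"alice"}, "/shares/eng": {"carol"}}
print(globally_exposed(granted))  # ['/shares/hr']
```

Principals in the excess set are candidates for removal, subject to review by the data owner, in keeping with the least-privilege model.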

Data Retention Policies to Remediate Risk

Whether your data is on premises, in the cloud, or with third parties, companies need to consider how long their systems retain data. Having data retention policies and procedures in place (what to keep, what to archive) is just IT common sense.

Data retention is covered in GDPR’s Article 5, which explains that data should only be retained for as long as is required to achieve the purpose for which data were collected and processed.

It’s not only the EU that takes this IT procedure seriously. Data retention limits also show up in the US’s HIPAA rules for personal health data and in some financial data security regulations. These limits, measured in years, define the amount of time an electronic document must be kept.
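A retention schedule boils down to a lookup from a data category to a period, plus a rule for what to do when a record passes it. The minimal sketch below uses hypothetical categories and periods chosen only for illustration. Actual periods must come from your own legal and regulatory review. Categories without a policy are flagged for review rather than silently kept or deleted.

```python
from datetime import date, timedelta

# Hypothetical retention schedule in years; NOT actual regulatory periods.
RETENTION_YEARS = {
    "marketing_leads": 1,
    "invoices": 7,
}
ARCHIVE_AFTER_YEARS = 2  # hypothetical: move older records to cold storage before deletion


def retention_action(category: str, created: date, today: date) -> str:
    """Return 'keep', 'archive', 'delete', or 'review' for a record of `category`."""
    years = RETENTION_YEARS.get(category)
    if years is None:
        return "review"  # no policy for this category: flag it rather than guess
    age = today - created
    # 365-day years are an approximation that ignores leap days.
    if age > timedelta(days=365 * years):
        return "delete"
    if age > timedelta(days=365 * ARCHIVE_AFTER_YEARS):
        return "archive"
    return "keep"


today = date(2024, 1, 1)
print(retention_action("marketing_leads", date(2022, 6, 1), today))  # delete
print(retention_action("invoices", date(2020, 6, 1), today))         # archive
```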

The message is clear: lower your data security risk profile. If you don’t need data, delete it. The less data you have, the less damaging a breach will be. If it’s sensitive, make sure it’s only accessible to those who need it. Old and stale files are expensive and risky, which is why we have retention policies and software solutions such as Varonis Data Transport Engine – which helps archive, quarantine, and delete stale (and regulated) data.
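Finding stale files is often the first step toward archiving or deleting them. The minimal sketch below walks a directory tree with the Python standard library and flags files that have not been modified within a threshold. It uses modification time because access times are often unreliable on filesystems mounted with noatime or relatime. The function name and threshold are illustrative assumptions, not part of any product.

```python
import os
import tempfile
import time
from pathlib import Path


def find_stale_files(root: str, max_age_days: int) -> list[Path]:
    """Return files under `root` not modified in more than `max_age_days` days."""
    cutoff = time.time() - max_age_days * 86_400
    stale = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = Path(dirpath) / name
            try:
                if path.stat().st_mtime < cutoff:
                    stale.append(path)
            except OSError:
                continue  # file vanished or is unreadable; skip it
    return sorted(stale)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as root:
        old_file = Path(root) / "campaign_2019.csv"
        old_file.write_text("email\n")
        # Backdate the file's access and modification times by 400 days.
        past = time.time() - 400 * 86_400
        os.utime(old_file, (past, past))
        (Path(root) / "current.txt").write_text("fresh\n")
        print(find_stale_files(root, max_age_days=365))
```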

Secure Data Destruction

Once you’ve identified what needs to be retained, the rest of the data is governed by another PbD principle: reduce security risks by deleting or archiving unnecessary or stale sensitive data embedded in files. This makes good sense. Stale data might be, for example, consumer identifiers originally collected for a short-term marketing campaign that now reside in rarely used spreadsheets or management presentations. Your organization may no longer need it, but it’s just the kind of monetizable data that hackers would love to get their hands on.
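Before stale files can be cleaned up, you need to know which ones contain identifiers. The minimal sketch below scans text for email addresses with a deliberately simple regular expression. Real data classifiers use many more patterns plus validation to cut false positives. The sketch works on plain text and CSV exports only. Formats such as .xlsx are zipped XML and would need a parser first.

```python
import re
from pathlib import Path

# Deliberately simple email pattern; expect some false positives and negatives.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def find_identifiers(text: str) -> set[str]:
    """Return the distinct email addresses found in `text`."""
    return set(EMAIL_RE.findall(text))


def scan_files(paths: list[Path]) -> dict[Path, int]:
    """Map each readable text file to the number of distinct email addresses it contains."""
    hits = {}
    for path in paths:
        try:
            text = path.read_text(encoding="utf-8", errors="ignore")
        except OSError:
            continue  # unreadable file; skip it
        found = find_identifiers(text)
        if found:
            hits[path] = len(found)
    return hits


print(sorted(find_identifiers("contact ada@example.com or bob@example.org")))
```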

To echo the significance of data minimization, Ann Cavoukian warns, “What idle data in identifiable form does is attract hackers. It attracts rogue employees on the inside who will make inappropriate use of the data, sell the data, do something with the data.”

If stale data is no longer needed and has reached the end of its lifecycle, the sensitivity of the data determines the level of destruction required. NIST’s media sanitization guidelines (Special Publication 800-88) range from clearing the data by overwriting it to destroying the physical medium.
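NIST SP 800-88 groups sanitization into three categories: Clear, Purge, and Destroy. The minimal sketch below assumes a hypothetical, simplified mapping from a sensitivity label to one of those categories. The real guidelines use a fuller decision flow, for example considering whether the media leaves your control. The sketch also shows file-level overwriting as an illustration of "clearing". As its docstring notes, overwriting a single file does not guarantee the original blocks are gone on SSDs, copy-on-write or journaling filesystems, or cloud storage.

```python
import os
from pathlib import Path

# Hypothetical, simplified mapping from sensitivity label to a NIST SP 800-88 category.
SANITIZATION = {
    "low": "clear",      # logical techniques such as overwriting
    "moderate": "purge",
    "high": "destroy",   # physically destroy the medium
}
CHUNK_SIZE = 1024 * 1024  # overwrite in 1 MiB chunks to bound memory use


def sanitization_method(sensitivity: str) -> str:
    # Fail safe: an unknown label gets the strongest method.
    return SANITIZATION.get(sensitivity, "destroy")


def overwrite_and_delete(path: Path, passes: int = 1) -> None:
    """Overwrite a file's bytes with zeros, flush to disk, then unlink it.

    Caveat: on SSDs, copy-on-write/journaling filesystems, and cloud storage this
    does not guarantee the original blocks are unrecoverable.
    """
    size = path.stat().st_size
    with open(path, "r+b") as f:
        for _ in range(passes):
            f.seek(0)
            remaining = size
            while remaining > 0:
                n = min(CHUNK_SIZE, remaining)
                f.write(b"\x00" * n)
                remaining -= n
            f.flush()
            os.fsync(f.fileno())  # push the zeros to the device before unlinking
    path.unlink()


print(sanitization_method("moderate"))  # purge
```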

Protecting the Data Where It is Most Vulnerable: From Birth to Death

If data truly is more valuable than oil, it makes sense to protect it the way you protect money. From birth to the end of the data lifecycle, it is possible to protect your organization’s data first, not last, and achieve Zero Trust. Ann Cavoukian, who has worked with both government agencies and organizations and knows a thing or two about positive-sum paradigms, says, “When you can present it that way, it gives you a seat at the table, every time.”


Michael Buckbee
Michael has worked as a sysadmin and software developer for Silicon Valley startups, the US Navy, and everything in between.