Varonis debuts trailblazing features for securing Salesforce. Learn More

Varonis announces strategic partnership with Microsoft to acclerate the secure adoption of Copilot.

Learn more

5 Things You Should Know About Big Data

2 min read
Last updated January 17, 2023

Giant T Rex Big data is a very hot topic, and with the Splunk IPO last week seeing a 1999-style spike, the bandwagon is overflowing.  We’re poised to see many businesses pivoting into the big data space or simply slapping a big data sticker on their products—accurate or not—just to ride the wave.

This post aims to help educate you with a few byte-sized big data concepts (not just trivia) so that you can distinguish the substance from the hype.

Hate computers professionally? Try Cards Against IT.


1. Big data is distributed data

Big data is a nebulous term with many different definitions.  The key thing to remember is that in this day and age, big data is distributed data.  This means the data is so massive it cannot be stored or processed by a single node.

The days of buying a single big iron server from IBM or Sun to handle all your business intelligence needs are long gone.  It’s been proven by Google, Amazon, Facebook, and others that the way to scale fast and affordably is to use commodity hardware to distribute the storage and processing of our massive data streams across several nodes, adding and removing nodes as needed.

2. You’re going to hear the words “Hadoop” and “MapReduce”

What is Hadoop?   It is an open source platform for consolidating, combining and understanding large-scale data in order to make better business decisions. Hadoop is the technology powering many (but not all) big data analytics infrastructures.

There are 2 key parts to Hadoop:

  • HDFS (Hadoop distributed file system) which lets you store data across multiple nodes.
  • MapReduce which lets you process data in parallel across multiple nodes.

Although Hadoop is one of the most popular solutions for crunching big data — there are plenty others.  Big data can’t be shoehorned into one flavor of technology.  The important characteristic is that you’re able to draw insights from large quantities of data, independent of specific technologies.

3. You can understand MapReduce without a degree from Stanford

The best plain English explanation of MapReduce I’ve encountered (paraphrasing):

We want to count all the books in the library.  You count up shelf #1.  I count up shelf #2.  That’s map. Now we get together and add our individual counts.  That’s reduce.

For a deeper understanding, Wikipedia is a good place to start.

4. Distributed data generation is fueling big data growth

The reason we have data problems so big that we need large-scale distributed computing architecture to solve is that the creation of the data is also large-scale and distributed.  Most of us walk around carrying devices that are constantly pulsing all sorts of data into the cloud and beyond – our locations, our photos, our tweets, our status updates, our connections, even our heartbeats.

For every human-generated piece of data there’s likely associated machine-generated data.  And then there’s the metadata.  The data is abundant and it’s extremely valuable.

5. Machine learning is…awesome!

One of the key differentiators in big data analytics are the machine learning algorithms used to answer interesting questions and derive value from the 0s and 1s we’re furiously chewing up and spitting back out.

Some pretty cool examples:

  • Nest – a beautiful thermostat that learns how hot or cold you like your house so you never have to adjust it again (not technically big data, but fun nonetheless)
  • Gmail’s Bayesian spam filter – no more tempting emails from that pesky Nigerian prince!
  • Amazon’s product recommendations – sure, I’ll take a JavaScript book, a pair of Asics, and season 1 of Game of Thrones.  How do they know me so well?!
  • Varonis’ access control recommendations – ratchet down access based on highly accurate analytics.

If you’re interested in learning more about big data, join our webinar this Wednesday on Mastering Big Data.

photo credit:

What you should do now

Below are three ways we can help you begin your journey to reducing data risk at your company:

  1. Schedule a demo session with us, where we can show you around, answer your questions, and help you see if Varonis is right for you.
  2. Download our free report and learn the risks associated with SaaS data exposure.
  3. Share this blog post with someone you know who'd enjoy reading it. Share it with them via email, LinkedIn, Reddit, or Facebook.
Try Varonis free.
Get a detailed data risk report based on your company’s data.
Deploys in minutes.
Keep reading
Speed Data: Film, Foodies, and the Future of Tech With David Ulloa
Dr. David Ulloa, Chief Security Information Officer at IMC Companies, shares the best line of defense against a sophisticated threat actor.
Varonis joins Marsh McLennan Agency’s Cyber Resiliency Network
Varonis is teaming up with Marsh McLennan Agency. Together, we'll help organizations improve their cyber resilience with industry-leading DSPM solutions.
DSPM Report Highlights Risks That Lead to Significant Data Breaches  
Varonis' new DSPM report reveals that typical companies are widening their blast radius by oversharing permissions, excess ghost users, lack of MFA, and more.
Speed Data: Thinking From a Cyberattacker's Perspective With Dalal Alharthi
Dr. Dalal Alharthi talks about the importance of organizations anticipating a breach and seeing the world through the eyes of an attacker.