Machine Learning and Generative AI Use in Varonis

Using machine learning for cybersecurity in Varonis

Varonis uses machine learning, a subset of artificial intelligence (AI), to strengthen cybersecurity across several areas. Machine learning applies algorithms and statistical techniques that let computer systems learn from data and make predictions or decisions without being explicitly programmed for each case.

Examples of machine learning use at Varonis:

  1. Generating Security Alerts Based on Abnormal User Behavior: One crucial application of machine learning in cybersecurity is detecting security threats by analyzing user behavior. Varonis uses machine learning algorithms to monitor and analyze user or service account activities within an organization's network. By learning what constitutes "normal" behavior for users and service accounts, the system can identify deviations from these patterns, flagging them as potential security threats. These deviations can include unusual file access, login times, or data transfers, helping organizations respond promptly to potential breaches.
  2. Learning Peer Associations: Another important aspect of threat detection is understanding the relationships between users and their peers within the network. Varonis' machine learning models learn which users typically interact or share data with each other. This information helps detect suspicious activities, such as unauthorized data sharing or privilege escalation.
  3. Learning Normal Working Hours: Machine learning is used to establish typical working hours for each user within an organization. By analyzing historical data, the system identifies when users typically access resources, and deviations from these patterns can trigger alerts. For example, a user logging in at an unusual time could indicate a security incident.
  4. Personal Device Identification: Varonis' machine learning algorithms also learn and keep track of the devices used by each user. This information is crucial for detecting unauthorized access or compromised devices: the system can raise a security alert if a user suddenly logs in from an unfamiliar device.
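
The learned properties in items 2 to 4 can be illustrated with a minimal sketch. It assumes a hypothetical `Event` record (user, hour, device, resource) and derives each user's usual hours, known devices, and peers from unlabeled history. The thresholds and the use of Jaccard similarity over accessed resources to define "peers" are illustrative assumptions, not a description of Varonis' actual models.

```python
from collections import Counter, defaultdict
from dataclasses import dataclass, field


@dataclass
class Event:
    """One observed activity; field names are illustrative assumptions."""
    user: str
    hour: int       # hour of day (0-23)
    device: str
    resource: str


@dataclass
class Profile:
    usual_hours: set = field(default_factory=set)
    known_devices: set = field(default_factory=set)
    peers: set = field(default_factory=set)


def jaccard(a: set, b: set) -> float:
    """Overlap between two sets: |a & b| / |a | b| (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def learn_profiles(events, min_hour_share=0.05, peer_threshold=0.5):
    """Learn per-user baselines (hours, devices, peers) from unlabeled history."""
    hour_counts = defaultdict(Counter)
    devices = defaultdict(set)
    resources = defaultdict(set)
    for e in events:
        hour_counts[e.user][e.hour] += 1
        devices[e.user].add(e.device)
        resources[e.user].add(e.resource)

    profiles = {}
    for user, counts in hour_counts.items():
        total = sum(counts.values())
        # An hour is "usual" if it holds at least min_hour_share of the user's activity.
        usual = {h for h, c in counts.items() if c / total >= min_hour_share}
        # Peers: other users whose accessed resources overlap strongly with this user's.
        peers = {
            other for other in resources
            if other != user and jaccard(resources[user], resources[other]) >= peer_threshold
        }
        profiles[user] = Profile(usual, devices[user], peers)
    return profiles
```

With two users who read the same finance files and a third who only touches HR data, the first two become each other's peers and the third has none.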

Application of learned properties for security alert generation

The learned properties mentioned above serve as the foundation for generating security alerts. Varonis' machine learning models continuously analyze user behavior, device usage, and peer associations. When any anomaly or suspicious activity is detected, the system triggers security alerts, which can prompt further investigation and response from cybersecurity professionals.
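
As a rough sketch of how learned properties might be checked against new activity, the snippet below compares one activity with a user's profile and collects the reasons it deviates. The `Profile`, `Activity`, `check_activity`, and `maybe_alert` names are hypothetical, and the hard yes/no rules are a simplification: a production system would more likely score and correlate several signals before raising an alert.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Profile:
    """Learned baseline for one user (hypothetical structure)."""
    usual_hours: set
    known_devices: set
    peers: set


@dataclass
class Activity:
    user: str
    hour: int
    device: str
    shared_with: Optional[str] = None  # recipient, if the activity shares data


def check_activity(activity, profiles):
    """Return the reasons an activity deviates from its user's profile (empty = normal)."""
    profile = profiles.get(activity.user)
    if profile is None:
        return ["no baseline learned for user"]
    reasons = []
    if activity.hour not in profile.usual_hours:
        reasons.append(f"activity at unusual hour {activity.hour}")
    if activity.device not in profile.known_devices:
        reasons.append(f"unfamiliar device {activity.device}")
    if activity.shared_with is not None and activity.shared_with not in profile.peers:
        reasons.append(f"data shared with non-peer {activity.shared_with}")
    return reasons


def maybe_alert(activity, profiles):
    """Return an alert record when at least one deviation is found, else None."""
    reasons = check_activity(activity, profiles)
    return {"user": activity.user, "reasons": reasons} if reasons else None
```

Returning the list of reasons, rather than a bare flag, gives the investigating analyst the context for why the alert fired.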

In summary, Varonis uses machine learning to bolster its cybersecurity efforts by continuously monitoring and analyzing various aspects of user behavior and system activity. By identifying anomalies and deviations from established norms, Varonis helps organizations proactively detect and respond to potential security threats, ultimately enhancing their overall cybersecurity posture.

Using unsupervised machine learning for threat detection

We use unsupervised machine learning, which works with unlabeled data. The machine learning (ML) algorithms are not given explicit output labels; instead, they discover patterns, structures, or relationships within the data.
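
To show what discovering structure without labels looks like, the sketch below uses k-means clustering, a standard unsupervised algorithm chosen here purely as an example; this page does not say which algorithms Varonis uses. The two features per point (for instance, files touched per hour and megabytes transferred) are hypothetical.

```python
import math


def kmeans(points, k, iterations=20):
    """Group unlabeled points into k clusters; returns (centroids, labels)."""
    centroids = [tuple(p) for p in points[:k]]  # naive deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its member points.
        new_centroids = []
        for c in range(k):
            members = [p for p, lbl in zip(points, labels) if lbl == c]
            if members:
                new_centroids.append(tuple(sum(dim) / len(members) for dim in zip(*members)))
            else:
                new_centroids.append(centroids[c])  # leave an empty cluster's centroid in place
        if new_centroids == centroids:
            break  # converged: assignments no longer change the centroids
        centroids = new_centroids
    return centroids, labels
```

No point is ever labeled "normal" or "malicious"; the algorithm only groups similar activity, and small or distant clusters become candidates for closer review.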

Unsupervised machine learning plays a crucial role in cybersecurity threat detection, and it is the preferred option due to its ability to address certain challenges unique to the cybersecurity domain.

Some examples are:

  1. Anomaly Detection: Unsupervised machine learning is particularly well-suited for anomaly detection in cybersecurity. Anomalies in network traffic, system behavior, or user activities can indicate security threats such as intrusions or breaches. Since these anomalies often represent previously unknown attack patterns, using supervised learning with pre-labeled data is impractical. Unsupervised learning algorithms can identify deviations from established patterns without knowing what constitutes a threat.
  2. Data Exploration and Discovery: Cybersecurity datasets are vast and diverse, containing a wide range of data types, including logs, network traffic, system configurations, and user behavior. Unsupervised learning techniques, such as clustering and dimensionality reduction, help make sense of these complex datasets. They can reveal patterns, group similar events together, and reduce the data's dimensionality so that analysis can focus on the most relevant features.
  3. Zero-Day Attacks: Zero-day attacks exploit previously unknown vulnerabilities. Since there are no predefined attack signatures or labels for these threats, unsupervised learning is crucial. Anomaly detection algorithms can identify unexpected and potentially malicious activities, providing an early warning against emerging threats.
  4. Insider Threat Detection: Detecting insider threats, where legitimate users abuse their access, often requires unsupervised learning. These threats are challenging to identify using supervised learning because malicious insiders can act within the bounds of their legitimate permissions. Unsupervised techniques can help by modeling normal user behavior and flagging deviations that might indicate insider threats.
  5. Continuous Learning: The threat landscape is constantly evolving, and new attack techniques emerge regularly. Unsupervised learning models can adapt to changing conditions and identify novel threats as they arise, making them suitable for continuous monitoring and detection.
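
Anomaly detection and continuous learning can be combined in a small streaming baseline. The sketch below keeps a running mean and variance with Welford's online algorithm and flags values more than a chosen number of standard deviations from the mean (a z-score test). The class name, threshold, and minimum sample count are illustrative assumptions, and a z-score test is only one simple choice of anomaly detector.

```python
import math


class RunningBaseline:
    """Online mean/variance (Welford's algorithm) with a z-score anomaly test."""

    def __init__(self, threshold=3.0, min_samples=5):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # running sum of squared deviations from the mean
        self.threshold = threshold
        self.min_samples = min_samples

    def update(self, x):
        """Fold one observation into the baseline (Welford update)."""
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def stdev(self):
        """Sample standard deviation of everything learned so far."""
        return math.sqrt(self.m2 / (self.n - 1)) if self.n > 1 else 0.0

    def is_anomalous(self, x):
        if self.n < self.min_samples:
            return False  # not enough history to judge yet
        sd = self.stdev()
        if sd == 0:
            return x != self.mean
        return abs(x - self.mean) / sd > self.threshold

    def observe(self, x):
        """Score x against the current baseline, then learn from it only if it looks normal."""
        flagged = self.is_anomalous(x)
        if not flagged:
            self.update(x)
        return flagged
```

For example, a user who reads roughly 35 to 42 files a day would be flagged on a day with 900 reads. Skipping the update for flagged values keeps an attacker's activity from being absorbed into the baseline; the trade-off is that genuine, lasting changes in behavior need some other path (such as periodic retraining) to become the new normal.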

Inputs to the models

The models draw on three main categories of input:

1. Monitored Events:

  • File Servers: Events raised by file servers provide insights into file access, modification, and sharing activities. This data is vital for detecting suspicious or unauthorized file access, insider threats, and data exfiltration attempts.
  • Network Traffic: Events from network devices such as DNS, firewalls, proxies, and VPNs offer information about network communication patterns. Analyzing these events can reveal potential anomalies, intrusions, or malicious traffic attempting to breach the network.
  • SaaS Services: Events from SaaS services like Microsoft 365, Zoom, Okta, and others are essential for monitoring user activities within cloud-based applications. Detecting unusual activities, login attempts, or data access patterns in these services helps identify potential cyber threats in the cloud environment.
  • Mail Servers: Events from mail servers like Exchange and Exchange Online provide insights into email communications. Analyzing these events can help identify email-based threats such as phishing, malware attachments, or suspicious email forwarding.

2. File Content:

  • Content Analysis: Analyzing the content of files is crucial for assessing their risk and sensitivity. Machine learning models can classify files based on content, identifying documents that contain sensitive information, personally identifiable information (PII), or that violate data policies like GDPR, HIPAA, PCI DSS, or CCPA. This classification enables proactive data protection and compliance enforcement.
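
Content classification can be approximated with simple pattern rules. The sketch below is rule-based rather than machine-learned: it looks for payment card numbers (validated with the Luhn checksum to cut false positives) and email addresses. The label names such as `PCI:card_number` and the regular expressions are assumptions for illustration, not Varonis' classification rules.

```python
import re

# Candidate card numbers: 13-19 digits, optionally separated by spaces or hyphens.
CARD_CANDIDATE = re.compile(r"\b(?:\d[ -]?){13,19}\b")
EMAIL = re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b")


def luhn_valid(digits: str) -> bool:
    """Luhn checksum: double every second digit from the right; the sum must end in 0."""
    total = 0
    for i, ch in enumerate(reversed(digits)):
        d = int(ch)
        if i % 2 == 1:
            d *= 2
            if d > 9:
                d -= 9
        total += d
    return total % 10 == 0


def classify(text: str) -> set:
    """Return the sensitivity labels detected in a piece of file content."""
    labels = set()
    for match in CARD_CANDIDATE.finditer(text):
        digits = re.sub(r"\D", "", match.group())
        if luhn_valid(digits):
            labels.add("PCI:card_number")
    if EMAIL.search(text):
        labels.add("PII:email")
    return labels
```

The Luhn check matters because many long digit strings (order numbers, phone numbers) are not card numbers; requiring a valid checksum filters out most of them.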

3. Entity Data:

  • User Information: User data, such as login activity, access privileges, and historical behavior, is used to create user profiles. Machine learning models can detect anomalies in user behavior, identifying potentially compromised accounts or insider threats.
  • Device Information: Information about servers and endpoints, including their configurations and patch levels, helps assess vulnerabilities and potential attack surfaces. Anomalies in device behavior or configurations can indicate security risks.
  • IP/Domain/URL Information: Monitoring IP addresses, domains, and URLs is essential for detecting malicious network activity, including connections to known malicious hosts, suspicious domain registrations, or attempts to access malicious websites.
  • Threat Intelligence: Incorporating threat intelligence feeds, which provide real-time information on known threats and vulnerabilities, helps cybersecurity systems stay updated and respond promptly to emerging threats. Machine learning models can leverage this intelligence to identify and mitigate threats based on known attack patterns. 
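
Matching observed IP addresses and domains against a threat intelligence feed can be sketched with Python's standard `ipaddress` module. The `ThreatIntel` class and the feed format (a list of CIDR ranges plus a list of domains) are assumptions; real feeds carry much richer data such as confidence scores and expiry times.

```python
import ipaddress
from dataclasses import dataclass, field


@dataclass
class ThreatIntel:
    networks: list = field(default_factory=list)  # ipaddress network objects
    domains: set = field(default_factory=set)     # lowercase domains, no trailing dot

    @classmethod
    def from_feed(cls, cidrs, domains):
        """Build lookups from a hypothetical feed of CIDR ranges and domain names."""
        return cls(
            [ipaddress.ip_network(c, strict=False) for c in cidrs],
            {d.lower().rstrip(".") for d in domains},
        )

    def ip_is_malicious(self, ip: str) -> bool:
        addr = ipaddress.ip_address(ip)
        # An address never matches a network of the other IP version.
        return any(addr in net for net in self.networks)

    def domain_is_malicious(self, domain: str) -> bool:
        # Match the listed domain itself and any of its subdomains.
        labels = domain.lower().rstrip(".").split(".")
        return any(".".join(labels[i:]) in self.domains for i in range(len(labels)))
```

Matching on whole domain labels, rather than on a plain string suffix, keeps `notevil.example` from matching a listed `evil.example`.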

Diagram of Varonis' gen AI capabilities


To improve AI accuracy, when generative AI is asked a question, Varonis determines the organizational context by retrieving the customer metadata relevant to the query: information about accounts, roles, resources, permissions, classification, and known alerts. The question sent to the AI is then augmented with this retrieved “organizational context,” making the AI aware of the specific customer environment so that it can provide relevant, precise, and tailored answers.
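
The retrieve-then-augment flow described above can be sketched in a few lines. For simplicity, the sketch ranks context records by word overlap with the question; a real retrieval system would more likely use embeddings or structured queries over the metadata. The `ContextRecord` type, the record kinds, and the prompt layout are all hypothetical.

```python
import re
from dataclasses import dataclass


@dataclass
class ContextRecord:
    kind: str  # e.g. "account", "permission", "classification", "alert"
    text: str


def tokenize(s: str) -> set:
    return set(re.findall(r"[a-z0-9]+", s.lower()))


def retrieve(query: str, records, top_k=3):
    """Rank records by word overlap with the query; ties keep their original order."""
    q = tokenize(query)
    scored = [(len(q & tokenize(r.text)), i, r) for i, r in enumerate(records)]
    scored = [s for s in scored if s[0] > 0]  # drop records with no overlap at all
    scored.sort(key=lambda s: (-s[0], s[1]))
    return [r for _, _, r in scored[:top_k]]


def build_prompt(query: str, records, top_k=3) -> str:
    """Augment the user's question with the retrieved organizational context."""
    context = retrieve(query, records, top_k)
    lines = [f"- [{r.kind}] {r.text}" for r in context]
    return "Organizational context:\n" + "\n".join(lines) + f"\n\nQuestion: {query}"
```

For a question such as "Who can access the Finance share?", the permission and alert records that mention the Finance share rank first, so the model sees them alongside the question instead of answering from general knowledge alone.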



Have questions? Contact us.



