Lessons from the Twitch Data Leak

What happened?

The recent Twitch leak drew growing mainstream press coverage throughout Wednesday, October 6, 2021, and its impact will likely grow as bad actors take advantage of the data exposed from the Amazon-owned live video streaming site favored by gamers.

In a blog post shared late on Wednesday evening, Twitch confirmed the incident and said it believes that a server configuration change may have exposed data to the internet, where it was subsequently accessed by a “malicious third party”.

The threat actor responsible seemingly took advantage of this misconfiguration to gain unauthorized access to Twitch data, which was then published anonymously as ‘twitch leaks part one’ on the often controversial imageboard ‘4chan’ (Figure 1).

Figure 1. ‘twitch leaks part one’ – Shared via 4chan

It is not yet clear if the misconfiguration allowed unauthorized access to the repositories themselves or if some backup of the leaked data was exposed.

Why did it happen?

The leak was posted on Wednesday, October 6, 2021, at 03:34:18hrs (UTC). The threat actor’s stated motivation references the hashtag ‘#DoBetterTwitch’ which, alongside ‘#TwitchDoBetter’, has been used in social media campaigns that challenge Twitch to better protect its marginalized and harassed creators.

Seemingly part one of a larger haul of stolen data, the post includes a link to some 125GB of compressed data, reportedly including 6,000 internal Git repositories and revenue payout reports for Twitch’s top streamers (Figure 2), accessible via the BitTorrent peer-to-peer (P2P) file distribution system.

Figure 2. Twitch streamer revenue directory names (as seen within the Torrent)

When did the breach occur?

Analysis of a sample set of this leaked data suggests that Twitch’s internal code repositories were accessed and Zip-compressed archives created between Friday, October 1 and Monday, October 4, 2021, although the file modification times across this period provide no obvious indication of the threat actor’s timezone.
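As a rough illustration of this kind of timeline analysis, the sketch below reads the modification times stored for each member of a Zip archive and reports the earliest and latest. It is a minimal example, not the method used in our analysis: the function names, file names and dates are made up for the demonstration. It also shows why timezone is hard to infer, since the classic Zip timestamp field records wall-clock time with no timezone attached.

```python
import io
import zipfile
from datetime import datetime


def archive_mtime_range(zip_source):
    """Return (earliest, latest) member modification times in a Zip archive.

    The classic Zip timestamp field stores wall-clock time with no
    timezone, so the datetimes returned here are naive.
    """
    with zipfile.ZipFile(zip_source) as archive:
        stamps = [
            datetime(*info.date_time)
            for info in archive.infolist()
            if not info.is_dir()
        ]
    if not stamps:
        raise ValueError("archive contains no files")
    return min(stamps), max(stamps)


def build_sample_archive():
    """Build a small in-memory archive with hypothetical names and dates."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr(
            zipfile.ZipInfo("repo-a/main.go", (2021, 10, 1, 22, 15, 4)), "a"
        )
        archive.writestr(
            zipfile.ZipInfo("repo-b/app.py", (2021, 10, 4, 3, 40, 10)), "b"
        )
    buffer.seek(0)
    return buffer


earliest, latest = archive_mtime_range(build_sample_archive())
print(earliest.isoformat(), "to", latest.isoformat())
```

Running the same function over each archive in a sample set gives a per-archive window that can then be compared across the leak.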

Furthermore, the directory structure relating to Twitch payouts implies that access could have been maintained until at least October 5, 2021, given that the most recent GZipped CSV file available carries that date. This would be somewhat consistent with the leak being prepared for release on October 6, 2021.

Lessons

Whilst Twitch continues to investigate the incident and understand the wider impact, this, like so many attacks, provides an opportunity for others to learn from their misfortune and better prepare themselves to avoid falling victim to a similar situation.

Misconfiguration

‘To err is human’, and you don’t have to look too far to see the widespread impact a misconfiguration can have on the availability, or in this case security, of an organization.

Change management processes should help to prevent these negative situations from arising, but what happens when a configuration change ‘appears’ to be successful yet inadvertently introduces new and unseen problems?

Of course, many configuration changes are themselves the result of a cybersecurity need, such as applying updates and patches. Even so, it is important that security is considered for every change so that its potential outcome and impact are understood.

Major outages are easy to identify, especially when your online presence goes dark and your customers start shouting. Conversely, accidentally exposing your data to the wider internet is unlikely to be so obvious, and those that stumble across it may be less inclined to share their discovery.

In the latter case, having visibility of your data assets, be they hosted internally or on some cloud service, can provide an early warning of an undesirable situation such as identifying anomalous access behaviors or data movements that are not consistent with your organization’s day-to-day activities.

Having gained unauthorized access to data, a threat actor’s behaviors will often betray them:

Rather than accessing specific data in the course of their job, a normal user behavior, a threat actor will seek to explore vast quantities of data to locate your most valuable assets, be that financial records, personally identifiable information (PII), or intellectual property (IP).
Once located, a threat actor will attempt to exfiltrate vast quantities of data, often utilizing file compression, to some third-party location or service.
To avoid detection, access attempts may occur out-of-hours and/or originate from geolocations that are inconsistent with your organization’s.

This is the idea behind user behavior analytics: use machine learning to build peace-time profiles for each user and device so we can alert when behavior meaningfully deviates from the norm.

Threat models that detect subtle increases in access to sensitive and idle data often tip our customers off to malicious insiders and stealthy threat actors before they’re able to exfiltrate data or move laterally within the network.
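The baseline idea described above can be sketched with a deliberately simple statistical check. This is an illustration only and not how any particular product works: it swaps machine learning for a plain z-score, and the user names, daily file-access counts and threshold of 3.0 are all hypothetical.

```python
from statistics import mean, stdev


def is_anomalous(history, today, threshold=3.0):
    """Flag today's count if it sits more than `threshold` standard
    deviations above the user's historical mean."""
    if len(history) < 2:
        return False  # not enough data to form a baseline
    baseline = mean(history)
    spread = stdev(history)
    if spread == 0:
        # A perfectly flat history: any increase is a deviation.
        return today > baseline
    return (today - baseline) / spread > threshold


# Hypothetical daily counts of files accessed by each user.
history_by_user = {
    "alice": [40, 52, 47, 45, 50, 43, 48],
    "bob": [12, 9, 15, 11, 10, 13, 14],
}
today_by_user = {"alice": 51, "bob": 4200}

flagged = [
    user
    for user, count in today_by_user.items()
    if is_anomalous(history_by_user[user], count)
]
print(flagged)
```

A real deployment would profile many more signals, such as time of day, geolocation and data sensitivity, but the principle of comparing current behavior against a per-user baseline is the same.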

Data Exposure

Data leaks originating from code repositories can be a rich source of sensitive information that can be easily abused when in the wrong hands.

Source Code

The most obvious concern is that of the exposed source code itself that could, in the hands of a suitably skilled threat actor, be scoured to identify vulnerabilities that can be later exploited in an attack against an organization’s own infrastructure or, in the case of end-user applications, against their customers.

Credentials & Secrets

A common occurrence across code repositories, both private and public, is the exposure of sensitive credentials and secrets used within applications to authenticate to various services.

Threat actors are adept at searching for these strings, which is often as simple as locating a value assigned to ‘AWS_SECRET_ACCESS_KEY’ or ‘api_key’, and can readily use them to gain access to additional cloud services and/or hosts.
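To show how little effort such a search takes, and how defenders can run the same search first, here is a minimal sketch of a pattern-based secret scan. The two patterns are illustrative assumptions, not a complete rule set, and the sample values are the placeholder credentials used in AWS documentation rather than real keys.

```python
import re

# Illustrative patterns only; real scanners ship far larger rule sets.
SECRET_PATTERNS = {
    # AWS access key IDs commonly start with "AKIA" followed by 16 characters.
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    # A quoted literal assigned to a suspicious variable name.
    "assigned_secret": re.compile(
        r"(?i)(aws_secret_access_key|api_key|secret_key)"
        r"\s*[:=]\s*['\"][^'\"\s]{8,}['\"]"
    ),
}


def find_secrets(text):
    """Return a list of (line_number, rule_name) for lines matching a pattern."""
    findings = []
    for number, line in enumerate(text.splitlines(), start=1):
        for rule, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((number, rule))
    return findings


sample = '''\
import os
AWS_ACCESS_KEY_ID = "AKIAIOSFODNN7EXAMPLE"
AWS_SECRET_ACCESS_KEY = "wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY"
api_key = os.environ["API_KEY"]
'''
print(find_secrets(sample))
```

Note that the last line of the sample is not flagged: it reads the key from the environment rather than embedding a literal value.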

We demonstrate this technique in our attack lab, Cross-Cloud Hacking: Stealing Salesforce Data via GitHub & Slack.

Furthermore, in addition to direct exposure within a repository’s source code, the commit history can potentially be leveraged to gain access to sensitive information. For example, a developer may modify a file to remove some inadvertently exposed sensitive information yet it may remain accessible by viewing the differences between commits.
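The sketch below illustrates the commit-history problem under simple assumptions: it takes patch text in the format produced by `git log -p` and reports removed lines that still look like hard-coded secrets. The pattern, commit hash and key value are hypothetical, and the parsing is intentionally naive.

```python
import re

# Hypothetical pattern: a quoted literal assigned to a suspicious name.
SECRET_ASSIGNMENT = re.compile(
    r"(?i)(api_key|secret_key|password)\s*[:=]\s*['\"][^'\"\s]{8,}['\"]"
)


def removed_secrets(patch_text):
    """Return (commit, line) pairs for removed lines that look like secrets.

    Expects text in the format printed by `git log -p`. Lines starting
    with "---" are treated as file headers, which is a simplification.
    """
    hits = []
    commit = None
    for line in patch_text.splitlines():
        if line.startswith("commit "):
            commit = line.split()[1]
        elif line.startswith("-") and not line.startswith("---"):
            if SECRET_ASSIGNMENT.search(line):
                hits.append((commit, line[1:].strip()))
    return hits


sample_patch = '''\
commit 1111111111111111111111111111111111111111
Author: Dev <dev@example.com>
Date:   Mon Oct 4 10:00:00 2021 +0000

    Remove hard-coded key

diff --git a/config.py b/config.py
--- a/config.py
+++ b/config.py
@@ -1,1 +1,2 @@
+import os
-api_key = "hypothetical-key-123456"
+api_key = os.environ["API_KEY"]
'''
print(removed_secrets(sample_patch))
```

In practice the input could come from running `git log -p --all` in a repository. Any secret found this way should be treated as compromised and replaced, since deleting it in a later commit does not remove it from history.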

Internal Knowledge

Whilst it is important that organizations maintain detailed documentation on their infrastructure and security models, the exposure of this incredibly sensitive internal knowledge is highly valuable to threat actors.

Almost all targeted attacks commence with a reconnaissance phase in which the threat actor attempts to gather as much intelligence on their target as possible. On occasion, this is an iterative process in which the threat actor, having gained limited access, gathers further intelligence in preparation for the next phase of the attack.

Not only can the intelligence gathered be used to identify weaknesses or gaps in an organization’s defenses, but it also allows the threat actor to select high-value targets, be that a system or user, as well as enabling the creation of effective exploits or convincing social engineering lures.

Gleaning knowledge on how an organization will react to an attack can prove invaluable to the threat actor, a tactic successfully adopted by big-game hunter ransomware groups to thwart recovery efforts and, in some cases, even allow ransom demands to be made that match the coverage level of an organization’s cyber risk insurance.

Although organizations should not rely on security through obscurity, it is beneficial to protect the details of any deployed countermeasure to prevent them from being bypassed or even used against the very things they are set to protect.

In this incident, not only are multiple references to a third-party authentication solution present and easily identifiable, but application credentials, including an application key and secret key, also appear to have been exposed in a configuration file.

Not only does this practice go against the official advice of this specific vendor, but threat actors obtaining these secrets could also gain the ability to bypass multi-factor authentication (MFA), perform privileged operations and/or cause legitimate accounts to be locked out.

Whilst the real-world impact of this particular exposure is dependent on Twitch’s implementation of the vendor’s API or SDK, it again demonstrates the need to ensure that credentials and secrets are securely stored and handled to prevent misuse.

Recommendations

Although measures to isolate or encrypt code repositories may prove impractical, especially given the need for developers to collaborate on various projects, there are a number of measures that can be employed to minimize risk:

Ensure that your access controls are not overly permissive; enforcing the principle of least privilege can both limit insider abuse and lessen the impact of a compromised account.
Both user credentials and access keys should be protected, utilizing multi-factor authentication (MFA) wherever possible, password protection, and secure storage.
Credentials should be stored securely and separately from source code, and repositories should be audited regularly to identify, remove and refresh any that are inadvertently exposed.
The ability to audit, alert and act on unauthorized or excessive access attempts can provide both an early warning of nefarious activity and a means of limiting damage.
It is important to understand and classify data assets, allowing determination of risk as well as the identification and remediation of over-exposure.
Sensitive data and knowledge should be well guarded and only accessible by those with a need to know.
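As one small example of keeping credentials separate from source code, the sketch below reads a secret from the process environment and fails loudly when it is missing. The variable name `EXAMPLE_SERVICE_API_KEY` and the helper `load_secret` are hypothetical; a secrets manager or deployment pipeline would normally supply the value.

```python
import os


class MissingSecretError(RuntimeError):
    """Raised when a required secret is absent from the environment."""


def load_secret(name):
    """Read a secret from the process environment instead of source code."""
    value = os.environ.get(name)
    if not value:
        raise MissingSecretError(f"environment variable {name} is not set")
    return value


# Demo only: a deployment system or secrets manager would normally set this.
os.environ["EXAMPLE_SERVICE_API_KEY"] = "value-injected-at-deploy-time"
api_key = load_secret("EXAMPLE_SERVICE_API_KEY")
print("secret loaded:", bool(api_key))
```

Because the repository then contains only the variable name, a leak of the code does not by itself expose the credential.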


Dvir Sason
Dvir leads the research team at Varonis. He has ~10 years of experience in offensive and defensive security, with a focus on red teaming, IR, SecOps, governance, security research, threat intelligence, and cloud security. Dvir is CISSP and OSCP certified and loves solving problems, coding automations (PowerShell ❤, Python), and changing things.