new secrets detected
in public GitHub commits in 2022
We have never detected as many secrets, and secrets sprawl has been accelerating yearly since 2020.
Hard-coded secrets increased by 67% compared to 2021, whereas the volume of scanned commits rose by 20% (from 860M to 1.027B commits between 2021 and 2022).
Hard-coded secrets have never been a more significant threat to the security of people, enterprises, and even countries worldwide.
IT systems, open-source projects, and entire software supply chains are vulnerable to the exploitation of keys left by mistake in source code.
As the world's digital footprint grows, millions of such keys accumulate every year, not only in public spaces such as code-sharing platforms but also, and especially, in closed spaces such as private repositories or corporate IT assets.
In other words, secrets sprawl on GitHub is only the tip of the iceberg.
This wouldn’t be so concerning if credential theft weren’t the most common cause of data breaches. The 2022 editions of Verizon’s DBIR and IBM’s Cost of a Data Breach report highlighted that this attack vector has remained the top concern since 2021:
“Use of stolen or compromised credentials remains the most common cause of a data breach. Stolen or compromised credentials were the primary attack vector in 19% of breaches in the 2022 study and also the top attack vector in the 2021 study, having caused 20% of breaches. Breaches caused by stolen or compromised credentials had an average cost of USD 4.50 million.”*
*From the IBM Cost of a Data Breach Report 2022
A look back at 2022 major incidents
Secrets played a part, in one way or another, in most of the security incidents that happened in 2022. We can classify these incidents into three categories:
Secrets exploited in an attack
An attacker breached Uber and used hard-coded admin credentials to log into Thycotic, the firm’s Privileged Access Management platform. They then carried out a full account takeover on several internal tools and productivity applications.
An attacker leveraged malware deployed to a CircleCI engineer’s laptop to steal a valid, 2FA-backed SSO session. They could then exfiltrate customer data, including customer environment variables, tokens, and keys.
Stolen source code repositories
NVIDIA source code is leaked by the Lapsus$ group.
200GB of Samsung source code is leaked, revealing 6,695 hard-coded secrets.
250 Microsoft projects are leaked, revealing 376 hard-coded secrets.
LastPass source code is stolen, leaking credentials and keys used months later to access and decrypt storage volumes.
Dropbox disclosed that 130 stolen code repositories contained API keys.
Okta admitted a breach of its GitHub repositories resulting in source code theft.
Slack employee tokens are stolen and misused to download private code.
Secrets exposed publicly
Research reveals 18,000+ Android apps leak hard-coded secrets.
Toyota disclosed that a contractor had left a credential giving access to user data exposed on GitHub for five years.
Tom Forbes disclosed that Infosys had leaked FullAdminAccess AWS keys on PyPI for over a year (and then 57 other AWS keys on PyPI).
About GitHub in 2022
HCL (HashiCorp Configuration Language) is the fastest-growing language on GitHub.
From the Octoverse 2022
PUBLIC MONITORING
How leaky was 2022?
secrets occurrences detected in 2022 (3M unique secrets)
authors exposed a secret in 2022
commits out of 1,000 exposed at least one secret (+50%)
GitHub's organic growth and the improvements to our detection engine (including 35 new detectors in 2022) partly explain the growth in the number of detected secrets. But all things being equal, there is no doubt:
Secrets sprawl continues to expand worldwide.
Map of secrets leaks
For GitHub profiles mentioning location.
GitGuardian uses two classes of detectors: specific and generic.
Specific detectors match recognizable secrets, like an AWS access key or MongoDB database credentials.
In 2022, our specific detectors accounted for 33% of the secrets detected. Here are some of the top specific secrets caught in 2022.
On the other hand, generic detectors match a broad range of secrets, for example, a company email and a password that would end up hard-coded in a file.
In a detection strategy, generic detectors are essential to ensure that no valid secrets fall through the cracks of specific detectors. To maximize precision and avoid false positives, each generic detector uses a carefully crafted set of conditions (regarding the filename, the file path, the surrounding context, etc.).
In 2022, they accounted for 67% of the secrets detected, which shows their importance.
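To make the distinction between the two classes concrete, here is a minimal Python sketch. It is an illustration only, not GitGuardian's detection engine: the function `scan_line`, the keyword list, the ignored file suffixes, and the entropy threshold are all assumptions chosen for the example. The specific detector relies on the well-known shape of an AWS access key ID, while the generic detector looks for a quoted value assigned to a secret-looking name and then applies extra conditions (file type, randomness of the value) to limit false positives.

```python
import math
import re
from collections import Counter

# Specific detector: AWS access key IDs have a recognizable shape
# (a prefix such as "AKIA" followed by 16 uppercase letters or digits).
AWS_ACCESS_KEY_ID = re.compile(r"\b(?:AKIA|ASIA)[0-9A-Z]{16}\b")

# Generic detector: a quoted value assigned to a name that suggests a secret.
# The keyword list is an assumption made for this sketch.
GENERIC_ASSIGNMENT = re.compile(
    r"""
    \b(?P<name>[a-z0-9_]*(?:password|passwd|secret|token|api_?key)[a-z0-9_]*)
    \s*[:=]\s*
    ["'](?P<value>[^"']{8,})["']
    """,
    re.IGNORECASE | re.VERBOSE,
)

# Hypothetical context conditions used to reduce false positives.
IGNORED_SUFFIXES = (".md", ".lock")
MIN_ENTROPY = 3.0  # bits per character, an arbitrary threshold for the example


def shannon_entropy(value: str) -> float:
    """Return the Shannon entropy of a string, in bits per character."""
    counts = Counter(value)
    total = len(value)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def scan_line(path: str, line: str) -> list[tuple[str, str]]:
    """Return (detector name, matched secret) pairs found in one line."""
    findings = []
    # The specific detector runs everywhere: its pattern is precise enough.
    for match in AWS_ACCESS_KEY_ID.finditer(line):
        findings.append(("aws_access_key_id", match.group(0)))
    # The generic detector only runs where its context conditions hold.
    if not path.endswith(IGNORED_SUFFIXES):
        for match in GENERIC_ASSIGNMENT.finditer(line):
            value = match.group("value")
            if shannon_entropy(value) >= MIN_ENTROPY:
                findings.append(("generic_high_entropy", value))
    return findings
```

With these assumptions, a line containing AWS's documented example key `AKIAIOSFODNN7EXAMPLE` is caught by the specific detector, a random-looking quoted password in `settings.py` is caught by the generic one, and a low-entropy placeholder such as `"changeme"` is ignored. A real engine would use many more conditions than this sketch does.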
How does secrets sprawl threaten software supply chain security?
When weighing the risk posed by secrets sprawl, it’s essential to consider the ensemble of hard-coded plaintext secrets rather than individual secrets taken separately: the more secrets there are, the more potential attack vectors there are for a malicious actor.
The keychain [...] symbolizes the collection of one or more scattered secrets the attacker finds throughout the target environment. Although both components are individually unhygienic, they form a fatal compound when combined.
Hell’s Keychain illustrates how scattered plaintext credentials across your environment can impose a huge risk on your organization by impairing its integrity and tenant isolation. Moreover, the vulnerability emphasizes the need for strict network controls and demonstrates how pod access to the Kubernetes API is a common misconfiguration that can result in unrestricted container registry exposure and scraping.
From code to cloud: Infrastructure as code
A single misconfiguration in an IaC manifest can break a security policy and make the deployed infrastructure vulnerable to attacks.
Infrastructure as code is an entirely new attack surface to protect.
We estimate that 21.52% of all Terraform repositories have one or more security policy vulnerabilities.
The most common IaC vulnerabilities are:
- Networking misconfigurations: unrestricted egress or ingress traffic can expose assets to attacks such as remote code execution. The use of HTTP instead of HTTPS is also frequent.
- Data exposure misconfigurations: S3 buckets without encryption can lead to data leakage.
- Secrets: exposing a sensitive environment variable in the configuration can lead to a plaintext credential leak.
- Permission misconfigurations: using the default service account on a compute instance allows an attacker to spread through the network.
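The four categories above can be sketched as simple policy checks. The Python example below is a hedged illustration, not a real IaC scanner: it assumes resources have already been parsed into plain dictionaries, and the resource types and field names (`firewall_rule`, `ingress_cidrs`, `encrypted`, `service_account`, `env`, and so on) are hypothetical, chosen only for this example rather than taken from Terraform or any provider schema.

```python
from typing import Any

# Substrings that suggest an environment variable holds a credential
# (an assumption made for this sketch).
SECRET_NAME_HINTS = ("password", "secret", "token", "api_key")
ANYWHERE = "0.0.0.0/0"


def check_resource(resource: dict[str, Any]) -> list[str]:
    """Return the policy issues found in one parsed (hypothetical) resource."""
    issues = []
    kind = resource.get("type")

    if kind == "firewall_rule":
        # Networking: traffic allowed from or to any address.
        if ANYWHERE in resource.get("ingress_cidrs", []):
            issues.append("networking: unrestricted ingress")
        if ANYWHERE in resource.get("egress_cidrs", []):
            issues.append("networking: unrestricted egress")

    elif kind == "listener":
        # Networking: plain HTTP used instead of HTTPS.
        if str(resource.get("protocol", "")).upper() == "HTTP":
            issues.append("networking: HTTP used instead of HTTPS")

    elif kind == "object_bucket":
        # Data exposure: storage bucket without encryption at rest.
        if not resource.get("encrypted", False):
            issues.append("data exposure: bucket without encryption")

    elif kind == "compute_instance":
        # Secrets: literal credential values in environment variables.
        for name, value in resource.get("env", {}).items():
            if value and any(hint in name.lower() for hint in SECRET_NAME_HINTS):
                issues.append(
                    f"secrets: plaintext value in environment variable {name}"
                )
        # Permissions: instance running with the default service account.
        if resource.get("service_account", "default") == "default":
            issues.append("permissions: default service account in use")

    return issues
```

For example, a `compute_instance` dictionary with `"env": {"DB_PASSWORD": "hunter2"}` and no dedicated service account would be flagged twice, once for the plaintext secret and once for the default service account. Production scanners work on the actual manifest syntax and cover far more policies than this sketch does.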