Exposing hard-coded credentials and sensitive secrets through public code repositories has been a major security risk for organizations for years, with over 10 million new instances of credential leaks detected on GitHub alone in 2022. A new free service called HasMySecretLeaked now allows organizations to securely and privately check if any of their secrets are in a database of 20 million exposed records collected by security firm GitGuardian since 2020.
GitHub already has its own free service that notifies repository owners if secrets are detected in their public repositories, but the types of secrets that are monitored are typically cloud API access keys or other access token formats provided by partners. GitGuardian’s HasMySecretLeaked covers many more types of hard-coded secrets, both service-specific and generic ones, including database passwords, encryption keys, username and password combinations, messaging tokens, SSH credentials, and email passwords.
The company has been scanning every public code commit on GitHub for hard-coded secrets for the past several years, refining its detection algorithms, expanding the list of supported credential formats, and lowering false-positive rates. In 2020 it uncovered 3 million exposed secrets on GitHub, in 2021 it found 6 million, and in 2022 over 10 million.
GitGuardian used its research to release an annual report called The State of Secrets Sprawl as well as to build and enhance its own code security platform that prevents developers and engineers from accidentally leaking secrets in their code, build scripts, Docker images, configuration files and so on.
Search your own repositories vs. searching all
Secret-detection services have generally been built with the goal of serving repository owners. GitHub will notify the repository owner if a secret is detected in a repository they own and will also notify a partner service like AWS if the secret is an AWS key so that Amazon can make the decision to revoke it before it’s abused. GitGuardian’s own security platform will notify the organization if a secret is found anywhere in their software development pipeline: code, Docker images, DevOps environment, etc.
However, HasMySecretLeaked was built with another goal: to let organizations check if any of their known secrets were leaked anywhere on GitHub, including repositories owned by other parties. External leaks are not unusual. For example, one of the company’s developers might decide to publish a piece of code in a personal public repository and accidentally forget to scrub one of the organization’s tokens. Or a company’s developers are allowed to contribute to a community project but forget to remove a private database URL that includes credentials.
In fact, HasMySecretLeaked is similar in approach to, and takes a lot of inspiration from, HaveIBeenPwned, a service by security researcher Troy Hunt that allows users to check if their emails and passwords have been leaked in publicly known data breaches. In both cases, care had to be taken to prevent attackers from abusing the service and to perform the searches without exposing the secrets to the service owner.
Building an API search for secrets without leaking secrets
After deciding to build a service that lets users search their huge database of leaked secret incidents, the GitGuardian researchers ran into the first implementation problem: How will users submit their secrets to GitGuardian’s service without GitGuardian actually seeing them in plain text and creating a privacy and security problem?
The answer might seem straightforward: Use hashing. A hash is a fixed-length cryptographic representation of a string that is supposed to be irreversible, although some older, weaker hashing schemes can be cracked using brute-force methods. Think of it as one-way encryption where the key is destroyed. This doesn’t fully solve the problem, though.
“By definition, if the hashed secret is present in our database, it implies that its cleartext version was once publicly accessible, indicating that GitGuardian has or had knowledge of it,” the GitGuardian researchers explain. “In other words, the user would be leaking their secret to us. And that’s not acceptable either.”
The solution is to send only a fragment of the hash, for example the first five characters. This would likely match multiple entries in the database whose hashes happen to start with the same five characters. GitGuardian wouldn’t know which of those is the user’s secret, or if it’s even any of them, so it would have to return all the matching entries from its database along with the locations where they were seen. This generates another privacy problem: The user would receive the locations of leaked secrets they don’t own.
So how can the service ensure the user can read only the part of the response that matches their own secret? With some clever use of encryption. The service encrypts each entry (secret + location) using the full-length hash of the leaked secret as the key. GitGuardian knows the plaintext, and therefore the hash, of every leaked secret, so it can use each hash as an encryption key. The user knows only the full hash of the secret they are searching for, which serves as the decryption key.
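The prefix lookup described above can be sketched in a few lines of Python. This is a minimal illustration, not GitGuardian's implementation: the choice of SHA-256, the function names, and the sample secrets and locations are all assumptions made for the example. It deliberately shows the naive version, including the flaw just mentioned, since the bucket exposes every location that shares the prefix.

```python
import hashlib
from collections import defaultdict

PREFIX_LEN = 5  # the article's example: first five characters of the hash


def hash_secret(secret: str) -> str:
    """Hex digest of a secret. SHA-256 is an assumption for this sketch."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def build_index(leaks: dict) -> dict:
    """Service side: group (full_hash, location) pairs by hash prefix."""
    index = defaultdict(list)
    for secret, location in leaks.items():
        full = hash_secret(secret)
        index[full[:PREFIX_LEN]].append((full, location))
    return index


def query(index: dict, prefix: str) -> list:
    """Service side: return every entry whose hash starts with the prefix."""
    return index.get(prefix, [])


# Made-up sample data standing in for the leak database.
leaks = {
    "hunter2": "https://example.com/org/repo/config.py",
    "s3cr3t-db-pass": "https://example.com/other/repo/.env",
    "AKIA-FAKE-KEY": "https://example.com/third/repo/deploy.sh",
}
index = build_index(leaks)

# Client side: only the five-character prefix leaves the machine.
my_hash = hash_secret("hunter2")
bucket = query(index, my_hash[:PREFIX_LEN])

# The client compares full hashes locally to see whether its secret is there.
found = any(full == my_hash for full, _ in bucket)
```

In this naive form the client can read the location of every entry in the bucket, which is exactly the problem the next step addresses.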
Therefore, if one of the entries returned by GitGuardian in the “response bucket” based on the first five characters of the hash sent by the user happens to correspond to the user’s secret, the user will only be able to decrypt that entry from the response.
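A rough sketch of this idea follows, under stated assumptions. The article does not name the cipher the service uses, so the code below substitutes a small standard-library construction (an HMAC-SHA256 keystream plus an authentication tag) purely so the example runs without third-party packages; a real system would use a vetted authenticated cipher such as AES-GCM. All function names and sample data are hypothetical.

```python
import hashlib
import hmac
import os


def full_hash(secret: str) -> bytes:
    """Full-length hash of a secret, used here as the encryption key."""
    return hashlib.sha256(secret.encode("utf-8")).digest()


def _subkey(key: bytes, label: bytes) -> bytes:
    # Derive separate keys for encryption and authentication.
    return hmac.new(key, label, hashlib.sha256).digest()


def _keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    out, counter = b"", 0
    while len(out) < length:
        block = nonce + counter.to_bytes(4, "big")
        out += hmac.new(key, block, hashlib.sha256).digest()
        counter += 1
    return out[:length]


def encrypt(key: bytes, plaintext: bytes) -> bytes:
    """Toy authenticated encryption: nonce + ciphertext + tag."""
    nonce = os.urandom(16)
    stream = _keystream(_subkey(key, b"enc"), nonce, len(plaintext))
    ciphertext = bytes(a ^ b for a, b in zip(plaintext, stream))
    tag = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    return nonce + ciphertext + tag


def decrypt(key: bytes, blob: bytes):
    """Return the plaintext, or None if the key does not fit this entry."""
    nonce, ciphertext, tag = blob[:16], blob[16:-32], blob[-32:]
    expected = hmac.new(_subkey(key, b"mac"), nonce + ciphertext, hashlib.sha256).digest()
    if not hmac.compare_digest(tag, expected):
        return None
    stream = _keystream(_subkey(key, b"enc"), nonce, len(ciphertext))
    return bytes(a ^ b for a, b in zip(ciphertext, stream))


# Service side: each entry is encrypted with the hash of its own secret.
leaks = {
    "hunter2": "https://example.com/org/repo/config.py",
    "s3cr3t-db-pass": "https://example.com/other/repo/.env",
}
bucket = [encrypt(full_hash(s), loc.encode("utf-8")) for s, loc in leaks.items()]

# Client side: try the hash of our own secret against every entry.
my_key = full_hash("hunter2")
attempts = (decrypt(my_key, blob) for blob in bucket)
readable = [p.decode("utf-8") for p in attempts if p is not None]
```

Only the entry encrypted under the hash of `hunter2` decrypts; the other entry in the bucket stays opaque to this client.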
While this solves the privacy issue, it’s not very efficient because the client on the user side must use its hash to try to decrypt a full list of entries. That could take a long time. Imagine scaling this to a long list of secrets and hashes the user wants to search for. It would be easier if the service provides an indicator or hint of which entry from the response is likely to match and should be decrypted instead of trying them all. The answer? A hash of the hash.
In addition to using the hash of a secret as the encryption key for further information about that secret, such as the source of the leak, the service hashes the hash again and sends the result alongside the entry as a hint. The client knows its own secret and the corresponding hash, so it can hash that hash with the same algorithm the service uses and compare the result to the hints in the response entries. If it matches one of the hints, the client knows that is the entry it can decrypt.
Additional safeguards are in place to prevent abuse by attackers who want to use the service to check for leaked secrets, or to find the location of secrets for which they have somehow obtained a hash generated with the same algorithm the service uses. The service uses a “pepper,” a single global value that is combined with the hashes so that they are specific to the service.
Also, rate-limiting restricts non-authenticated users to five queries per day. Authenticated users have limits as well, but they are less stringent. The service will also return only the first location where a secret was found, even though that secret might have been found in multiple locations inside a repository or in multiple repositories. This is to restrict the amount of information that can be misused in case an attacker abuses the system.
However, this information restriction also limits the investigation capabilities of organizations that might want to know not only if one of their secrets was leaked, but also how many times, where it happened, and who leaked it. The answers to those questions might be valuable in deciding on incident response actions beyond just revoking that secret.
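The hint mechanism can be illustrated with a short sketch. It assumes SHA-256 for both hashing steps, and the payload strings are placeholders for the encrypted entries; the names `hint_for` and `find_entry` are made up for this example.

```python
import hashlib


def full_hash(secret: str) -> bytes:
    """Hash of the secret (this would double as the decryption key)."""
    return hashlib.sha256(secret.encode("utf-8")).digest()


def hint_for(secret_hash: bytes) -> str:
    """Hash of the hash: safe to publish, cannot be reversed into the key."""
    return hashlib.sha256(secret_hash).hexdigest()


# Service side: each bucket entry carries a hint next to its payload.
leaked = ["s3cr3t-db-pass", "hunter2", "AKIA-FAKE-KEY"]
bucket = [
    {"hint": hint_for(full_hash(s)), "payload": f"<encrypted entry {i}>"}
    for i, s in enumerate(leaked)
]


def find_entry(bucket: list, secret: str):
    """Client side: pick the one entry worth decrypting, if any."""
    wanted = hint_for(full_hash(secret))
    for entry in bucket:
        if entry["hint"] == wanted:
            return entry
    return None


match = find_entry(bucket, "hunter2")
miss = find_entry(bucket, "never-leaked")
```

With the hint in place, the client performs one comparison per entry and at most one decryption, rather than attempting to decrypt the whole bucket.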
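As a hedged illustration of what a pepper does, the sketch below mixes a fixed value into the hash input. The pepper value, the way it is combined with the secret, and the use of SHA-256 are assumptions for the example; the article does not describe the service's exact construction.

```python
import hashlib

# Hypothetical pepper: one fixed value used for every hash in this sketch.
PEPPER = b"hypothetical-service-pepper"


def plain_hash(secret: str) -> str:
    """An ordinary SHA-256 hash, as any other tool might compute it."""
    return hashlib.sha256(secret.encode("utf-8")).hexdigest()


def peppered_hash(secret: str, pepper: bytes = PEPPER) -> str:
    """The same algorithm, but with the pepper mixed into the input."""
    return hashlib.sha256(pepper + secret.encode("utf-8")).hexdigest()


secret = "hunter2"
generic = plain_hash(secret)
service_specific = peppered_hash(secret)

# A hash obtained elsewhere with plain SHA-256 does not match the peppered one,
# so it cannot be replayed directly as a query.
hashes_differ = generic != service_specific
```

The point of the design is that a hash harvested from some other system, even one using the same algorithm, is not a valid lookup key here.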
How to use HasMySecretLeaked and future plans
In addition to using the service through the web interface on the HasMySecretLeaked website, users can query the API by using GitGuardian’s command-line tool called ggshield. The tool requires an account with GitGuardian’s platform, which is free for organizations with teams of up to 25 developers.
GitGuardian tells CSO that in the future it hopes to expand the service and secrets scans beyond just GitHub repositories to things like packages in public registries like npm and PyPI, as well as public Docker images on DockerHub. They are also considering whether all the locations of a secret could be shared as part of the response.