DevSecOps - Scan GitHub org. repos for secrets

Background

Virtually every organization writes code to build applications, whether as products for clients or as internal tools for its own teams. Most of these applications rely on sensitive data such as database credentials, API tokens, and login credentials to perform their tasks. Using these sensitive details (or secrets) is unavoidable, as they provide access to data, the ability to send information to the cloud, or the ability to trigger another service. However, since almost all organizations use some form of version control, the most popular choice being Git, these secrets need to be stored carefully and not hard-coded into the applications.

When secrets are committed to a version-controlled repository, they become part of the history of all changes made to the code base. Anyone who gains access to the repository can retrieve these secrets and use them to gain unauthorized access to services and data. Simply removing the secrets and adding a new, clean commit does not solve the problem: the secrets remain in the repository’s version control history, so an attacker with access to the repository can check out an earlier commit and still recover them.
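The point about history can be checked in a throwaway repository. The sketch below is illustrative only: the repository, the file name config.env, and the token value are all made up for the demonstration, and it assumes git is installed.

```shell
#!/bin/bash
# Demo: a secret deleted in a later commit is still present in Git history.
# The token value and file name below are made up for illustration.
FAKE_SECRET="API_TOKEN=not-a-real-token-12345"

demo_repo=$(mktemp -d)
cd "$demo_repo" || exit 1
git init --quiet
# Set an identity locally so the commits work on a fresh machine
git config user.name "Demo User"
git config user.email "demo@example.com"

# Commit 1: the secret is committed by mistake
echo "$FAKE_SECRET" > config.env
git add config.env
git commit --quiet -m "Add config"

# Commit 2: the clean-up commit removes the secret from the current tree
echo "API_TOKEN=" > config.env
git commit --quiet -am "Remove hard-coded value"

# The working tree no longer contains the secret...
in_worktree=$(grep -c "not-a-real-token" config.env || true)
# ...but the patch history of the repository still does
in_history=$(git log -p --all | grep -c "not-a-real-token" || true)

echo "matches in working tree: $in_worktree"
echo "matches in history: $in_history"
```

The working tree count drops to zero after the clean-up commit while the history count does not, which is why a leaked secret has to be revoked and rotated rather than just deleted.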

What is Secrets Scanning?

Secrets Scanning refers to the process of identifying secrets embedded in code repositories. Many tools exist for this purpose, and they come with built-in features that usually make them a better choice than an internally developed solution. These features include version control history scanning, custom signatures, and multiple reporting formats. An engineer can choose the tool that best fits their context and requirements. Some examples of secrets scanning tools are git-secrets, Trufflehog, and detect-secrets.

Scanning a GitHub Organization for Secrets

For the purposes of this article, we will be using git-secrets. I would also recommend running the scan on a virtual machine in the cloud, as doing it locally might be slow depending on factors such as the number of repositories in the GitHub organization and the size of their code bases.

Prerequisites

If performing these steps on a cloud-based virtual machine, SSH into the machine before continuing. Ignore this step if performing the steps locally
Install gitim by using the following documentation
Install git-secrets by using the following documentation
Generate an SSH key pair with the following command and note the public key’s value along with the path where the key is being stored

ssh-keygen -t ed25519
Add the public SSH key we took note of in the previous step to GitHub by following this documentation
Create a file named signatures under /home//git-secrets/ (the scan script later in this article reads it as $HOME/git-secrets/signatures) and add the following regular expressions to it, one per line (these signatures are taken from the signature list of another secrets scanning tool called Trufflehog and modified to fit our tool; the original list can be found here)

(xox[p|b|o|a]-[0-9]{12}-[0-9]{12}-[0-9]{12}-[a-z0-9]{32})
-----BEGIN RSA PRIVATE KEY-----
-----BEGIN DSA PRIVATE KEY-----
-----BEGIN EC PRIVATE KEY-----
-----BEGIN PGP PRIVATE KEY BLOCK-----
AKIA[0-9A-Z]{16}
amzn.mws.[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}
EAACEdEose0cBA[0-9A-Za-z]+
[f|F][a|A][c|C][e|E][b|B][o|O][o|O][k|K].*['|"][0-9a-f]{32}['|"]
[g|G][i|I][t|T][h|H][u|U][b|B].*['|"][0-9a-zA-Z]{35,40}['|"]
[a|A][p|P][i|I][_]?[k|K][e|E][y|Y].*['|"][0-9a-zA-Z]{32,45}['|"]
[s|S][e|E][c|C][r|R][e|E][t|T].*['|"][0-9a-zA-Z]{32,45}['|"]
[0-9]+-[0-9A-Za-z_]{32}.apps.googleusercontent.com
"type": "service_account"
ya29.[0-9A-Za-z-_]+
AIza[0-9A-Za-z-_]{35}
[h|H][e|E][r|R][o|O][k|K][u|U].*[0-9A-F]{8}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{4}-[0-9A-F]{12}
[0-9a-f]{32}-us[0-9]{1,2}
key-[0-9a-zA-Z]{32}
[a-zA-Z]{3,10}://[^/\s:@]{3,20}:[^/\s:@]{3,20}@.{1,100}["'\s]
access_token\$production\$[0-9a-z]{16}\$[0-9a-f]{32}
sk_live_[0-9a-z]{32}
https://hooks.slack.com/services/T[a-zA-Z0-9_]{8}/B[a-zA-Z0-9_]{8}/[a-zA-Z0-9_]{24}
sk_live_[0-9a-zA-Z]{24}
rk_live_[0-9a-zA-Z]{24}
sq0atp-[0-9A-Za-z-_]{22}
sq0csp-[0-9A-Za-z-_]{43}
SK[0-9a-fA-F]{32}
[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*[1-9][0-9]+-[0-9a-zA-Z]{40}
[t|T][w|W][i|I][t|T][t|T][e|E][r|R].*['|"][0-9a-zA-Z]{35,44}['|"]

Generate a Personal Access Token (PAT) on GitHub from an account that has access to the GitHub Organization we would be scanning using the following documentation
1. For step 7, choose an expiration that suits your requirements but avoid creating a token with no expiration date
2. For step 8, select the following permissions
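With the prerequisites in place, the signatures file can be sanity-checked before any scan. git-secrets treats its patterns as egrep-compatible regular expressions, so grep -E gives a quick approximation of what will match. The sketch below tries two of the signatures against sample strings; the helper name matches is made up for this example, and AKIAIOSFODNN7EXAMPLE is the placeholder access key ID used in AWS documentation, not a real credential.

```shell
#!/bin/bash
# Check sample strings against individual signatures using grep -E.

# matches PATTERN TEXT: succeeds when TEXT contains a match for PATTERN.
matches() {
    # -e keeps patterns that begin with "-" from being read as options
    printf '%s\n' "$2" | grep -qE -e "$1"
}

AWS_PATTERN='AKIA[0-9A-Z]{16}'
RSA_PATTERN='-----BEGIN RSA PRIVATE KEY-----'

if matches "$AWS_PATTERN" 'aws_access_key_id = AKIAIOSFODNN7EXAMPLE'; then
    echo "AWS pattern: match"
fi
if matches "$RSA_PATTERN" '-----BEGIN RSA PRIVATE KEY-----'; then
    echo "RSA pattern: match"
fi
if ! matches "$AWS_PATTERN" 'aws_access_key_id = <set at deploy time>'; then
    echo "AWS pattern: no match on placeholder text"
fi
```

Trying each new signature against one string that should match and one that should not is a cheap way to catch typos in a pattern before running it across a whole organization.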

Performing the scan

To scan a GitHub Organization for secrets perform the following steps:

SSH onto the Cloud VM that was set up (Ignore this step if you’re performing the scan on your local machine)
Execute the following command to start the SSH agent on the machine (Ignore this step if you can confirm that the agent is already running)

eval $(ssh-agent -s)
Execute the following command to add the SSH private key (that we generated in step 4 in the prerequisites section) to the SSH agent. We will use this key to clone the repositories over SSH instead of HTTPS

ssh-add </path/to/private/key>
Execute the following commands to create the directories that will contain all repositories from the organization, and then switch into the GitHub Organization folder

mkdir -p <GITHUB ORG. NAME>/repos

cd <GITHUB ORG. NAME>
Execute the following command to run gitim and clone all repositories from the GitHub organization

python3 -m gitim --org <GITHUB ORG. NAME> -d repos/ --token <GITHUB PAT> --ssh
Run ls -lsa repos/ to confirm all repositories were cloned successfully
Create a reports/ directory by executing the following command

mkdir -p reports/
Create a bash file called secrets-scan.sh and paste the script below in the file

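#!/bin/bash
# Usage: bash secrets-scan.sh <GITHUB ORG. NAME>
# Scans every repository cloned under $HOME/<org>/repos/<org> with git-secrets.

ORG="$1"
REPOS_PATH="$HOME/$ORG/repos/$ORG"
REPORTS_PATH="$HOME/$ORG/reports"
SIGNATURES_PATH="$HOME/git-secrets/signatures"

echo "$REPORTS_PATH"

for absolute_path in "$REPOS_PATH"/*/
do
    repo=$(basename "$absolute_path")
    echo "Scanning $repo..."
    # Skip the repository if we cannot enter it
    cd "$absolute_path" || continue
    # Register the signatures file as a pattern provider for this repository
    git secrets --add-provider -- cat "$SIGNATURES_PATH"
    # --scan checks the files in the current checkout; git-secrets also
    # offers --scan-history to cover past commits
    if git secrets --scan > "$REPORTS_PATH/$repo" 2>&1
    then
        # Exit code 0: nothing matched
        echo "$repo" >> "$HOME/$ORG/allowlist"
    else
        # Non-zero exit code: a pattern matched (or the scan itself failed)
        echo "$repo" >> "$HOME/$ORG/denylist"
    fi
done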

Execute the bash script created in the previous step as

bash secrets-scan.sh <GITHUB ORG. NAME>
After the scan is complete, the script will have created the allowlist and denylist files in the organization folder, along with a report for each scanned repository inside the reports/ directory. The allowlist contains the repositories that were not flagged by the scan, and the denylist contains the repositories that were flagged
For each repository in the denylist, we can check the report (with the same name as the name of the repository) inside the reports/ directory
For each (true-positive) secret identified, the following steps should be followed to avoid security issues:
1. Revoke the secrets
2. Rotate the secrets
3. Amend code to not have hardcoded secrets committed to the repository
4. Use a secrets manager to store secrets, or package secrets separately from the code at the time of deployment
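For steps 3 and 4, one common approach is to have the application read the secret from its environment at run time, with the value injected by a secrets manager or the deployment pipeline. The following is a minimal sketch, not a prescribed setup: the function name require_secret and the variable name DB_PASSWORD are hypothetical and used only for this example.

```shell
#!/bin/bash
# Read a secret from the environment instead of hard-coding it.
# DB_PASSWORD is a hypothetical variable name used only for this sketch.

# require_secret NAME: prints the value of the environment variable NAME,
# or fails with a message on stderr when it is unset or empty.
require_secret() {
    # printenv only sees exported variables, i.e. the process environment
    secret_value=$(printenv "$1" || true)
    if [ -z "$secret_value" ]; then
        echo "error: $1 is not set; inject it at deploy time" >&2
        return 1
    fi
    printf '%s' "$secret_value"
}

# The value would normally be injected by the deployment environment
if db_password=$(require_secret DB_PASSWORD); then
    echo "DB_PASSWORD loaded (${#db_password} characters)"
else
    echo "DB_PASSWORD missing, refusing to start"
fi
```

Failing fast when the variable is missing avoids the temptation to fall back to a default value committed in the code.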

Notes on interpreting scan results

Secrets scanning is a valuable process to include in security workflows, as committing secrets to code is one of the most common mistakes developers make. But since the tools rely on patterns defined by regular expressions or other mechanisms, they are not 100% accurate in identifying secrets. The results will most likely be a mixture of false positives and true positives. There is also a third kind of result that does not appear in any report: false negatives, the secrets that the tool overlooked. Both false positives and false negatives need to be dealt with.

Dealing with false positives is fairly straightforward: we can create a list of known false positives and filter them out of the final iteration of the report. Over time, this list can be updated whenever a new false positive is encountered, ensuring that we do not have to deal with the same errors multiple times. The demonstration above also produced a false positive.
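One way to apply such a list is to filter each report through it before review. The sketch below assumes a hypothetical file named known-false-positives containing one fixed string per line; the function name, the file names, and the sample report lines are invented for the demonstration. git-secrets can also record allowed patterns itself through git secrets --add --allowed.

```shell
#!/bin/bash
# Drop known false positives from a scan report.

# filter_report REPORT FALSE_POSITIVES: prints the report lines that do not
# contain any of the fixed strings listed in FALSE_POSITIVES.
filter_report() {
    # -v inverts the match, -F treats each listed line as a literal string,
    # -f reads the list from a file
    grep -v -F -f "$2" "$1"
}

work_dir=$(mktemp -d)

# Sample report lines (made up for this demonstration)
cat > "$work_dir/report" <<'EOF'
docs/example.md:12:aws_access_key_id = AKIAIOSFODNN7EXAMPLE
src/settings.py:4:AWS_KEY = "AKIAABCDEFGHIJKLMNOP"
EOF

# Known false positives, one fixed string per line
cat > "$work_dir/known-false-positives" <<'EOF'
AKIAIOSFODNN7EXAMPLE
EOF

filtered=$(filter_report "$work_dir/report" "$work_dir/known-false-positives")
echo "$filtered"
```

Keep blank lines out of the list: an empty pattern matches every line, so the filter would then discard the whole report.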

False negatives are harder. Finding them requires the security engineer to understand what secrets a particular application could potentially use, and this contextual exercise needs to be repeated for each application. That makes it a tedious task, as most organizations have hundreds of code repositories, if not more. The best way to deal with this is to introduce proactive controls and give developers the knowledge they need to avoid committing secrets in the first place. This can be done in various ways, one of which is to use Git hooks for pre-commit checks, with developers adding regular expressions to their hooks based on the application they are working on and the kinds of secrets it might use.
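git-secrets can install such hooks itself with git secrets --install, run inside a repository. To illustrate what a pre-commit check does, here is a minimal hand-rolled sketch; the function name check_added_lines, the patterns file, and the sample diff are assumptions for the example and not part of git-secrets.

```shell
#!/bin/bash
# Sketch of a pre-commit check: reject a commit when lines being added match
# any pattern in a patterns file (one extended regex per line).

# check_added_lines PATTERNS_FILE: reads a unified diff on stdin and fails
# (returns 1) when an added line matches one of the patterns.
check_added_lines() {
    # Keep only added lines ("+..."), skipping the "+++ file" headers
    if grep -E '^\+' | grep -vE '^\+\+\+ ' | grep -nE -f "$1"; then
        echo "Possible secret in staged changes; commit rejected." >&2
        return 1
    fi
    return 0
}

# In a real hook (.git/hooks/pre-commit) the diff would come from:
#   git diff --cached --unified=0 | check_added_lines "$PATTERNS_FILE" || exit 1

# Demonstration with a made-up patterns file and diff
patterns_file=$(mktemp)
echo 'AKIA[0-9A-Z]{16}' > "$patterns_file"

sample_diff='+++ b/settings.py
+AWS_KEY = "AKIAIOSFODNN7EXAMPLE"
-AWS_KEY = ""'

if printf '%s\n' "$sample_diff" | check_added_lines "$patterns_file"; then
    echo "commit allowed"
else
    echo "commit blocked"
fi
```

Checking only the added lines means a commit that removes a secret is not blocked, while a commit that introduces one is stopped before it reaches the repository history.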

Conclusion

Committing secrets to version control is one of the easiest mistakes to make. To ensure that these secrets do not get abused, we need to detect them, then revoke and rotate them before they get into the hands of an attacker. To do this, we use tools combined with patterns to scan our repositories. We looked at how a few tools and some bash scripts can be used to scan an entire GitHub organization for secrets, and briefly at how to handle the situation when a secret is identified. As with most tools, there will be both false positives and false negatives, and we need to iteratively improve our secrets scanning process to account for both and improve the accuracy of our results over time.

***

This article is brought to you by Kloudle Academy, a free e-resource compilation, created and curated by Kloudle. Kloudle is a cloud security management platform that uses the power of automation and simplifies human requirements in cloud security. If you wish to give your feedback on this article, you can write to us here.

Riyaz Walikar

Founder & Chief of R&D

Riyaz is the founder and Chief of R&D at Kloudle, where he hunts for cloud misconfigurations so developers don’t have to. With over 15 years of experience breaking into systems, he’s led offensive security at PwC and product security across APAC for Citrix. Riyaz created the Kubernetes security testing methodology at Appsecco, blending frameworks like MITRE ATT&CK, OWASP, and PTES. He’s passionate about teaching people how to hack—and how to stay secure.
