GitLab Semgrep SAST Analysis... But More
GitLab continues to migrate Static Application Security Testing (SAST) to Semgrep, and makes this available to all GitLab tiers. This analysis only includes the rules that GitLab manages, but there are many more available in the Semgrep Rules project. This post details how to combine the two to get a more comprehensive analysis.
GitLab Static Application Security Testing
For several years, since June 2020 with GitLab 13.3, GitLab has provided the capability for Static Application Security Testing (SAST) to analyze code for potential vulnerabilities in all GitLab tiers, including Free. There are some additional features available in GitLab Ultimate, but the core analysis and customization of scanners and settings is available to all.
GitLab SAST supports over 20 languages and frameworks, and is easy to implement by adding the following to your .gitlab-ci.yml file:
```yaml
include:
  - template: Jobs/SAST.gitlab-ci.yml
```
That's it. The SAST template has rules configured to run only the appropriate analyzers, based on the languages used in the codebase (checked by file extension). Each analyzer runs in a dedicated job to parallelize the work and avoid the need for Docker-in-Docker.
The most broadly used analyzer is Semgrep, an open source static code analysis engine, combined with a set of SAST rules developed and managed by GitLab. Semgrep currently covers about one third of the languages and frameworks supported by GitLab's SAST analysis. GitLab also has a broader effort underway to migrate all SAST analysis to Semgrep, as noted in the documentation and tracked in this epic.
One limitation of GitLab using only its own managed rules is the relatively small number of rules, and this is more evident in some languages than others. Here are the current counts of GitLab SAST rules for Semgrep by language.
Language | Rules |
---|---|
C | 62 |
C# | 20 |
Go | 30 |
Java | 64 |
JavaScript | 11 |
Python | 70 |
Scala | 87 |
TypeScript | 11 |
Semgrep, however, has its own set of rules, and in some cases many more of them, as shown here for a subset of these languages.
Language | Rules |
---|---|
Go | 41 |
JavaScript | 163 |
TypeScript | 165 |
This post details what is believed to be the best way to consolidate these rules for the most comprehensive SAST testing.
Use Semgrep as a standalone job?
The Semgrep documentation does have instructions for running in GitLab CI, so why not just use that? There are a few significant issues with that implementation.
- There's a strong push to use Semgrep Cloud Platform, which provides a custom dashboard for managing rules, integration to provide MR comments, additional features with paid subscriptions, etc. The goal for this use case is to augment what GitLab SAST already provides, so managing results in another tool is not desirable. In addition, some of the capabilities provided there, for example diff-aware scanning, links to code, and MR comments, are already better integrated through GitLab's existing merge request capabilities, especially in GitLab Ultimate.
- The Semgrep rules are a mix of several categories: security, best practices, correctness, and maintainability. The goal of this job is SAST analysis, so only the security rules are desired. For each of the languages of concern, there are extremely capable language-specific tools for linting best practices, correctness, and maintainability. Additionally, the GitLab SAST results feed into a SAST report managed through GitLab's security capabilities, and that workflow is not intended to manage other code quality results (for example, the option to require Security team approval for violations).
- In some cases, GitLab and Semgrep have implemented the same rules, so using both would produce duplicate findings. That adds overhead and frustration on top of SAST rules that are already notoriously noisy.
- The Semgrep rules, either individually or as a collection, are not versioned. The rules each have a SHA256 hash to verify integrity, and a last updated timestamp, but nothing comparable to a "release" that indicates rule changes, and the implications of those changes, to the user.
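The category distinction above is recorded in the rules themselves. Rules in the Semgrep Rules repository carry a `metadata.category` field, for example `security`. The rule below is a hypothetical, minimal example written for illustration using standard Semgrep rule syntax. Its `id` and `message` are made up.

```yaml
rules:
  - id: hypothetical-eval-with-expression
    languages: [javascript, typescript]
    severity: WARNING
    message: >-
      eval() is called with a non-literal argument, which may allow code
      injection.
    patterns:
      # Match any call to eval()...
      - pattern: eval($ARG)
      # ...except calls whose argument is a string literal
      - pattern-not: eval("...")
    metadata:
      # Distinguishes security rules from best-practice, correctness,
      # and maintainability rules
      category: security
      cwe:
        - "CWE-95: Improper Neutralization of Directives in Dynamically Evaluated Code ('Eval Injection')"
```

Filtering on this field is one plausible way to select only security rules. It is shown here as an assumption, not as the selection method of any particular project.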
The combined solution
A different solution was therefore needed to address the concerns identified above. This solution:
- Builds on GitLab's semgrep-sast job, which includes its Go-based analyzer. That analyzer wraps Semgrep, runs it with a specific set of rules, and provides results in a GitLab SAST formatted report.
- Adds the applicable Semgrep security rules from the official repository (including any associated example files). As noted previously, this repository has no tagged versions, so the latest rules from the default branch are used.
- Makes Semgrep rule changes visible to users, and is released with semantic versioning.
- Eliminates rules duplicated between GitLab SAST and Semgrep. For simplicity, and to make this project a pure extension of GitLab SAST, the GitLab rules are used where there are duplicates.
This led to the GitLab Semgrep Plus project. Its container image is derived from the GitLab Semgrep analyzer. The semgrep-rules/ directory in the repository includes unchanged copies of all security rules and examples from Semgrep Rules for Go, JavaScript, and TypeScript, and these rules are included in the analysis. This makes changes to these rules and code examples explicitly visible to users.
This could contain rules for the many other languages that Semgrep supports, but these are the only languages currently in use in the GitLab CI Utils projects where this was initiated.
The rules, without examples, are included in the /rules directory in the container image. GitLab's Semgrep analyzer is designed to use all rules in that directory, so no other changes are required to include them in the analysis. A shell script containing all of the applicable logic updates the rules, so changes are always made consistently through an automated process.
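To check which rules end up in the analysis, you can list the contents of the /rules directory from a throwaway CI job. This is a hypothetical sketch: the job name is made up. It overrides the image entrypoint and assumes the image provides a shell and `ls`.

```yaml
# Hypothetical job for inspecting the rules bundled in the image
inspect-semgrep-plus-rules:
  image:
    name: registry.gitlab.com/gitlab-ci-utils/gitlab-semgrep-plus:latest
    # Override any image entrypoint so the script runs in a shell
    entrypoint: [""]
  script:
    # Recursively list the contents of the rules directory
    - ls -R /rules
```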
Managing ongoing Semgrep rule changes
The project is set up to automatically detect changes to the Semgrep Rules project and incorporate any changes to security rules for the specified languages.
Since Semgrep Rules does not identify versions, Renovate tracks the latest git ref of the development branch and opens a merge request whenever that ref changes. The CI pipeline includes a trigger job that checks for rule updates, if that check has not already been done. This job triggers a pipeline in another project, which isolates the credentials that can push commits back to this project.
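The trigger job described above can be sketched as a multi-project pipeline trigger. Everything specific here is an assumption for illustration:

- the job name
- the downstream project path `example-group/semgrep-rules-updater`
- the `[semgrep-rules-updated]` commit message marker
- the `renovate/` branch prefix, which is Renovate's default

None of these is the project's actual configuration.

```yaml
# Hypothetical trigger job; names, paths, and the marker are placeholders
check-semgrep-rule-updates:
  variables:
    # Job variables are passed to the downstream pipeline, telling it
    # which project and branch to update
    UPSTREAM_PROJECT_PATH: $CI_PROJECT_PATH
    UPSTREAM_BRANCH: $CI_COMMIT_BRANCH
  trigger:
    # The downstream project holds the credentials that can push here
    project: example-group/semgrep-rules-updater
    strategy: depend
  rules:
    # Skip if the commit message shows the rules were already updated
    - if: $CI_COMMIT_MESSAGE =~ /\[semgrep-rules-updated\]/
      when: never
    # Otherwise run only on Renovate branches
    - if: $CI_COMMIT_BRANCH =~ /^renovate\//
```

Checking the commit message first means that a commit which already carries the marker never triggers another update.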
The triggered pipeline clones this project, clones the Semgrep Rules project, re-runs the shell script to update rules, and pushes any updates. The commit uses --amend to leave only one commit, with an updated message. It is made with the Renovate bot account, so Renovate can keep propagating Semgrep Rules changes to the branch if the merge request is not merged immediately. The updated commit message also indicates that rule changes were incorporated. Pushing the updated Semgrep rules triggers a new pipeline, so the job rules check for the updated commit message to stop an infinite loop of triggered rule update pipelines.
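The downstream pipeline steps can be sketched as a single job. This is a hypothetical outline, not the project's actual configuration. The following names are placeholders:

- `RULES_PUSH_TOKEN`, assumed to be a masked CI/CD variable holding a Renovate bot token that can push to the upstream project
- `update-rules.sh`, the image, and the committer identity
- the `[semgrep-rules-updated]` marker

It also assumes the upstream trigger job passes in `UPSTREAM_PROJECT_PATH` and `UPSTREAM_BRANCH`.

```yaml
# Hypothetical downstream job; variable names, script name, and marker are placeholders
update-semgrep-rules:
  image: alpine:latest
  before_script:
    - apk add --no-cache bash git
  script:
    # Clone the upstream branch using the (assumed) Renovate bot token
    - git clone --branch "$UPSTREAM_BRANCH" "https://oauth2:${RULES_PUSH_TOKEN}@gitlab.com/${UPSTREAM_PROJECT_PATH}.git" project
    # Clone the latest Semgrep rules from the default branch (no tagged releases exist)
    - git clone --depth 1 https://github.com/semgrep/semgrep-rules.git
    - cd project
    # Re-run the (hypothetical) update script against the fresh rules checkout
    - ./update-rules.sh ../semgrep-rules
    # End the job successfully if the script changed nothing
    - |
      if [ -z "$(git status --porcelain)" ]; then
        echo "No rule changes"
        exit 0
      fi
    # Committer identity (placeholder values)
    - git config user.name "renovate-bot"
    - git config user.email "renovate-bot@example.com"
    # Amend the existing commit, appending the marker to its message
    - git add --all
    - git commit --amend -m "$(git log -1 --format=%B)" -m "[semgrep-rules-updated]"
    # An amended commit rewrites history, so a force push is required
    - git push --force-with-lease origin "HEAD:$UPSTREAM_BRANCH"
```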
Usage in GitLab CI
The GitLab SAST template already specifies the container image for each SAST job through variables, so the implementation in GitLab CI is simply:
```yaml
include:
  - template: Jobs/SAST.gitlab-ci.yml

# Override the analyzer image for the Semgrep SAST job only
semgrep-sast:
  variables:
    SAST_ANALYZER_IMAGE: registry.gitlab.com/gitlab-ci-utils/gitlab-semgrep-plus:latest
```
With that, the Semgrep rules are included in the analysis and in the corresponding SAST report.
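Because the project is released with semantic versioning, you can pin a specific image tag instead of latest, so rule changes arrive only when the version is deliberately bumped. This sketch assumes images are tagged with the release version. `SEMGREP_PLUS_VERSION` is a hypothetical variable name, and `X.Y.Z` is a placeholder rather than an actual release, so check the project's container registry for the available tags. The nested variable reference works the same way the SAST template builds its own default `SAST_ANALYZER_IMAGE` value from other variables.

```yaml
include:
  - template: Jobs/SAST.gitlab-ci.yml

variables:
  # Hypothetical variable holding the pinned release; replace X.Y.Z
  # with a published tag
  SEMGREP_PLUS_VERSION: "X.Y.Z"

semgrep-sast:
  variables:
    SAST_ANALYZER_IMAGE: "registry.gitlab.com/gitlab-ci-utils/gitlab-semgrep-plus:${SEMGREP_PLUS_VERSION}"
```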