GitLab Semgrep SAST Analysis - But More

January 7, 2024

GitLab continues to migrate Static Application Security Testing (SAST) to Semgrep, and makes this available to all GitLab tiers. This analysis only includes the rules that GitLab manages, but there are many more available in the Semgrep Rules project. This post details how to combine the two to get a more comprehensive analysis.

GitLab Static Application Security Testing

Since GitLab 13.3 (August 2020), GitLab has provided Static Application Security Testing (SAST) to analyze code for potential vulnerabilities in all GitLab tiers, including Free. Some additional features are available only in GitLab Ultimate, but the core analysis and the customization of scanners and settings are available to all.

GitLab SAST supports over 20 languages and frameworks, and is easy to implement by adding the following to your .gitlab-ci.yml file:

include:
  - template: Jobs/SAST.gitlab-ci.yml

That's it. The SAST template has rules configured to only run the appropriate analyzers based on the languages used in the codebase (checked by file extension). Each analyzer runs in a dedicated job to parallelize the work and avoid the need for Docker-in-Docker.

The most broadly used analyzer is Semgrep, an open source engine for static code analysis, combined with a set of SAST rules developed and managed by GitLab. Semgrep currently covers about one third of the languages and frameworks supported by GitLab's SAST analysis. GitLab also has a broader effort in progress to migrate all SAST analysis to Semgrep, as noted in the documentation and tracked in this epic.

One challenge with using only GitLab's managed rules is that the rule set is limited, which is more evident in some languages. Here are the current counts of GitLab SAST rules for Semgrep by language.

Language	Rules
C	62
C#	20
Go	30
Java	64
JavaScript	11
Python	70
Scala	87
TypeScript	11

Semgrep, however, has its own set of rules, in some cases many more, as shown here for a subset of these languages.

Language	Rules
Go	41
JavaScript	163
TypeScript	165

This post details what is believed to be the best way to consolidate these rules to have the most comprehensive SAST testing.

Use Semgrep as a standalone job?

The Semgrep documentation does have instructions for running in GitLab CI, so why not just use that? There are a few significant issues with that implementation.

There's a strong push to use Semgrep Cloud Platform, which provides a custom dashboard for managing rules, integration to provide MR comments, additional features with their paid subscriptions, etc. The goal for this use case is to augment what GitLab SAST already provides, so adding another tool to manage results is not desirable. And some of the capabilities provided there (for example, diff-aware scanning, links to code, and MR comments) are already better integrated through GitLab's existing merge request capabilities, especially for GitLab Ultimate.
The Semgrep rules are a mix of several categories: security, best practices, correctness, and maintainability. The goal of this job is SAST analysis, so only the security rules are desired. For any of the languages of concern, there are extremely capable language-specific tools for linting best practices, correctness, and maintainability. Additionally, the GitLab SAST results feed into a SAST report, managed through GitLab's security capabilities, and that workflow is not intended to manage other code quality concerns (for example, it includes the option to require Security team approval for violations).
In some cases, both GitLab and Semgrep have implemented the same rules. So, using both would result in duplicate findings, and the overhead and frustration that comes with that for SAST rules that are already notoriously noisy.
The Semgrep rules, either individually or as a collection, are not versioned. The rules each have a SHA256 hash to verify integrity, and a last updated timestamp, but nothing comparable to a "release" that indicates rule changes, and the implications of those changes, to the user.
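
To illustrate the category point above: rules in the Semgrep Rules project are YAML files, and by that project's conventions each rule declares a category in its metadata. The following is a minimal sketch of what a security rule looks like. The rule ID, message, and pattern are hypothetical and are not copied from the Semgrep Rules project.

```yaml
rules:
  # Hypothetical rule, for illustration only.
  - id: example-weak-hash
    languages: [go]
    severity: WARNING
    message: MD5 is a weak hash function; consider a stronger alternative.
    pattern: md5.Sum(...)
    metadata:
      # Only rules with this category are relevant to a SAST job. Other
      # values (for example best-practice, correctness, maintainability)
      # are better handled by language-specific linters.
      category: security
```

Selecting rules on this field is one way to keep only the security rules and leave the other categories out.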

The combined solution

So, a different solution was needed that addresses the concerns previously identified. This solution:

Builds on GitLab's semgrep-sast job, which includes their Go-based analyzer that wraps Semgrep, runs it with a specific set of rules, and provides results in a GitLab SAST formatted report.
Adds the applicable Semgrep security rules from the official repository (including any associated example files). As noted previously, this repository has no tagged versions, so the latest rules from the default branch are used.
Identifies Semgrep rule changes to users, and is released with semantic versioning.
Eliminates rules duplicated between GitLab SAST and Semgrep. For simplicity, and to make this project a pure extension of GitLab SAST, the GitLab rules are used where there are duplicates.

This led to the GitLab Semgrep Plus project. The container image is derived from the GitLab Semgrep analyzer. The semgrep-rules/ directory in the repository includes copies, unchanged, of all security rules and examples from Semgrep rules for Go, JavaScript, and TypeScript, which are included in the analysis. This makes changes to these rules and code examples explicitly visible to users.

This could contain rules for the many other languages that Semgrep supports, but these are the only languages currently in use in the GitLab CI Utils projects where this was initiated.

The rules, without examples, are included in the /rules directory in the container image. GitLab's Semgrep analyzer is designed to use all rules in that directory, so no other changes are required to include them in the analysis. A shell script with all of the applicable logic is used to update the rules so that the changes are made consistently with an automated process.

Managing ongoing Semgrep rule changes

The project is set up to automatically detect changes to the Semgrep Rules project and incorporate any changes to security rules for the specified languages.

Since Semgrep Rules does not identify versions, Renovate is used to track the latest git ref of the development branch, and it initiates a merge request with any ref changes. The CI pipeline is set up with a trigger job to check for rule updates (if that check has not already been done). This job is triggered in another project to isolate the credentials that can push commits back to this project.

The triggered pipeline clones this project, clones the Semgrep Rules project, re-runs the shell script to update rules, and pushes any updates. The commit uses --amend to leave only one commit with an updated message, and is made with the Renovate bot account, so Renovate continues to propagate changes to Semgrep Rules if the merge request is not immediately merged. The commit message is also updated to indicate that rule changes were incorporated. Since the updated Semgrep rules are pushed, a new pipeline is triggered, and the updated commit message is used in the job rules to stop an infinite loop of triggered rule update pipelines.
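
The loop guard described above can be sketched as a trigger job whose rules check the commit message. This is only an illustration, not the project's actual configuration. The job name, the downstream project path, the commit message marker, the Renovate branch prefix, and the UPSTREAM_BRANCH variable are all assumptions.

```yaml
# Hypothetical trigger job in this project's .gitlab-ci.yml.
check_semgrep_rule_updates:
  rules:
    # The amended commit message carries a marker (assumed text) once the
    # rules have been updated, so the re-triggered pipeline skips this job.
    - if: $CI_COMMIT_MESSAGE =~ /\[semgrep rules updated\]/
      when: never
    # Otherwise, run only on Renovate branches (assumed branch prefix).
    - if: $CI_COMMIT_BRANCH =~ /^renovate\//
  variables:
    # Passed downstream so the other pipeline knows which branch to update.
    UPSTREAM_BRANCH: $CI_COMMIT_BRANCH
  trigger:
    # Separate project that holds the credentials to push back (assumed path).
    project: example-group/semgrep-rules-updater
    strategy: depend
```

Because the first rule matches the amended commit message, the pipeline started by the pushed rule update does not trigger another update, which breaks the cycle.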

Usage in GitLab CI

The GitLab SAST template is already set up to specify the container image for all SAST jobs via variables, so the implementation in GitLab CI is simply:

include:
  - template: Jobs/SAST.gitlab-ci.yml

semgrep-sast:
  variables:
    SAST_ANALYZER_IMAGE: registry.gitlab.com/gitlab-ci-utils/gitlab-semgrep-plus:latest

With that, the Semgrep rules are included in the analysis and in the corresponding report.