Passing a GitLab CI Job For a Failing Script

February 21, 2023

There are some cases where the expected outcome of a CI job script is failure. One prominent use case is testing tools and container images that are intended for CI-based analyses. This post details techniques for GitLab CI scripts that let the job pass when the script fails, so that the job status accurately reflects the expected result.

CI jobs expecting failure

GitLab CI considers a job successful if the script completes with an exit code of 0, and failed for any other exit code. In most cases that logic is appropriate, but in some cases a job is expected to fail. Two frequently seen examples involve CI pipelines for testing testing tools: integration testing of the testing tools themselves, and testing container images intended for use with those tools.

To manage these cases, one obvious option for those familiar with GitLab CI is allow_failure, which allows failure for any exit code when set to true. This can be refined by using allow_failure with exit_codes, which accepts an array of exit codes; failure is then allowed only for those codes. For example:

some_job:
  allow_failure:
    exit_codes:
      - 123
      - 456

This allows failure only for exit codes 123 and 456; any other non-zero exit code fails the job and the pipeline. While this lets the pipeline proceed and pass, it still leaves the job, and ultimately the pipeline, passing with failure.

The value of a green pipeline

Everyone strives for a green pipeline, with all jobs passed, as a definitive indication that every job produced the expected outcome. When a job passes with failure, people either don't examine its log on every pipeline to see why it failed, or certainly don't want to have to. The failure could be the expected result of the job using the techniques noted above, but it could also come from a variety of other issues: runner failure, failure to load the container image, a script execution error, a network error, a job timeout, or anything else that can cause the job to fail. Even if the job log is examined on each pipeline at first, it's human nature to normalize that result and stop looking over time. At that point other problems can go unnoticed, possibly until the change is merged or released, where they become a bigger issue. The goal should be all jobs passing, since any passing-with-failure case is ultimately an indeterminate result.

Passing a job when the script fails

The result of a command can be reversed by leveraging the shell's logical And (&&) and Or (||) operators.

The And (&&) operator specifies a command that is executed if the preceding command is successful (exit code 0). So in the example foo && bar, the bar command executes only if the foo command passes.
The Or (||) operator specifies a command that is executed if the preceding command fails (non-zero exit code). So in the example foo || bar, the bar command executes only if the foo command fails.
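
A quick way to see this behavior is in a POSIX shell with the true and false commands, which always exit 0 and 1 respectively and stand in for foo here. This is an illustrative sketch only; and_demo and or_demo are hypothetical helper names that record whether the right-hand command ran.

```shell
#!/bin/sh
# Shows which side of && and || runs, based on the exit code of the
# command on the left. `true` always exits 0 and `false` always exits 1;
# both stand in for foo. and_demo and or_demo are hypothetical helpers.

# && runs the right-hand command only if the left-hand command exits 0.
and_demo() {
  result="skipped"
  "$1" && result="ran"
  echo "$result"
}

# || runs the right-hand command only if the left-hand command exits non-zero.
or_demo() {
  result="skipped"
  "$1" || result="ran"
  echo "$result"
}

echo "true && bar:  bar $(and_demo true)"    # bar ran
echo "false && bar: bar $(and_demo false)"   # bar skipped
echo "true || bar:  bar $(or_demo true)"     # bar skipped
echo "false || bar: bar $(or_demo false)"    # bar ran
```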

Putting all of that together results in the script below, which runs pa11y-ci, a CI-based web accessibility testing tool, against a URL known to have accessibility issues (so the command is expected to fail):

some_job:
  script:
    - pa11y-ci && exit 1 || exit 0

This runs the pa11y-ci command. If the command is successful, which in this case is actually a failure, the script exits 1 and the job fails. If pa11y-ci fails, which is expected, the script exits 0 and the job succeeds.

This is a good solution for cases where the command can fail with only one specific exit code. If there are multiple possible non-zero exit codes, it could mask an error in the job. In this particular case, pa11y-ci exits with 2 if any accessibility errors are found (what's being tested for), but exits with 1 for other failures. So if pa11y-ci fails to execute properly, this job would still pass and mask the failure, which is not the desired result.
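
The masking problem can be reproduced locally without pa11y-ci. In the sketch below, which assumes a POSIX shell, expect_failure is a hypothetical helper that wraps the same && exit 1 || exit 0 inversion, and sh -c "exit N" stands in for pa11y-ci exiting with code N. It uses return instead of exit so the loop keeps running; in the job script itself, exit is what sets the job result.

```shell
#!/bin/sh
# A sketch of the `command && exit 1 || exit 0` inversion, wrapped in a
# function so it can be exercised locally. expect_failure is a
# hypothetical helper name; it uses `return` instead of `exit` so the
# loop below keeps running. In the CI job, `exit` ends the job script.
expect_failure() {
  # "$@" runs the command with its original arguments.
  "$@" && return 1 || return 0
}

# `sh -c "exit N"` stands in for pa11y-ci exiting with code N:
# 0 = no issues found, 2 = accessibility errors found, 1 = other failure.
for code in 0 1 2; do
  if expect_failure sh -c "exit $code"; then
    echo "exit $code: job passes"
  else
    echo "exit $code: job fails"
  fi
done
```

Exit codes 1 and 2 both produce a passing job, so a pa11y-ci run that failed to execute properly is indistinguishable from one that found the expected accessibility errors.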

Passing only for certain exit codes

The preceding technique can be taken a step further by adding logic that checks the exit code from the previous command and takes the appropriate action, as in the following example script:

some_job:
  script:
    - pa11y-ci && exit 2 || if [ "$?" -eq "2" ]; then exit 0; else exit 1; fi

This runs the pa11y-ci command. If it's successful, which again is actually a failure, the && operator runs exit 2 and the job fails. Exit code 2 was chosen here to indicate an unexpectedly passing pa11y-ci run, differentiating it from other errors.

If the pa11y-ci command fails, which is expected, the || operator runs the if statement. The exit code from the previous command is available in $?, so if it equals 2, which is the expected result (that is, pa11y-ci detected an accessibility error), the script exits 0 and the job passes. Otherwise, a different error occurred, so the script exits 1 and the job fails.
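
To check this logic without installing pa11y-ci, the one-liner can be wrapped in a shell function and driven by stand-in commands. This is a sketch assuming a POSIX shell: expect_exit_code is a hypothetical helper name, its first argument is the exit code that counts as a pass, and sh -c "exit N" stands in for pa11y-ci exiting with code N. As before, return replaces exit so the sketch can run in a single shell session.

```shell
#!/bin/sh
# A sketch of the exit-code-aware check from the job script above.
# expect_exit_code is a hypothetical helper: the first argument is the
# exit code that counts as a pass, the remaining arguments are the command.
# `return` replaces `exit` so the loop below keeps running.
expect_exit_code() {
  expected="$1"
  shift
  # Mirrors `cmd && exit 2 || if [ "$?" -eq "2" ] ...` from the job:
  # an unexpected success returns 2, the expected code returns 0,
  # and any other failure returns 1. In the || branch, $? still holds
  # the exit code of the command.
  "$@" && return 2 || if [ "$?" -eq "$expected" ]; then return 0; else return 1; fi
}

# `sh -c "exit N"` stands in for pa11y-ci exiting with code N.
for code in 0 1 2; do
  result=0
  expect_exit_code 2 sh -c "exit $code" || result=$?
  echo "stand-in exited $code -> job exits $result"
done
```

Only exit code 2 yields a passing job; an unexpected success yields 2 and any other failure yields 1, so the two failure modes remain distinguishable in the job result.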