Passing a GitLab CI Job For a Failing Script
There are some cases where the expected outcome of a CI job script is failure. One prominent use case is the testing of tools and container images that are intended for CI-based analyses. This post details techniques for GitLab CI scripts that allow the job to pass when the script fails, so that the job status accurately reflects the expected result.
CI jobs expecting failure
GitLab CI considers a job successful if the script
completes with an exit code
of 0, and considers it failed with any other exit code. In most cases that logic
is appropriate, but there are some cases where a job is expected to fail. Two
examples seen frequently come from CI pipelines that test testing tools:
integration testing of the tools themselves, and testing container images
intended for use with those tools.
To manage these cases, one obvious option for those familiar with GitLab CI is
to use allow_failure, which allows failure for any exit code when set to true.
This can be refined by using allow_failure with exit_codes, which accepts an
array of exit codes and allows failure only for those codes. For example:
some_job:
  allow_failure:
    exit_codes:
      - 123
      - 456
This only allows failure for exit codes 123 and 456, and any others fail the job and the pipeline. While this allows the pipeline to proceed and pass, it still leaves the job, and ultimately the pipeline, passing with failure.
The value of a green pipeline
Everyone strives for a green pipeline status, with all jobs passed, as a definitive indication that every job produced the expected outcome. When a job passes with failure, people either don't examine the job log on every pipeline to see why it failed, or certainly don't want to have to. The failure could be the expected result of the job using the techniques previously noted, but it could also be any of a variety of other issues - runner failure, failure to load the container image, script execution error, network error, job timeout - anything that can cause the job to fail. Even if the job log is initially examined on each pipeline, it's human nature to normalize that result and stop looking over time. At that point other problems may go unnoticed, possibly until the change is merged or released, and result in a bigger issue. The goal should be all jobs passing, since any passing with failure case is ultimately an indeterminate result.
Passing a job when the script fails
The result of a command can be reversed by leveraging the logical And (&&) and Or (||) operators.
- The And (&&) operator specifies a command that is executed only if the preceding command is successful (exit code 0). So in the example foo && bar, the bar command only executes if the foo command passes.
- The Or (||) operator specifies a command that is executed only if the preceding command fails (exit code > 0). So in the example foo || bar, the bar command only executes if the foo command fails.
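The behavior of the two operators can be checked directly in a shell. The following is a minimal sketch, not part of any CI configuration; demo_operators is a hypothetical function name, and true and false are the standard commands that always exit with 0 and 1 respectively, standing in for foo.

```shell
# Hypothetical demo function: each line prints only if its right-hand
# command was actually executed.
demo_operators() {
    true  && echo "and: ran after success"
    false && echo "and: ran after failure"   # never printed
    true  || echo "or: ran after success"    # never printed
    false || echo "or: ran after failure"
}

demo_operators
```

Only the first and last lines print, since those are the two cases where the right-hand command is reached.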
Putting all of that together results in the script below, which in this case
runs pa11y-ci, a CI-based web accessibility testing tool, against a URL known
to have accessibility issues (so the command is expected to fail):
some_job:
  script:
    - pa11y-ci && exit 1 || exit 0
This runs the pa11y-ci command. If the command is successful, which in this
case is actually a failure, then the script exits with 1 and the job fails. If
the pa11y-ci command fails, which is expected, then the script exits with 0 and
the job succeeds.
This is a good solution for cases where the script command only fails with one
specific exit code. If there are multiple possible exit codes, then this could
mask an error in the job. In this particular case, pa11y-ci exits with 2 if
any accessibility errors are found (what's tested for), but exits with 1 for
other failures. So, if pa11y-ci fails to execute properly, this job would pass
and mask the failure - not the desired result.
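The masking problem can be reproduced locally without pa11y-ci. This is a rough sketch under stated assumptions: fake_tool and job_status_for are hypothetical names, and fake_tool simply exits with whatever code it is given, standing in for the real tool. The one-liner runs inside a subshell so that exit ends only the subshell and its status can be printed.

```shell
# Hypothetical stand-in for the tool under test: exits with the code in $1.
fake_tool() {
    return "$1"
}

# Apply the inverted one-liner in a subshell and print the exit code the
# CI job would see for a given tool exit code.
job_status_for() {
    status=0
    ( fake_tool "$1" && exit 1 || exit 0 ) || status=$?
    echo "$status"
}

job_status_for 0   # tool passed unexpectedly: prints 1, job fails
job_status_for 2   # expected failure: prints 0, job passes
job_status_for 1   # unrelated failure: prints 0, job passes and hides it
```

The last call is the problem case: an exit code of 1 is treated the same as the expected 2.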
Passing only for certain exit codes
The preceding technique can be taken a step further by adding logic to
specifically check the exit code from the previous command and take the
appropriate action, as in the following example script:
some_job:
  script:
    - pa11y-ci && exit 2 || if [ "$?" -eq "2" ]; then exit 0; else exit 1; fi
This runs the pa11y-ci command. If it's successful, which again is actually a
failure, then with the && operator the script exits with 2 and the job fails.
The code 2 was chosen here to distinguish an unexpectedly passing command from
other errors.
If the pa11y-ci command fails, which is expected, then the || operator runs
the if command. The exit code from the previous command is represented by $?,
so if that is equal to 2, which is the expected result (that is, pa11y-ci
detected an accessibility error), then the script exits with 0 and the job
passes. Otherwise, there is a different error, so the script exits with 1 and
the job fails.
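For readers who find the one-liner hard to scan, the same check can be sketched as a small shell function. This is an illustrative alternative, not the form used above: expect_exit_code is a hypothetical helper name, and unlike the one-liner it returns 1 for every mismatch, including an unexpected pass. The sh -c 'exit N' commands are stand-ins for the real tool.

```shell
# Hypothetical helper (not part of GitLab CI or pa11y-ci): run a command
# and succeed only if it exits with the expected code.
expect_exit_code() {
    expected=$1
    shift
    actual=0
    "$@" || actual=$?   # capture the exit code without stopping the script
    if [ "$actual" -eq "$expected" ]; then
        return 0
    fi
    echo "expected exit code $expected, got $actual" >&2
    return 1
}

# Stand-in commands: sh -c 'exit N' exits with code N.
expect_exit_code 2 sh -c 'exit 2' && echo "pass: expected failure"
expect_exit_code 2 sh -c 'exit 1' || echo "fail: different error"
expect_exit_code 2 true           || echo "fail: command unexpectedly passed"
```

Capturing the code into a variable first avoids relying on $? still holding the right value at the point of the comparison.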