Passing a GitLab CI Job For a Failing Script
There are some cases where the expected outcome of a CI job script is failure. One prominent use case is testing tools and container images that are intended for CI-based analyses. This post details techniques for GitLab CI scripts that allow the job to pass when the script fails, so that the job status accurately reflects the expected result.
CI Jobs Expecting Failure
GitLab CI considers a job successful if the script completes with an exit code of 0, and considers it failed with any other exit code. In most cases that logic is appropriate, but there are some cases where a job is expected to fail. Two examples I see frequently involve CI pipelines for testing tools: integration testing of the testing tools themselves, and testing container images intended for use with testing tools.
To manage these cases, one obvious option for those familiar with GitLab CI is to use allow_failure, which allows the job to fail with any exit code when set to true. This can be refined by using allow_failure with exit_codes, which accepts an array of exit codes; only those codes will allow failure. For example:
some_job:
  allow_failure:
    exit_codes:
      - 123
      - 456
This only allows failure for exit codes 123 and 456; any other exit code fails the job and the pipeline. While this lets the pipeline proceed and pass, it still leaves the job, and ultimately the pipeline, passing with failure.
The Value of a Green Pipeline
Everyone strives for a green pipeline, with all jobs passed, as a definitive indication that every job produced the expected outcome. When a job passes with failure, people either don't examine its log on every pipeline to see why it failed, or certainly don't want to have to. The failure could be the expected result of the job using the techniques noted above, but it could also be any of a variety of other issues: runner failure, failure to load the container image, a script execution error, a network error, a job timeout, or anything else that can cause the job to fail. Even if the job log is examined on each pipeline at first, it's human nature to normalize that result and stop looking over time. At that point other problems can go unnoticed, possibly until the change is merged or released, where they become a bigger issue. The goal should be all jobs passing, since any job passing with failure is ultimately an indeterminate result.
Passing a Job When the Script Fails
The result of a command can be reversed by leveraging the logical And (&&) and Or (||) operators.
- The And (&&) operator specifies a command that is executed only if the preceding command succeeds (exit code 0). So in the example foo && bar, the bar command only executes if the foo command passes.
- The Or (||) operator specifies a command that is executed only if the preceding command fails (non-zero exit code). So in the example foo || bar, the bar command only executes if the foo command fails.
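A quick way to see both operators in action is the shell sketch below. It uses a hypothetical helper, exit_with, that simply returns the exit code it is given, standing in for real commands like foo and bar:

```shell
#!/bin/sh
# Hypothetical stand-in command: exits with whatever code it is given.
exit_with() { return "$1"; }

# && runs the right-hand command only if the left-hand one exits with 0.
exit_with 0 && echo "and: ran after success"
exit_with 1 && echo "and: skipped after failure"

# || runs the right-hand command only if the left-hand one exits non-zero.
exit_with 1 || echo "or: ran after failure"
exit_with 0 || echo "or: skipped after success"
```

Running it prints only "and: ran after success" and "or: ran after failure".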
Putting all of that together results in the script below, which in this case runs pa11y-ci, a CI-based web accessibility testing tool, against a URL known to have accessibility issues (so the command is expected to fail):
some_job:
  script:
    - pa11y-ci && exit 1 || exit 0
This runs the pa11y-ci command. If the command succeeds, which in this case is actually a failure, the script runs exit 1 and the job fails. If the pa11y-ci command fails, which is expected, the script runs exit 0 and the job succeeds.
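The inversion can be checked locally without installing pa11y-ci. This is a minimal sketch, assuming a hypothetical fake_pa11y_ci function that returns whatever exit code it is given; the job's script line runs in a subshell so that exit ends only the subshell rather than the whole demo:

```shell
#!/bin/sh
# Hypothetical stand-in for pa11y-ci: returns whatever exit code it is given.
fake_pa11y_ci() { return "$1"; }

# Runs the job's inverted script line in a subshell (so `exit` ends only the
# subshell) and prints the exit code the CI job would finish with.
inverted_job() {
  rc=0
  ( fake_pa11y_ci "$1" && exit 1 || exit 0 ) || rc=$?
  echo "$rc"
}

echo "pa11y-ci exit 0 -> job exit $(inverted_job 0)"  # unexpected pass: job fails
echo "pa11y-ci exit 2 -> job exit $(inverted_job 2)"  # expected failure: job passes
```

Run with sh, it prints "pa11y-ci exit 0 -> job exit 1" followed by "pa11y-ci exit 2 -> job exit 0".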
This is a good solution when the command can fail with only one specific exit code. If there are multiple possible exit codes, this approach could mask an error in the job. In this particular case, pa11y-ci exits with 2 if any accessibility errors are found (what we're testing for), but exits with 1 for other failures. So if pa11y-ci fails to execute properly, this job would still pass and mask the failure, which is not the desired result.
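The masking problem is easy to reproduce with a command that cannot run at all. In this sketch, no-such-pa11y-ci is a hypothetical command name assumed not to be installed; POSIX shells report a missing command with exit code 127, which the inverted script line treats the same as any other failure:

```shell
#!/bin/sh
# "no-such-pa11y-ci" is a hypothetical command assumed NOT to be installed;
# it stands in for pa11y-ci failing to execute at all.
missing_tool_status() {
  rc=0
  no-such-pa11y-ci 2>/dev/null || rc=$?
  echo "$rc"
}

# Same inverted script line as the job, run in a subshell so `exit` is contained.
inverted_missing_tool_job() {
  rc=0
  ( no-such-pa11y-ci 2>/dev/null && exit 1 || exit 0 ) || rc=$?
  echo "$rc"
}

echo "tool exit code: $(missing_tool_status)"       # 127: command not found
echo "job exit code: $(inverted_missing_tool_job)"  # 0: the job still passes
```

The tool never ran, yet the inverted line turns its exit code of 127 into a passing job.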
Passing Only For Certain Exit Codes
The technique above can be taken a step further by adding logic that specifically checks the exit code of the previous command and takes the appropriate action, as in the following example script:
some_job:
  script:
    - pa11y-ci && exit 2 || if [ "$?" -eq "2" ]; then exit 0; else exit 1; fi
This runs the pa11y-ci command. If it succeeds, which again is actually a failure, the && operator causes the script to exit 2 and the job fails. In this case, 2 was chosen to indicate that the script unexpectedly passed, differentiating it from other errors.
If the pa11y-ci command fails, which is expected, the || operator runs the if command. The exit code of the previous command is available as $?, so if it equals 2, which is our expected result (i.e. pa11y-ci detected an accessibility error), the script runs exit 0 and the job passes. Otherwise, there is a different error, so the script runs exit 1 and the job fails.
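To confirm that only exit code 2 produces a passing job, the sketch below runs the same script line against a hypothetical fake_pa11y_ci stand-in (which returns whatever exit code it is given) for exit codes 0, 1, and 2, again inside a subshell so exit does not end the demo:

```shell
#!/bin/sh
# Hypothetical stand-in for pa11y-ci: returns whatever exit code it is given.
fake_pa11y_ci() { return "$1"; }

# Runs the exit-code-aware script line from the job in a subshell and prints
# the exit code the CI job would finish with.
checked_job() {
  rc=0
  ( fake_pa11y_ci "$1" && exit 2 || if [ "$?" -eq "2" ]; then exit 0; else exit 1; fi ) || rc=$?
  echo "$rc"
}

for code in 0 1 2; do
  echo "pa11y-ci exit $code -> job exit $(checked_job "$code")"
done
```

This prints job exit codes of 2, 1, and 0 respectively: an unexpected pass and any other failure both fail the job, while only the expected accessibility-error code lets it pass.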