Notifications for New Eleventy Posts in GitLab - Part 1

One of the challenges with deploying static sites is that there's nothing tracking any sort of site state, including when new content is published. This post presents a technique to identify newly published content on an Eleventy site and send notifications with content-specific data. Part 1 covers identifying the new posts and collecting post-specific data.

The example case #

This example is for an Eleventy site derived from the eleventy-base-blog and sends notifications for newly published posts. You'll see that limiting notifications to posts is merely a matter of filtering the content and could easily be changed, but in this case it avoids cases like a notification for a new tag added to a post (which could create a new tag-specific page for a previously unused tag).

The example also runs in GitLab CI and publishes to GitLab Pages, determining newly published content without resorting to git diffs, the GitLab API, or similar state comparisons. While this specific case is for an Eleventy site hosted on GitLab Pages, the technique should be applicable to any CI system.

At a high level, the overall sequence is:

  • Retrieve the currently deployed site's sitemap.xml.
  • Build the site, generating a summary file with data for all posts.
  • Compare the two to determine which posts are new.
  • Send notifications for the new posts.

Determine current posts #

In order to determine new posts, there must be a source of truth for current posts to compare against. The CI pipeline itself doesn't maintain state data for the published site. The data could potentially be obtained from the GitLab API if the artifacts have not expired, but that may not be a safe assumption. There is one straightforward source of truth, though - the sitemap.xml file on the deployed site. This assumes the site provides a sitemap (which there are many good reasons to do) and that the sitemap contains only URLs rather than being a sitemap index file (one that links to other sitemaps). If needed, there's an example sitemap.xml in the eleventy-base-blog for a reference implementation.

For this case, a job is added to the CI pipeline, before the new site is deployed, to retrieve the currently deployed site's sitemap.xml and save it as an artifact for later use.

# Current sitemap must be retrieved before deploy for comparison.
get_current_sitemap:
  image: alpine:latest
  # With no needs, the job will run at the start of the pipeline
  needs: []
  script:
    - wget -O sitemap.xml https://<site>/sitemap.xml
  rules:
    # Site only deploys on the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  artifacts:
    paths:
      - sitemap.xml

This job has rules configured to only run on the default branch, since that matches when the site is deployed. In this job, wget is used instead of curl simply because it's already installed in the alpine image.

Get data for all posts #

To get the data for all posts, the Eleventy build is updated to create a JSON file with all applicable data, which is then saved and made available for subsequent jobs to use.

Create posts summary JSON file #

To simplify creating the posts summary file, a new Eleventy collection is added to the Eleventy configuration.

// Create collection of posts data for use in external notifications
eleventyConfig.addCollection('postsData', (collection) =>
    collection.getFilteredByTag('posts').map((item) => ({
        date: item.date,
        description: item.data.description,
        inputPath: item.inputPath,
        outputPath: item.outputPath,
        tags: item.data.tags.filter((tag) => tag !== 'posts'),
        title: item.data.title,
        url: item.url
    }))
);

This collection gets data for all posts (any page with the tag posts) and creates a summary of post-related data to be used for notifications. Collections are processed after all pages and the data cascade, so they have access to all page data, the rendered page, input and output file paths, etc. The tags value is filtered to remove the posts tag, leaving only the other tags. In this case, inputPath and outputPath are included simply to illustrate that the data is available, and could be used if those files needed to be read to get some information (although a goal was to avoid that and use the data the collection already exposes).

This site uses eleventy-plugin-validate, which ensures that the required post data (title, description, tags) is present and matches the required schema, or the build fails. Therefore, the postsData collection does not duplicate that validation, although validating here is recommended if the data is not already validated.

In addition to the collection, two filters are added to the Eleventy configuration.

eleventyConfig.addFilter('stringify', (value) => JSON.stringify(value));

const sanitizeTag = (tag) =>
    tag.toLowerCase().replaceAll(/[#/"]/g, '').replaceAll(' ', '-');

eleventyConfig.addFilter('stringifyTags', (tags) =>
    JSON.stringify(tags.map((tag) => `#${sanitizeTag(tag)}`))
);

The stringify filter is used to properly encode some data fields for the JSON file (for example, double quotes that may appear in a title or description field, which would otherwise result in invalid JSON).

The post tags are used to create hashtags, so the stringifyTags filter takes the array of post tags, encodes each one, and prepends a # to make valid hashtags.
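
These filters are small enough to demonstrate in isolation; the title and tag values below are hypothetical, purely for illustration:

```javascript
// The same filter functions as registered in the Eleventy configuration
const sanitizeTag = (tag) =>
    tag.toLowerCase().replaceAll(/[#/"]/g, '').replaceAll(' ', '-');

const stringifyTags = (tags) =>
    JSON.stringify(tags.map((tag) => `#${sanitizeTag(tag)}`));

// Double quotes in a title are escaped, producing a valid (quoted) JSON string
console.log(JSON.stringify('A "quoted" title'));
// → "A \"quoted\" title"

// Tags become lowercase, hyphenated hashtags with special characters removed
console.log(stringifyTags(['Static Sites', 'CI/CD']));
// → ["#static-sites","#cicd"]
```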

The JSON file itself is created with a new Nunjucks template (Nunjucks is used across this site, but this could be any template format).

---
permalink: /posts.json
eleventyExcludeFromCollections: true
---

[
{%- for post in collections.postsData %}
    {
    "url": "{{ post.url | htmlBaseUrl(metadata.url) }}",
    "title": {{ post.title | stringify | safe }},
    "description": {{ post.description | stringify | safe }},
    "date": "{{ post.date | dateToRfc3339 }}",
    "inputPath": "{{ post.inputPath }}",
    "outputPath": "{{ post.outputPath }}",
    "tags": {{ post.tags | stringifyTags | safe }}
    }{% if not loop.last %},{% endif %}
{%- endfor %}
]

This template iterates through the collection that was previously created to generate a JSON array of posts data. There are three noteworthy items in the template:

  • A permalink is used to set the filename, and the file is excluded from other collections with eleventyExcludeFromCollections. This is similar to how a sitemap or RSS feed would be generated.
  • The title and description fields call the stringify filter. As seen previously, this calls JSON.stringify on the value, which returns a quoted string, so those values are not wrapped in quotes in the template.
  • All fields that call stringify or stringifyTags for encoding use the safe filter so they're not HTML encoded. The intent is to encode them as valid JSON, not valid HTML.
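
Assuming a single hypothetical post, the generated posts.json would look something like:

```json
[
    {
    "url": "https://example.com/posts/example-post/",
    "title": "An Example \"Quoted\" Title",
    "description": "A short description of the post.",
    "date": "2024-01-01T00:00:00Z",
    "inputPath": "./src/posts/example-post.md",
    "outputPath": "public/posts/example-post/index.html",
    "tags": ["#eleventy","#gitlab"]
    }
]
```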

The Eleventy build writes the posts.json file to the site's output directory (set previously in the outputPath variable), so an eleventy.after event is used to move the file to the repository root so it isn't deployed with the site (not necessarily a problem, just unnecessary).

eleventyConfig.on('eleventy.after', () => {
    const postsDataFilename = 'posts.json';
    fs.renameSync(path.join(outputPath, postsDataFilename), postsDataFilename);
});

Finally, the pages job artifacts:paths is updated to save the new file in addition to the site directory (in this case public/).

pages:
  image: node:20-alpine
  needs:
    - npm_install
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    # build runs `npx @11ty/eleventy`
    - npm run build
  artifacts:
    paths:
      - public/
      - posts.json

Note: this project uses the npm_install job to install dependencies once (via npm ci) and save them as artifacts for all subsequent jobs where they are required. The specific template can be found here for reference.

Determine new posts #

New posts are determined by comparing the sitemap and posts summary files.

const fs = require('node:fs');

const sitemapFilename = 'sitemap.xml';
const postsFilename = 'posts.json';
const postsThreshold = 3;

const getNewPosts = () => {
    const sitemap = fs.readFileSync(sitemapFilename, 'utf8');
    const urlRegex = /<loc>(?<url>.+\/posts\/.+?)<\/loc>/g;
    const sitemapUrls = [...sitemap.matchAll(urlRegex)].map((match) => match.groups.url);

    const posts = JSON.parse(fs.readFileSync(postsFilename, 'utf8'));
    if (
        sitemapUrls.length === 0 ||
        posts.length === 0 ||
        Math.abs(sitemapUrls.length - posts.length) > postsThreshold
    ) {
        throw new Error(
            'Error: sitemap and posts data are invalid or out of sync'
        );
    }
    return posts.filter((post) => !sitemapUrls.includes(post.url));
};

The sitemap file is read, and a regular expression is used to find all <loc> values with /posts/ in the URL (rather than parsing the XML and then filtering the results); the URL is stored in the url named capture group. This is implemented on a site with existing posts, so URLs should always be found both in the sitemap and in the posts.json file. As a check for any issues, an error is thrown if either of those lists is empty (a sign of a problem), or if they differ by more than three posts. The threshold is arbitrary, and could probably be narrowed to one, but it serves as another check that there wasn't an issue processing data from either file; in the nominal workflow there should only be one new post in any pipeline. (There could also be new tag pages with the blog template, but those are not included in posts.) Finally, the two lists are compared and an array of post data for the new posts is returned.
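
The extraction can be demonstrated standalone; the sitemap content below uses hypothetical URLs for illustration:

```javascript
// Minimal sitemap with both post and non-post URLs (hypothetical)
const sitemap = `<?xml version="1.0" encoding="utf-8"?>
<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://example.com/posts/first-post/</loc></url>
  <url><loc>https://example.com/about/</loc></url>
  <url><loc>https://example.com/posts/second-post/</loc></url>
</urlset>`;

// Same pattern as getNewPosts: only URLs containing /posts/ are captured
const urlRegex = /<loc>(?<url>.+\/posts\/.+?)<\/loc>/g;
const sitemapUrls = [...sitemap.matchAll(urlRegex)].map((match) => match.groups.url);

console.log(sitemapUrls);
// → ['https://example.com/posts/first-post/', 'https://example.com/posts/second-post/']
```

The /about/ URL is excluded because it doesn't contain /posts/, which is what keeps tag pages and other non-post pages out of the comparison.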

Provide notifications for new posts #

A new job, new_post_notification, was created to run the script that checks for new posts and sends notifications. This is a separate job to isolate notification errors from site build and deploy errors, although the jobs could be combined into one.

new_post_notification:
  image: node:20-alpine
  # Needs specifies artifacts to download as well as prerequisite jobs
  needs:
    # Provides node_modules folder
    - npm_install
    # Provides previous sitemap.xml
    - get_current_sitemap
    # Provides built site and posts.json
    - pages
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - node ./scripts/new-posts.js

As noted, this needs the jobs that previously generated the applicable data and the node_modules folder for modules used by the ./scripts/new-posts.js script.

This site is using Eleventy v2 with CommonJS modules, so the main logic is encapsulated in an IIFE to emulate top level await.

(async () => {
    const posts = getNewPosts();
    if (posts.length === 0) {
        console.log('No new posts to submit');
        return;
    }
    const taskQueue = [];
    for (const post of posts) {
        console.log(`Submitting updates for ${post.url}`);
        taskQueue.push(
            // send updates for new posts
        );
    }

    const results = await Promise.allSettled(taskQueue);
    for (const result of results) {
        if (result.status === 'rejected') {
            console.error(result.reason.message);
        }
    }
})();

It uses the previously discussed getNewPosts function to get the summary data for the new posts, then iterates through those posts to send notifications. All tasks are pushed to an async task queue, and Promise.allSettled is used to ensure all tasks are executed, even if some fail. If any of the tasks fail, the error is logged.
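
The task-queue pattern can be sketched in isolation; sendNotification here is a hypothetical stand-in for the real notification calls covered in Part 2:

```javascript
// Hypothetical notification task; real tasks are covered in Part 2
const sendNotification = async (post) => {
    if (!post.url) {
        throw new Error(`No URL for "${post.title}"`);
    }
    return `Notified for ${post.url}`;
};

// Illustrative posts data: one valid post, one that will fail
const posts = [
    { title: 'Good post', url: 'https://example.com/posts/good-post/' },
    { title: 'Broken post', url: '' }
];

(async () => {
    // allSettled waits for every task, whether it resolves or rejects
    const results = await Promise.allSettled(posts.map(sendNotification));
    for (const result of results) {
        if (result.status === 'rejected') {
            // One failed notification doesn't prevent the others from completing
            console.error(result.reason.message);
        }
    }
})();
```

Unlike Promise.all, which rejects as soon as any task fails, Promise.allSettled always returns one result per task, so a single failed notification service doesn't block the rest.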

Summary #

This post has detailed how to identify new posts in an Eleventy build, along with summary data for those posts. For reference, the complete .gitlab-ci.yml, .eleventy.js, and ./scripts/new-posts.js files (the pieces covered in this post) are included here.

.gitlab-ci.yml

npm_install:
  image: node:20-alpine
  needs: []
  script:
    - npm ci
  artifacts:
    paths:
      - node_modules/

# Current sitemap must be retrieved before deploy for comparison.
get_current_sitemap:
  image: alpine:latest
  # With no needs, the job will run at the start of the pipeline
  needs: []
  script:
    - wget -O sitemap.xml https://<site>/sitemap.xml
  rules:
    # Site only deploys on the default branch
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  artifacts:
    paths:
      - sitemap.xml

pages:
  image: node:20-alpine
  needs:
    - npm_install
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    # build runs `npx @11ty/eleventy`
    - npm run build
  artifacts:
    paths:
      - public/
      - posts.json

new_post_notification:
  image: node:20-alpine
  # Needs specifies artifacts to download as well as prerequisite jobs
  needs:
    # Provides node_modules folder
    - npm_install
    # Provides previous sitemap.xml
    - get_current_sitemap
    # Provides built site and posts.json
    - pages
  rules:
    - if: $CI_COMMIT_BRANCH == $CI_DEFAULT_BRANCH
  script:
    - node ./scripts/new-posts.js

.eleventy.js

'use strict';

const fs = require('node:fs');
const path = require('node:path');

// Global paths
const inputPath = 'src';
const outputPath = 'public';

const sanitizeTag = (tag) =>
    tag.toLowerCase().replaceAll(/[#/"]/g, '').replaceAll(' ', '-');

module.exports = function (eleventyConfig) {

    // other configuration

    eleventyConfig.addFilter('stringify', (value) => JSON.stringify(value));

    eleventyConfig.addFilter('stringifyTags', (tags) =>
        JSON.stringify(tags.map((tag) => `#${sanitizeTag(tag)}`))
    );

    // Create collection of posts data for use in external notifications
    eleventyConfig.addCollection('postsData', (collection) =>
        collection.getFilteredByTag('posts').map((item) => ({
            date: item.date,
            description: item.data.description,
            inputPath: item.inputPath,
            outputPath: item.outputPath,
            tags: item.data.tags.filter((tag) => tag !== 'posts'),
            title: item.data.title,
            url: item.url
        }))
    );

    // Move the posts.json file to the root folder since not deployed
    eleventyConfig.on('eleventy.after', () => {
        const postsDataFilename = 'posts.json';
        fs.renameSync(path.join(outputPath, postsDataFilename), postsDataFilename);
    });

    return {
        dir: {
            input: inputPath,
            output: outputPath
        },
        // other configuration
    };
};

./scripts/new-posts.js

'use strict';

const fs = require('node:fs');

const sitemapFilename = 'sitemap.xml';
const postsFilename = 'posts.json';
const postsThreshold = 3;

const getNewPosts = () => {
    const sitemap = fs.readFileSync(sitemapFilename, 'utf8');
    const urlRegex = /<loc>(?<url>.+\/posts\/.+?)<\/loc>/g;
    const sitemapUrls = [...sitemap.matchAll(urlRegex)].map((match) => match.groups.url);

    const posts = JSON.parse(fs.readFileSync(postsFilename, 'utf8'));
    if (
        sitemapUrls.length === 0 ||
        posts.length === 0 ||
        Math.abs(sitemapUrls.length - posts.length) > postsThreshold
    ) {
        throw new Error(
            'Error: sitemap and posts data are invalid or out of sync'
        );
    }
    return posts.filter((post) => !sitemapUrls.includes(post.url));
};

(async () => {
    const posts = getNewPosts();
    if (posts.length === 0) {
        console.log('No new posts to submit');
        return;
    }
    const taskQueue = [];
    for (const post of posts) {
        console.log(`Submitting updates for ${post.url}`);
        taskQueue.push(
            // send updates for new posts
        );
    }

    const results = await Promise.allSettled(taskQueue);
    for (const result of results) {
        if (result.status === 'rejected') {
            console.error(result.reason.message);
        }
    }
})();

Details for the currently implemented notifications will be discussed in Part 2:

  • Posting a status to Mastodon
  • Posting a status to Bluesky
  • Sending an IndexNow notification