This project needs a name and a domain.

After many years of playing with YAML in k8s, I’ve returned to using CFEngine (at work). Before that, the last config management software I used was Puppet. An issue I see with all modern config management tooling is the lack of monitoring and the inability to roll back cleanly. In k8s this works better, but only for pods running in k8s (which are stateless), not for the underlying machines.

I want something that:

  • works on a plain Debian machine
  • has sane monitoring
  • supports rollbacks
  • supports canarying
  • (is written in Go)

What If?

I want to focus on syncing a directory of (config) files from a central repository to a machine, as that seems complex enough already. You can do this with gitopper, which basically solves things by ‘forking out to Git’. But that doesn’t work for something like adding users.

So maybe I want something like the following:

I have some higher-level language that defines how to configure something, which compiles down to a lower-level language consisting of a bunch of atomic steps. E.g. creating a directory and copying files into it would be:

MKDIR /tmp/blaat
COPY  file1 /tmp/blaat/file1
COPY  file2 /tmp/blaat/file2
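To make the compile-down step concrete, here is a minimal Go sketch. The high-level `Directory` resource and the `Step` type are hypothetical names invented for illustration; it simply lowers one directory and its files into the MKDIR/COPY steps above.

```go
package main

import (
	"fmt"
	"path"
)

// Step is one line of the atomic language: an operation plus its arguments.
// Hypothetical type; nothing like it exists yet.
type Step struct {
	Op   string   // e.g. "MKDIR", "COPY"
	Args []string // operation arguments, e.g. source and destination
}

// String renders the step as a single line, fields separated by one space.
func (s Step) String() string {
	out := s.Op
	for _, a := range s.Args {
		out += " " + a
	}
	return out
}

// Directory is a hypothetical high-level resource: a directory that should
// contain the given files from the config repository.
type Directory struct {
	Path  string
	Files []string
}

// Compile lowers a Directory into atomic steps: create the directory first,
// then copy each file into it, in the order given.
func (d Directory) Compile() []Step {
	steps := []Step{{Op: "MKDIR", Args: []string{d.Path}}}
	for _, f := range d.Files {
		steps = append(steps, Step{Op: "COPY", Args: []string{f, path.Join(d.Path, f)}})
	}
	return steps
}

func main() {
	d := Directory{Path: "/tmp/blaat", Files: []string{"file1", "file2"}}
	for _, s := range d.Compile() {
		fmt.Println(s)
	}
}
```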

To facilitate rollback, we just generate the atomic steps again for a different Git hash. Say the next commit adds file3:

MKDIR /tmp/blaat
COPY  file1 /tmp/blaat/file1
COPY  file2 /tmp/blaat/file2
COPY  file3 /tmp/blaat/file3

The diff between the two lists is ‘+COPY file3 /tmp/blaat/file3’, so that is all it will do. (Checking that file1 and file2 have not changed should also be done; maybe we can model that in this “atomic” language as well.)
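A minimal Go sketch of that diff, assuming every step in a generated list is unique (so a set comparison is enough) and that extra whitespace between fields is insignificant. `Diff` and `normalize` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// normalize collapses runs of whitespace so "COPY  file1 x" and "COPY file1 x"
// compare equal.
func normalize(line string) string {
	return strings.Join(strings.Fields(line), " ")
}

// Diff returns the steps present only in next (to execute, in next's order)
// and the steps present only in prev (to undo, in prev's order). It assumes
// no step appears twice in a list.
func Diff(prev, next []string) (added, removed []string) {
	inPrev := make(map[string]bool, len(prev))
	for _, l := range prev {
		inPrev[normalize(l)] = true
	}
	inNext := make(map[string]bool, len(next))
	for _, l := range next {
		n := normalize(l)
		inNext[n] = true
		if !inPrev[n] {
			added = append(added, n)
		}
	}
	for _, l := range prev {
		if n := normalize(l); !inNext[n] {
			removed = append(removed, n)
		}
	}
	return added, removed
}

func main() {
	prev := []string{"MKDIR /tmp/blaat", "COPY  file1 /tmp/blaat/file1", "COPY  file2 /tmp/blaat/file2"}
	next := append(append([]string{}, prev...), "COPY  file3 /tmp/blaat/file3")
	added, removed := Diff(prev, next)
	fmt.Println("added:", added)
	fmt.Println("removed:", removed)
}
```

A set comparison ignores steps that merely moved; an LCS-based line diff would also catch reordering, at the cost of more code.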

Next run we get:

MKDIR /tmp/blaat
COPY  file2 /tmp/blaat/file2
COPY  file3 /tmp/blaat/file3

This diff is ‘-COPY file1 /tmp/blaat/file1’, and the reverse of COPY is defined as RM, so this should execute ‘RM /tmp/blaat/file1’.
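The reverse mapping could be a small table. This Go sketch covers only COPY → RM (stated in the text) and MKDIR → RMDIR (used in the rollback example), and returns an error instead of guessing for any other operation. `Reverse` is a hypothetical name.

```go
package main

import (
	"fmt"
	"strings"
)

// Reverse returns the step that undoes the given step.
func Reverse(step string) (string, error) {
	f := strings.Fields(step)
	if len(f) == 0 {
		return "", fmt.Errorf("empty step")
	}
	switch f[0] {
	case "COPY":
		if len(f) != 3 {
			return "", fmt.Errorf("COPY wants 2 arguments: %q", step)
		}
		return "RM " + f[2], nil // remove the copied destination
	case "MKDIR":
		if len(f) != 2 {
			return "", fmt.Errorf("MKDIR wants 1 argument: %q", step)
		}
		return "RMDIR " + f[1], nil
	default:
		// No reverse defined (yet) for anything else.
		return "", fmt.Errorf("no reverse defined for %q", f[0])
	}
}

func main() {
	r, err := Reverse("COPY  file1 /tmp/blaat/file1")
	fmt.Println(r, err)
}
```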

To roll back, we go back in our (local?) Git repo, generate the list for the target commit, compare it with the list for the current commit, and execute the reverse of every operation in the diff, walking the list upwards. E.g. if the target commit’s list has none of the MKDIR and COPY steps used here, our imaginary system executes the opposite of the following steps, in reverse order:

MKDIR /tmp/blaat
COPY  file1 /tmp/blaat/file1
COPY  file2 /tmp/blaat/file2
COPY  file3 /tmp/blaat/file3

will become

RM  /tmp/blaat/file3
RM  /tmp/blaat/file2
RM  /tmp/blaat/file1
RMDIR /tmp/blaat

and we’re back on whatever commit we need to be on. Note that restoring a file’s previous state means we would need to roll back to commit X and then execute all steps up to commit X again (to restore whatever the previous state was).
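Putting the diff and the reverse mapping together, a rollback plan might be computed like this self-contained Go sketch (`RollbackPlan` and `undo` are hypothetical names). Steps only in the current list are undone bottom-up, then steps only in the target list are applied top-down. It cannot bring back a file’s old contents: RM just deletes.

```go
package main

import (
	"fmt"
	"strings"
)

// undo maps a step to its reverse; only MKDIR and COPY are known here.
func undo(step string) (string, bool) {
	f := strings.Fields(step)
	switch {
	case len(f) == 2 && f[0] == "MKDIR":
		return "RMDIR " + f[1], true
	case len(f) == 3 && f[0] == "COPY":
		return "RM " + f[2], true
	}
	return "", false
}

// RollbackPlan computes the steps that move a machine from the list generated
// for the current commit to the list generated for the target commit.
func RollbackPlan(current, target []string) ([]string, error) {
	key := func(s string) string { return strings.Join(strings.Fields(s), " ") }
	inTarget := map[string]bool{}
	for _, s := range target {
		inTarget[key(s)] = true
	}
	inCurrent := map[string]bool{}
	for _, s := range current {
		inCurrent[key(s)] = true
	}
	var plan []string
	// Walk the current list upwards, undoing what the target doesn't have.
	for i := len(current) - 1; i >= 0; i-- {
		if k := key(current[i]); !inTarget[k] {
			r, ok := undo(k)
			if !ok {
				return nil, fmt.Errorf("cannot undo %q", k)
			}
			plan = append(plan, r)
		}
	}
	// Then apply, in order, what only the target has.
	for _, s := range target {
		if k := key(s); !inCurrent[k] {
			plan = append(plan, k)
		}
	}
	return plan, nil
}

func main() {
	current := []string{"MKDIR /tmp/blaat", "COPY  file1 /tmp/blaat/file1",
		"COPY  file2 /tmp/blaat/file2", "COPY  file3 /tmp/blaat/file3"}
	plan, err := RollbackPlan(current, nil)
	if err != nil {
		fmt.Println(err)
		return
	}
	for _, s := range plan {
		fmt.Println(s)
	}
}
```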

In effect, the atomic file we’re generating is a journal, like the ones used in file systems.

Every machine should be on the latest Git hash, which makes drift detection easy. A machine can also report which hash it’s currently applying; that’s most of the monitoring we’d want. Errors and “I didn’t expect this state” can also be flagged. We can also check the generated atomic file: files should only be copied once, and directories should exist before anything is copied into them.

Canarying is just having a subset of machines use a different branch. This also helps with debugging, as you can point a machine at your branch before merging to main and rolling out to the entire fleet.
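One way to pick that subset, sketched in Go under my own assumptions (the `Rollout` type and its fields are hypothetical): explicit per-host overrides for debugging, and otherwise an FNV hash of the host name, so the same machines stay in the canary group from run to run.

```go
package main

import (
	"fmt"
	"hash/fnv"
)

// Rollout is a hypothetical description of which branch each machine follows.
type Rollout struct {
	Default       string            // branch for the fleet, e.g. "main"
	Canary        string            // branch under test
	CanaryPercent uint32            // share of machines (0-100) that follow Canary
	Overrides     map[string]string // host -> branch, e.g. pin a machine to a dev branch
}

// BranchFor picks the branch for a host. Hashing the host name keeps the
// canary subset stable instead of reshuffling machines on every run.
func (r Rollout) BranchFor(host string) string {
	if b, ok := r.Overrides[host]; ok {
		return b
	}
	h := fnv.New32a()
	h.Write([]byte(host)) // hash.Hash writes never return an error
	if h.Sum32()%100 < r.CanaryPercent {
		return r.Canary
	}
	return r.Default
}

func main() {
	r := Rollout{Default: "main", Canary: "canary", CanaryPercent: 10,
		Overrides: map[string]string{"dev1": "my-branch"}}
	for _, host := range []string{"dev1", "web1", "web2"} {
		fmt.Println(host, r.BranchFor(host))
	}
}
```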

Language

Should this have a higher-level language? Probably, but which one: HCL, Nix? Not something like YAML. Our atomic file may benefit from including steps from other files. This would allow a plugin-like system where others (maybe even other implementations) also generate steps that should be executed.

Ordering

Of course, when putting files in a directory, that directory needs to exist, so there is an inherent order. Steps in the atomic file are executed in the order they appear; if something depends on something else, the higher-level language needs to make that connection and output things in the correct order.

Debugging

It would also be nice to know which part of the code is responsible for which line (or lines) in the atomic steps file.
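One way to keep that mapping is to carry the source position along with each step and write it out as a trailing comment. This Go sketch assumes the atomic file allows `#` comments and that paths never contain " #"; `Origin`, `AnnotatedStep`, `Render` and `Strip` are hypothetical names.

```go
package main

import (
	"fmt"
	"strings"
)

// Origin records which part of the high-level config produced a step.
type Origin struct {
	File string
	Line int
}

// AnnotatedStep pairs an atomic step with where it came from.
type AnnotatedStep struct {
	Step   string
	Origin Origin
}

// Render writes one step per line with a trailing "# file:line" comment.
func Render(steps []AnnotatedStep) string {
	var b strings.Builder
	for _, s := range steps {
		fmt.Fprintf(&b, "%s # %s:%d\n", s.Step, s.Origin.File, s.Origin.Line)
	}
	return b.String()
}

// Strip removes the annotation so diffs compare steps, not source positions.
func Strip(line string) string {
	if i := strings.Index(line, " #"); i >= 0 {
		line = line[:i]
	}
	return strings.TrimSpace(line)
}

func main() {
	steps := []AnnotatedStep{
		{Step: "MKDIR /tmp/blaat", Origin: Origin{File: "blaat.hcl", Line: 3}},
		{Step: "COPY file1 /tmp/blaat/file1", Origin: Origin{File: "blaat.hcl", Line: 4}},
	}
	out := Render(steps)
	fmt.Print(out)
	for _, l := range strings.Split(strings.TrimSpace(out), "\n") {
		fmt.Println(Strip(l))
	}
}
```

Stripping the annotation before diffing matters: otherwise editing an unrelated line in the high-level config would shift positions and make unchanged steps look different.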

Next Steps

I’ll probably play with defining and creating the low-level atomic language and see whether the diffing makes any sense. This will work off a local Git repo, i.e. every machine participating needs a local checkout of the config repo. The low-level file format will be plain text (I was pondering binary, but who cares at this point).