Introduction
Design goals
The fundamental idea is to synchronize portions of two zpools, which may, but need not, reside on two different physical computers. This can serve backups, but also moving “projects” or virtual machines between workstations or servers.
The overall approach should be as robust as possible, allowing for fine-grained tests, easy maintenance and good extensibility. Under no circumstances, not even when misconfigured, should the tool cause data loss; ideally, it should be just as reliable as the underlying ZFS filesystem. It should be possible to prove the latter mathematically.
Two of the requirements that led to the development of abgleich were transfers that are as fast as possible and the ability to minimize the volume of data transferred during backups in low-bandwidth environments.
By default, abgleich generates a sequence of atomic transactions, which are presented to the user for confirmation before they are carried out. In this regard, it behaves much like partitioning tools such as parted or GParted.
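The confirm-then-apply workflow described above could be sketched as follows. This is a minimal illustration under stated assumptions, not abgleich's actual API: the `Transaction` type, the `run_plan` function and its `confirm`/`execute` hooks are hypothetical names, and each transaction is assumed to map to exactly one `zfs` command.

```python
import subprocess
from dataclasses import dataclass
from typing import Callable, List, Sequence


@dataclass(frozen=True)
class Transaction:
    """One atomic step, e.g. a single `zfs snapshot` call (hypothetical type)."""
    description: str
    command: Sequence[str]


def _execute(command: Sequence[str]) -> None:
    """Run one command for real; raises CalledProcessError if it fails."""
    subprocess.run(list(command), check=True)


def run_plan(
    plan: List[Transaction],
    confirm: Callable[[str], str] = input,
    execute: Callable[[Sequence[str]], None] = _execute,
) -> int:
    """Show the whole plan, ask once, then apply the transactions in order.

    Returns the number of transactions executed. Nothing runs unless the
    user answers "y"; a failing command raises and stops the rest of the plan.
    """
    for index, transaction in enumerate(plan, start=1):
        print(f"{index:>3}  {transaction.description}: {' '.join(transaction.command)}")
    if confirm("Apply these transactions? [y/N] ").strip().lower() != "y":
        return 0
    executed = 0
    for transaction in plan:
        execute(transaction.command)
        executed += 1
    return executed
```

Injecting `confirm` and `execute` as parameters keeps the planning logic testable without touching a real pool, which fits the goal of fine-grained tests stated above.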
While the tool can be used on headless servers, it explicitly also targets laptops, desktop computers and workstations running ZFS. Proper command-line and graphical user interfaces for “end users” are therefore required. A single API drives the different types of user interfaces.
ZFS’ ability to create and clone snapshots makes it possible to mimic workflows similar to those of git. Just like git “commits”, snapshots should only be created if a dataset/volume actually contains changes. In the ZFS world, this is a rather unusual design choice, as it is common practice to create snapshots at fixed time intervals regardless of whether the dataset/volume has changed. abgleich implements special logic to safely and quickly determine whether a dataset/volume contains “uncommitted” changes.
In keeping with these similarities to git, abgleich aims to synchronize “projects” (datasets/volumes) across multiple computers in addition to providing backup functionality. Projects can be “moved” or “cloned” from one machine to another, allowing a user to always work on them efficiently and locally. Synchronization can be performed in either a pull or a push configuration.
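A simple starting point for “snapshot only if changed” is the ZFS `written` property, which reports the bytes written to a dataset/volume since its previous snapshot. The sketch below is an assumption-laden simplification, not abgleich's actual change-detection logic: the helper names are hypothetical, and the `run` hook stands in for invoking the `zfs` binary. Note that `written` can be non-zero even when no file content changed (for example, due to metadata updates), which is one reason a production tool may need more careful checks.

```python
import subprocess
from typing import Callable, Sequence

Runner = Callable[[Sequence[str]], str]


def _zfs(args: Sequence[str]) -> str:
    """Run a zfs subcommand and return its stdout (requires ZFS on the host)."""
    return subprocess.run(["zfs", *args], check=True, capture_output=True, text=True).stdout


def written_bytes(dataset: str, run: Runner = _zfs) -> int:
    # -H: no header, -p: exact numeric values, -o value: print only the value column
    output = run(["get", "-Hp", "-o", "value", "written", dataset])
    return int(output.strip())


def has_uncommitted_changes(dataset: str, run: Runner = _zfs) -> bool:
    """Heuristic: anything written since the last snapshot counts as a change."""
    return written_bytes(dataset, run) > 0


def snapshot_if_changed(dataset: str, name: str, run: Runner = _zfs) -> bool:
    """Create dataset@name only if there are uncommitted changes; report whether it did."""
    if not has_uncommitted_changes(dataset, run):
        return False
    run(["snapshot", f"{dataset}@{name}"])
    return True
```

For a dataset without any snapshot, `written` equals the space it references, so all of its contents count as uncommitted, much like an initial commit in git.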
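The difference between a push and a pull configuration comes down to which machine runs `zfs send` and which runs `zfs receive`. The following sketch builds such a pipeline as a shell string; the function name and its parameters are hypothetical, it assumes transport over `ssh`, and it is not how abgleich itself constructs its commands.

```python
import shlex
from typing import Optional


def send_receive_pipeline(
    source: str,
    target: str,
    snapshot: str,
    base: Optional[str] = None,
    remote_host: Optional[str] = None,
    mode: str = "push",
) -> str:
    """Build a shell pipeline that replicates source@snapshot into target.

    push: the local machine sends, the remote machine receives.
    pull: the remote machine sends, the local machine receives.
    If `base` is given, an incremental stream from source@base is sent.
    """
    send = ["zfs", "send"]
    if base is not None:
        send += ["-i", f"{source}@{base}"]
    send.append(f"{source}@{snapshot}")
    receive = ["zfs", "receive", target]
    send_cmd, receive_cmd = shlex.join(send), shlex.join(receive)
    if remote_host is None:
        # Both pools are attached to the same machine.
        return f"{send_cmd} | {receive_cmd}"
    if mode == "push":
        return f"{send_cmd} | ssh {shlex.quote(remote_host)} {shlex.quote(receive_cmd)}"
    if mode == "pull":
        return f"ssh {shlex.quote(remote_host)} {shlex.quote(send_cmd)} | {receive_cmd}"
    raise ValueError(f"unknown mode: {mode!r}")
```

For example, pushing an incremental update from a workstation to a backup server would yield `zfs send -i tank/project@s1 tank/project@s2 | ssh backup.example 'zfs receive backup/project'`.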
Video: Discussion and demo of abgleich (in German)
A discussion and “live” demo of abgleich (in German) can be found here.
Alternative ZFS-based Tools
The one and only, the classic, the “bash-nightmare”: zfs-auto-snapshot.
Noteworthy alternative sync and backup tools¶
Similar tools have been built on top of BTRFS, which has not taken off as much as initially expected despite being seen as a better or purer alternative to ZFS, i.e. one with fewer licensing issues. Noteworthy examples include btrfs-sxbackup and btrbk.
Besides, there is a rather mature family of similar tools based on rsync, which primarily suffer from the significant performance overhead of going through all of the filesystem’s layers, whereas ZFS-based tools can operate at the comparatively low block level.
Borg is another noteworthy example that offers similar capabilities but likewise operates on top of the filesystem.
Although git itself is not intended to manage large binary files or very high numbers of files at once, it can be extended with plugins such as git-lfs to provide capabilities very similar to those of abgleich. Unfortunately, any git-based approach suffers from filesystem overhead just like rsync, among other issues. Microsoft has developed further git-based approaches for dealing with very large mono-repos. Examples include VFS for Git (first generation, abandoned), Scalar (second generation, abandoned) and microsoft/git (third and current generation).