For a long time, our build farm at ci.guix.gnu.org has been delivering
(pre-built binaries) compressed with gzip. Gzip was never the best
choice in terms of compression ratio, but it was a reasonable and
convenient choice: it’s rock-solid, and zlib made it easy for us to have
to perform in-process compression in our multi-threaded
With the exception of building software from source, downloads take the most time of Guix package upgrades. If users can download less, upgrades become faster, and happiness ensues. Time has come to improve on this, and starting from early June, Guix can publish and fetch lzip-compressed substitutes, in addition to gzip.
Lzip is a relatively little-known compression format, initially developed by Antonio Diaz Diaz ca. 2013. It has several C and C++ implementations with surprisingly few lines of code, which is always reassuring. One of its distinguishing features is a very good compression ratio with reasonable CPU and memory requirements, according to benchmarks published by the authors.
With this in place we were ready to start migrating our tools, and then our build farm, to lzip compression, so we can all enjoy smaller downloads. Well, easier said than done!
The compression format used for substitutes is not a core component like
it can be in “traditional” binary package formats such as
.deb since Guix is conceptually a
“source-based” distro. However, deployed Guix installations did not
support lzip, so we couldn’t just switch our build farm to lzip
overnight; we needed to devise a transition strategy.
Guix asks for the availability of substitutes over HTTP. For example, a question such as:
“Dear server, do you happen to have a binary of
/gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2that I could download?”
translates into prose to an HTTP GET of https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo, which returns something like:
StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 Compression: gzip NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9 NarSize: 134407424 References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 … FileSize: 48501141 System: x86_64-linux Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
(This narinfo format is inherited from Nix and
This tells us we can download the actual binary from
/nar/gzip/…-emacs-26.2, and that it will be about 46 MiB (the
FileSize field.) This is what
guix publish serves.
The trick we came up with was to allow
guix publish to advertise
several URLs, one per compression format. Thus, for recently-built
substitutes, we get something like
StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12 URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12 Compression: gzip FileSize: 30872887 URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12 Compression: lzip FileSize: 18829088 NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy NarSize: 94372664 References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 … System: x86_64-linux Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…
Notice that there are two occurrences of the
FileSize fields: one for gzip, and one for lzip. Old Guix instances
will just pick the first one, gzip; newer Guix will pick whichever
supported method provides the smallest
FileSize, usually lzip. This
will make migration trivial in the future, should we add support for
other compression methods.
Users need to upgrade their Guix daemon to benefit from lzip. On a
“foreign distro”, simply run
guix pull as root. On standalone Guix
guix pull && sudo guix system reconfigure /etc/config.scm. In both cases, the daemon has to be restarted, be it
systemctl restart guix-daemon.service or with
herd restart guix-daemon.
This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a
week. Specifically, we run
guix publish -C gzip:9 -C lzip:9, meaning
that we use the highest compression ratio for both compression methods.
Currently, only a small subset of the package substitutes are available
as both lzip and gzip; those that were already available as gzip have
not been recompressed. The following Guile program that taps into the
allows us to get some insight:
(use-modules (gnu) (guix) (guix monads) (guix scripts substitute) (srfi srfi-1) (ice-9 match)) (define all-packages (@@ (guix scripts weather) all-packages)) (define package-outputs (@@ (guix scripts weather) package-outputs)) (define (fetch-lzip-narinfos) (mlet %store-monad ((items (package-outputs (all-packages)))) (return (filter (lambda (narinfo) (member "lzip" (narinfo-compressions narinfo))) (lookup-narinfos "https://ci.guix.gnu.org" items))))) (define (lzip/gzip-ratio narinfo) (match (narinfo-file-sizes narinfo) ((gzip lzip) (/ lzip gzip)))) (define (average lst) (/ (reduce + 0 lst) (length lst) 1.))
Let’s explore this at the REPL:
scheme@(guile-user)> (define lst (with-store s (run-with-store s (fetch-lzip-narinfos)))) computing 9,897 package derivations for x86_64-linux... updating substitutes from 'https://ci.guix.gnu.org'... 100.0% scheme@(guile-user)> (length lst) $4 = 2275 scheme@(guile-user)> (average (map lzip/gzip-ratio lst)) $5 = 0.7398994395478715
As of this writing, around 20% of the package substitutes are available as lzip, so take the following stats with a grain of salt. Among those, the lzip-compressed substitute is on average 26% smaller than the gzip-compressed one. What if we consider only packages bigger than 5 MiB uncompressed?
scheme@(guile-user)> (define biggest (filter (lambda (narinfo) (> (narinfo-size narinfo) (* 5 (expt 2 20)))) lst)) scheme@(guile-user)> (average (map lzip/gzip-ratio biggest)) $6 = 0.5974238562384483 scheme@(guile-user)> (length biggest) $7 = 440
For those packages, lzip yields substitutes that are 40% smaller on average. Pretty nice! Lzip decompression is slightly more CPU-intensive than gzip decompression, but downloads are bandwidth-bound, so the benefits clearly outweigh the costs.
The switch from gzip to lzip has the potential to make upgrades “feel” faster, and that is great in itself.
Fundamentally though, we’ve always been looking in this project at peer-to-peer solutions with envy. Of course, the main motivation is to have a community-supported and resilient infrastructure, rather than a centralized one, and that vision goes hand-in-hand with reproducible builds.
We started working on an extension to publish and fetch substitutes over IPFS. Thanks to its content-addressed nature, IPFS has the potential to further reduce the amount of data that needs to be downloaded on an upgrade.
The good news is that IPFS developers are also interested in working with package manager developers, and I bet there’ll be interesting discussions at IPFS Camp in just a few days. We’re eager to pursue our IPFS integration work, and if you’d like to join us and hack the good hack, let’s get in touch!
About GNU Guix
GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.
In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.