# Planet Scheme

## April 25, 2017

### Programming Praxis

#### Etude On A Binary Tree

We have today another in our occasional series of exercises on binary trees; the input tree need not necessarily be ordered or balanced:

Given a binary tree containing integers, find the sum of all nodes at an even distance from the root, less the sum of all nodes at an odd distance from the root.

For instance, given the binary tree shown below, the requested sum is 1 – 2 – 3 + 4 + 5 + 6 + 7 – 8 – 9 – 10 – 11 – 12 – 13 – 14 – 15 = -74:

1
2           3
4     5     6     7
8  9 10 11 12 13 14 15

(I’m not an artist; you’ll have to imagine the lines connecting the various levels.)

Your task is to write a program to compute alternate sums and differences of the nodes of a binary tree. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## April 21, 2017

### Programming Praxis

#### Damsel And Suitor

Chris Smith tweets mathematical curiosities at @aap03102. This one caught my eye the other day:

This is from 1779: a time when puzzles were written in poetry, solutions were assumed to be integers and answers could be a bit creepy:

Questions proposed in 1779, and answered in 1780.

I. QUESTION 742, by Mr. John Penberthy.

I’m in love with a damsel, the pride of the plain,
Have courted and talk’d in Ovidian strain;
But vain is the rhetoric us’d by my tongue,
She says I’m too old and that she is too young:
From the foll’wing equations, dear ladies, unfold;
If she be too young, or if I be too old.

x3 + xy2 = 4640y
x2yy3 = 537.6x

Where x represents my age, and y the damsel’s.

Your task is to compute the ages of the damsel and her suitor. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## April 18, 2017

### Greg Hendershott

#### Racket Makefiles

A few years ago I wrote about makefiles for Racket. Some things have changed.

1. The old makefile built and pushed documentation to a GitHub Pages branch of the repo. That’s no longer necessary: The Racket package catalog builds and hosts documentation.

2. The Racket package catalog puts a yellow badge of shame on packages with missing dependencies (deps and build-deps in the package’s info.rkt). I want the makefile to check this.

3. In .travis.yml files for Travis CI, I think the script section ought to simply invoke targets in the makefile — delegating details to the latter.

4. Likewise some details needn’t even be in the makefile — they can move to the collection’s info.rkt. Example: The list of directories to clean.

5. The old makefile had separate PACKAGENAME and COLLECTS variables; for single-collection packages they were the same value. I wanted to simplify this to just the package name and use the appropriate package variants of raco commands.

In that spirit, here’s an updated Makefile, which I recently started using in the rackjure, markdown, and frog projects.

 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 PACKAGE-NAME=rackjure # Racket 6.1 adds pkg dep checking. ifeq ($(findstring "$(RACKET_VERSION)", "6.0", "6.0.1"),) DEPS-FLAGS=--check-pkg-deps --unused-pkg-deps else DEPS-FLAGS= endif all: setup # Primarily for use by CI. # Installs dependencies as well as linking this as a package. install: raco pkg install --deps search-auto remove: raco pkg remove $(PACKAGE-NAME) # Primarily for day-to-day dev. # Note: Also builds docs (if any) and checks deps. setup: raco setup --tidy$(DEPS-FLAGS) --pkgs $(PACKAGE-NAME) # Note: Each collection's info.rkt can say what to clean, for example # (define clean '("compiled" "doc" "doc/")) to clean # generated docs, too. clean: raco setup --fast-clean --pkgs$(PACKAGE-NAME) # Primarily for use by CI, after make install -- since that already # does the equivalent of make setup, this tries to do as little as # possible except checking deps. check-deps: raco setup --no-docs $(DEPS-FLAGS)$(PACKAGE-NAME) # Suitable for both day-to-day dev and CI test: raco test -x -p $(PACKAGE-NAME) The two main scenarios here: • Day-to-day development: make setup and make test. • CI: make install, make check-deps, and make test. I think you could probably use this as a template for any single-collection package project. Just change PACKAGE-NAME. Possibly append a target or two for something unique to your project. This Makefile is designed to work with Racket 6.0 or newer — because I have some existing packages that support Rackets that old. If you only care about Racket 6.1 or newer, then all this:  1 2 3 4 5 6 # Racket 6.1 adds pkg dep checking. ifeq ($(findstring "$(RACKET_VERSION)", "6.0", "6.0.1"),) DEPS-FLAGS=--check-pkg-deps --unused-pkg-deps else DEPS-FLAGS= endif can become just this:  1 DEPS-FLAGS=--check-pkg-deps --unused-pkg-deps By the way, I wouldn’t call myself a very experienced user of make. For example the way I check for Racket < 6.1 seems smelly — but it was the least-worst way I could figure out. So please feel free to share any suggestions or corrections in the comments. ### Programming Praxis #### Longest Line Today’s exercise is simple: Write a program that prints the longest line in a file. If there is a tie for the longest line, you may print any of the longest lines, or all of them, at your option. There are lots of ways to solve this problem, and I expect that my fun-loving readers will come up with some outlandish solutions, so to make this a sensible exercise we add two rules: First, if you comment you must provide at least two solutions. Second, at least one of your solutions must be “reasonable” in the sense that you would actually use it in a production environment. Your task is to write a program to find the longest line in a file. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## April 14, 2017 ### GNU Guix #### Running system services in containers At FOSDEM, in the awesome Guile track, I briefly demoed a new experimental GuixSD feature as part my talk on system services: the ability to run system services in containers or “sandboxes”. This post discusses the rationale, status, and implementation of this feature. #### The problem Our computers run many programs that talk to the Internet, and the Internet is an unsafe place as we all know—with states and assorted organizations collecting “zero-day exploits” to exploit them as they see fit. One of the big tasks of operating system distributions has been to keep track of known software vulnerabilities and patch their packages as soon as possible. When we look closer, many vulnerabilities out there can be exploited because of a combination of two major weaknesses of GNU/Linux and similar Unix-like operating systems: lack of memory-safety in the C language family, and ambient authority in the operating system itself. The former leads to a huge class of bugs that become security issues: buffer overflows, use-after-free, and so on. The latter makes them more exploitable because processes have access to many resources beyond those they really need. Security-sensitive software is now increasingly written in memory-safe languages, as is the case for Guix and GuixSD. Projects that have been using C are even considering a complete rewrite, as is the case for Tor. Of course the switch away from memory-unsafe languages won’t happen overnight, but it’s good to see a consensus emerging. The operating system side of things is less bright. Although the principle of least authority (POLA) has been well-known in operating system circles for a long time, it remains foreign to Unix and GNU/Linux. Processes run with the full authority of their user. On top of that, until recent changes to the Linux kernel, resources were global and there was essentially a single view of the file system, of the process hierarchy, and so on. So when a remote-code-execution vulnerability affects a system service—like in the BitlBee instant messaging gateway (CVE-2016-10188) running on my laptop—an attacker could potentially do a lot on your machine. Fortunately, many daemons have built-in mechanisms to work around this operating system defect. For instance, BitlBee, and Tor can be told to switch to a separate unprivileged user, avahi-daemon and ntpd can do that and also change root. These techniques do reduce the privileges of those processes, but they are still imperfect and ad hoc. #### Increasing process isolation with containers The optimal solution to this problem would be to honor POLA in the first place. As an example, the venerable GNU/Hurd is a capability-based operating system. Thus, GNU/Hurd has supported fine-grained virtualization from the start: a newly-created process can be given a capability to its own proc server (which implements the POSIX notion of processes), to a specific TCP/IP server, etc. In addition, its POSIX personality offers interesting extensions, such as the fact that processes run with the authority of zero or more UIDs. For instance, the Hurd’s login program starts off with zero UIDs and gains a UID when someone has been authenticated. Back to GNU/Linux, “namespaces” have been introduced as a way to retrofit per-process views of the system resources, and thus improve isolation among processes. Each process can run in a separate namespace and thus have a different view of the file system, process tree, and so on (a process running in separate namespaces is often referred to as a “container”, although that term is sometimes used to denote much larger tooling and practices built around namespaces.) Why not use that to better isolate system services? Apparently this idea has been floating around. systemd has been considering to extend its “unit files” to include directives instructing systemd to run daemons in separate namespaces. GuixSD uses the Shepherd instead of systemd, but running system services in separate namespaces is something we had been considering for a while. In fact, adding the ability to run system services in containers was a low-hanging fruit: we already had call-with-container to run code in containers, so all we needed to do was to provide a containerized service starter that uses call-with-container. The Shepherd itself remains unaware of namespaces, it simply ends up calling make-forkexec-constructor/container instead of make-forkexec-constructor and that’s it. The changes to the service definitions of BitlBee and Tor are minimal. The end result, for Tor, looks like this: (let ((torrc (tor-configuration->torrc config))) (with-imported-modules (source-module-closure &apos((gnu build shepherd) (gnu system file-systems))) (list (shepherd-service (provision &apos(tor)) (requirement &apos(user-processes loopback syslogd)) (modules &apos((gnu build shepherd) (gnu system file-systems))) (start #~(make-forkexec-constructor/container (list #$(file-append tor "/bin/tor") "-f" #$torrc) #:mappings (list (file-system-mapping (source "/var/lib/tor") (target source) (writable? #t)) (file-system-mapping (source "/dev/log") ;for syslog (target source))))) (stop #~(make-kill-destructor)) (documentation "Run the Tor anonymous network overlay."))))) The with-imported-modules form above instructs Guix to import our (gnu build shepherd) library, which provides make-forkexec-constructor/container, into PID 1. The start method of the service specifies the command to start the daemon, as well as file systems to map in its mount name space (“bind mounts”). Here all we need is write access to /var/lib/tor and to /dev/log (for logging via syslogd). In addition to these two mappings, make-forkexec-constructor/container automatically adds /gnu/store and a bunch of files in /etc as we will see below. #### Containerized services in action So what do these containerized services look like when they’re running? When we run herd status bitblee, disappointingly, we don’t see anything special: charlie@guixsd ~$ sudo herd status bitlbee
Status of bitlbee:
It is started.
Running value is 487.
It is enabled.
Provides (bitlbee).
Requires (user-processes networking).
Conflicts with ().
Will be respawned.
charlie@guixsd ~$ps -f 487 UID PID PPID C STIME TTY STAT TIME CMD bitlbee 487 1 0 Apr11 ? Ss 0:00 /gnu/store/pm05bfywrj2k699qbxpjjqfyfk3grz2i-bitlbee-3.5.1/sbin/bitlbee -n -F -u bitlbee -c /gnu/store/y4jfxya56i1hl9z0a2h4hdar2wm Again this is because the Shepherd has no idea what a namespace is, so it just displays the daemon’s PID in the global namespace, 487. The process is running as user bitlbee, as requested by the -u bitlbee command-line option. We can invoke nsenter and take a look at what the BitlBee process “sees” in its namespace: charlie@guixsd ~$ sudo nsenter -t 487 -m -p -i -u $(readlink -f$(type -P bash))
root@guixsd /# echo /*
/dev /etc /gnu /proc /tmp /var
root@guixsd /# echo /proc/[0-9]*
/proc/1 /proc/5
root@guixsd /# read line < /proc/1/cmdline
root@guixsd /# echo $line /gnu/store/pm05bfywrj2k699qbxpjjqfyfk3grz2i-bitlbee-3.5.1/sbin/bitlbee-n-F-ubitlbee-c/gnu/store/y4jfxya56i1hl9z0a2h4hdar2wmivgbl-bitlbee.conf root@guixsd /# echo /etc/* /etc/hosts /etc/nsswitch.conf /etc/passwd /etc/resolv.conf /etc/services root@guixsd /# echo /var/* /var/lib /var/run root@guixsd /# echo /var/lib/* /var/lib/bitlbee root@guixsd /# echo /var/run/* /var/run/bitlbee.pid /var/run/nscd There’s no /home and generally very little in BitlBee’s mount namespace. Notably, the namespace lacks /run/setuid-programs, which is where setuid programs live in GuixSD. Its /etc directory contains the minimal set of files needed for proper operation rather than the complete /etc of the host. /var contains nothing but BitlBee’s own state files, as well as the socket to libc’s name service cache daemon (nscd), which runs in the host system and performs name lookups on behalf of applications. As can be seen in /proc, there’s only a couple of processes in there and “PID 1” in that namespace is the bitlbee daemon. Finally, the /tmp directory is a private tmpfs: root@guixsd /# : > /tmp/hello-bitlbee root@guixsd /# echo /tmp/* /tmp/hello-bitlbee root@guixsd /# exit charlie@guixsd ~$ ls /tmp/*bitlbee
ls: cannot access &apos/tmp/*bitlbee&apos: No such file or directory

Our bitlbee process runs in a separate mount, PID, and IPC namespace, but it runs in the global user namespace. The reason for this is that we want the -u bitlbee option (which instructs bitlbee to setuid to an unprivileged user at startup) to work as expected. It also shares the network namespace because obviously it needs to access the network.

A nice side-effect of these fully-specified execution environments for services is that it makes them more likely to behave in a reproducible fashion across machines—just like fully-specified build environments help achieve reproducible builds.

#### Conclusion

GuixSD master and its upcoming release include this feature and a couple of containerized services, and it works like a charm! Yet, there are still open questions as to the way forward.

First, we only looked at “simple” services so far, with simple static file system mappings. Good candidates for increased isolation are HTTP servers such as NGINX. However, for these, it’s more difficult to determine the set of file system mappings that must be made. GuixSD has the advantage that it knows how NGINX is configured and could potentially derive file system mappings from that information. Getting it right may be trickier than it seems, though, so this is something we’ll have to investigate.

Another open question is how the service isolation work should be split between the distro, the init system, and the upstream service author. Authors of daemons already do part of the work via setuid and sometimes chroot. Going beyond that would often hamper portability (the namespace interface is specific to the kernel Linux) or even functionality if the daemon ends up lacking access to resources it needs.

The init system alone also lacks information to decide what goes into the namespaces of the service. For instance, neither the upstream author nor the init system “knows” whether the distro is running nscd and thus they cannot tell whether the nscd socket should be bind-mounted in the service’s namespace. A similar issue is that of D-Bus policy files discussed in this LWN article. Moving D-Bus functionality into the init system itself to solve this problem, as the article suggests, seems questionable, notably because it would add more code to this critical process. Instead, on GuixSD, a service author can make the right policy files available in the sandbox; in fact, GuixSD already knows which policy files are needed thanks to its service framework so we might even be able to automate it.

At this point it seems that tight integration between the distro and the init system is the best way to precisely define system service sandboxes. GuixSD’s declarative approach to system services along with tight Shepherd integration help a lot here, but it remains to be seen how difficult it is to create sandboxes for complex system services such as NGINX.

#### About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user&aposs freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on an i686 or x86_64 machine. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el, armv7, and aarch64.

### Programming Praxis

#### Element Words

Today’s exercise is either an interview question or somebody’s homework, I’m not sure:

Given a list of all the symbols of the chemicals in the periodic table, and a list of all the words in the English language, find the longest word that can be made using symbols of the chemicals in the periodic table.

Your task is to write a program to find the longest word that can be formed from the periodic table. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## April 11, 2017

### Programming Praxis

#### Sort Four

Today’s exercise involves sorting:

Write a program that takes exactly four items as input and returns them in sorted order.

Your task is to write a program that sorts exactly four items; can you do it using only five comparisons? When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

## April 07, 2017

### Programming Praxis

#### Edit Files In A REPL

Many programming languages are interactive, with their primary interface a read-eval-print loop. Scheme is like that, so is Python, and so is the primary language I use in my day job, Oracle SQL*Plus, and so too are many other languages.

One feature many of those languages share is a way to edit files from the REPL. For instance, in SQL*Plus, the most recently-entered command is available in a buffer, and if you say “edit” the buffer will be loaded into whatever $EDITOR you specified, so you can modify the command as needed; when the editor returns, the modified buffer is then executed. Although Scheme doesn’t provide such an editing feature, I have been using my own for years. It’s very simple, less than a dozen lines of code, and greatly enhances my productivity. Your task is to write an editing interface for your favorite REPL. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## April 04, 2017 ### Programming Praxis #### Double Space Today’s exercise is simple, but some languages make it hard: Write a program that returns an input string with each space character in the string doubled. For instance, the string “hello hello” (with one space between the two words) is transformed to “hello hello” (with two spaces between the two words). That may not come out right in all browsers; here is the transformation again, with space characters replaced by plus signs for visibility, using a constant-space font: "hello+hello" => "hello++hello" Depending on your language, that might be an easy task, or a hard one. You may have to deal with memory allocation, or non-mutable strings, or appends that blow up to quadratic time. Your task is to write a program that doubles the space characters in its input. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## April 03, 2017 ### Adrien Ramos #### REST APIs with CHICKEN Scheme A few months ago, I stumbled upon the Matrix network and protocol. A pretty interesting chat system with cool features such as distributed chat rooms, chat log replication, VoIP, end to end encryption… A few weeks ago I came back to it and thought “hey! why not write a client in CHICKEN for this network?”, even though I had no previous experience with REST APIs whatsoever. I then took a look a the CHICKEN egg index to see if there was anything to help me do that, and guess what? There are plenty of great stuff for web programming! As Matrix’s network uses HTTP for REST requests, I figured I would use the http-client egg to do all the network stuff for me. It takes care of https handling by transparently using the openssl egg, it automatically reuses connections when possible and it’s super easy to use, as you just have to give it a URL and two procedures that will write and read the request’s and response’s body. Matrix’s data is serialized in the JSON format, for that I used the medea egg: a very cool JSON parser that lets you choose how the JSON should be decoded into Scheme data. The last egg I decided to use is rest-bind. It exports a very neat macro called define-method that defines procedures that will execute HTTP requests for you and return the response. For example, this declaration would define a new procedure versions that returns the scheme representation of the JSON data sent by the server when the https://matrix.org/_matrix/client/versions URL is requested: (define-method (versions) "https://matrix.org/_matrix/client/versions" #f read-json) (versions) ;; -> ((versions . #("r0.0.1" "r0.1.0" "r0.2.0"))) One problem I faced when using this macro, is that the Matrix API wants the client to put a session token called access_token in every request once you’re logged in, as a query parameter. Of course I didn’t want to have to supply that to every procedure call defined with define-method. Hopefully define-method doesn’t use the http-client egg directly! It excepts a procedure called call-with-input-request to be defined. This procedure usually comes from the http-client egg but you can define it yourself if you want to fiddle with the request before handing it to http-client! Of course that’s what I did, I defined my own version of call-with-input-request that adds the access_token query parameter to the request URL from a Scheme parameter. I also took this opportunity to make the server’s URL a Scheme parameter as well, that way the user of the library can have multiple Matrix sessions, to different servers, in their program. This custom procedure also adds a bunch of headers to the HTTP request. Here is how it looks like (where http:call-with-input-request is the actual procedure from http-client): (define (call-with-input-request req writer reader) (unless (server-uri) (error "Server URI not set, use (init!) first")) (let* ((uri-rewritten (update-uri (request-uri req) scheme: (uri-scheme (server-uri)) host: (uri-host (server-uri)) port: (uri-port (server-uri)) query: (append (if (access-token) `((access_token . ,(access-token))) '()) (uri-query (request-uri req))))) (headers-rewritten (headers (cons* '(accept application/json) (if (member (request-method req) '(PUT POST)) '((content-type application/json)) '())) (request-headers req))) (request-rewritten (update-request req uri: uri-rewritten headers: headers-rewritten))) (http:call-with-input-request request-rewritten writer reader))) The last thing I did was to define my own little macro, define-endpoint, because I’m lazy and I hate repeating the same stuff over and over. This macro is a simple pattern-matching syntax-rules macro that expands to define-method. (define-syntax define-endpoint (syntax-rules (GET POST PUT) ((define-endpoint GET decl) (define-method decl api-uri #f read-json)) ((define-endpoint POST decl) (define-method decl api-uri json->string read-json)) ((define-endpoint PUT decl) (define-method decl (make-request uri: api-uri method: 'PUT) json->string read-json)))) With that macro, the only code I have to write to add new API endpoints to my library looks like this: (define-endpoint GET (login-schemes "login")) (define-endpoint POST (login "login")) (define-endpoint POST (logout "logout")) (define-endpoint GET (sync "sync" #!key filter since timeout full_state set_presence timeout)) (define-endpoint PUT (room-send "rooms" room-id "send" event-type transaction-id)) (define-endpoint GET (get-filter "user" user-id "filter" filter-id)) (define-endpoint POST (create-filter "user" user-id "filter")) And I use them as regular Scheme procedures like so: (sync since: "whatever" timeout: 30000) (room-send "!vfFxDRtZSSdspfTSEr:matrix.org" 'm.room.message 'some-transaction-id '((msgtype . "m.text") (body . "Hello world!"))) ## March 31, 2017 ### Programming Praxis #### Generating Random Large Primes Sometimes you need to have a large prime, for testing, cryptography, or some other purpose. I’m talking about primes of several hundred to a few thousand digits that are certified — proven to be prime — rather than probable primes according to a Miller-Rabin or other probabilistic test. Henry Pocklington’s Criterion, which dates to 1914, gives us a way to find such primes quickly. Paulo Ribenboim explains it thus: Let p be an odd prime, and let k be a positive integer, such that p does not divide k and 1 < k < 2(p + 1). Let N = 2kp + 1. Then the following conditions are equivalent: 1. N is a prime. 2. There exists an integer a, 1 < a < N, such that a(N−1)/2 ≡ −1 (mod N) and gcd(ak + 1, N) = 1. This gives us an algorithm for generating large certified primes. Choose p a certified prime. Choose 1 ≤ k < 2 × p at random; we ignore the last two possibilities for k, so we don’t have to worry about k being a multiple of p. Compute N. For each a ∈ {2, 3}, test the conditions for primality. If you don’t find a prime, go back and choose a different random k. Once you have a prime N, “ratchet up” and restart the process with the new certified prime N as the p of the next step. Continue until N is big enough. Your task is to write a program that generates random large primes. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## March 28, 2017 ### Programming Praxis #### RNG147 We have looked at random number generators in several previous exercises, but most of them returned integers. In today’s exercise we look at a simple random number generator that returns floating-point numbers. The generator is simple: Given a seed between zero and one, the next number in the sequence is the fractional portion of 147 times the seed. Your task is to implement the random number generator described above, and to assess its suitability. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## March 24, 2017 ### Programming Praxis #### Symbolic Differentiation One of the original motivations for the Lisp language was to write a program that performed symbolic differentiation. In this exercise we look at the symbolic differentiator in Section 2.3.2 of SICP, which handles expressions containing only addition and multiplication according to the following rules: $\frac{dc}{dx} = 0$, with $c \not= x$, $\frac{dx}{dx} = 1$, $\frac{d(u+v)}{dx} = \frac{du}{dx} + \frac{dv}{dx}$, and $\frac{d(uv)}{dx} = u\frac{dv}{dx} + v\frac{du}{dx}$. Your task is to write a program that performs symbolic differentiation according to the rules given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below. ## March 21, 2017 ### Programming Praxis #### Word Sets, Solved I completed the solution to the previous exercise, which you can look at if you wish. We’ll have a new exercise on Friday. ## March 20, 2017 ### GNU Guix #### Creating bundles with guix pack Guix just got a new command, dubbed guix pack, which we think many developers will find useful. Last week we were celebrating the release of GNU Guile 2.2.0, the Scheme implementation that powers Guix. This is a major milestone and Guile developers naturally wanted to make it easy for users to discover all the goodies of 2.2.0 as soon as possible. One of the major roadblocks to that, as for any non-trivial piece of software, is deployment: because your distro is unlikely to have Guile 2.2.0 packaged on Day 1, you have to build it by yourself, which means getting the right dependencies installed and then building Guile itself. That’s not difficult for a developer, but it’s certainly cumbersome. Andy Wingo, the driving force behind Guile, thought that it would be nice to propose a binary tarball of Guile 2.2.0 on the day of its release. Guix had already been providing binary tarballs for a couple of years, so why not do the same for Guile? Essentially, the new guix pack command is a generalization of what Guix was already using. #### Making packs So how does it work? The basic idea is simple: you type guix pack guile and the command returns in /gnu/store a good old tarball that contains binaries for Guile and all its dependencies. If you run, say, guix pack guile emacs geiser then you get a complete “Guile SDK” containing Guile, Emacs, Geiser, and all their dependencies. When you extract the tarball, you get a /gnu/store directory with a bunch of sub-directories with these long hashes, one of which is the “profile” containing Guile, Emacs, and Geiser. You wouldn’t want to ask users to type /gnu/store/war325pv1iixj13k6y8yplzagpknfn0c-profile/bin/guile to launch Guile, though. So guix pack has a command-line option to create symlinks in the image. guix pack -S /opt/gnu/bin=bin guile emacs geiser The command above creates a /opt/gnu/bin symlink to the bin directory of the profile in the tarball, such that users can simply type /opt/gnu/bin/guile to run Guile. Recipients of a binary tarball are expected to either extract it in their root file system (yes!) where it will create /gnu and /opt/gnu in this case: # cd / # tar xf /path/to/pack.tar.gz # /opt/gnu/bin/guile --version guile (GNU Guile) 2.2.0 … or they can chroot into it, possibly relying on user namespaces and thereby avoiding root privileges:$ mkdir /tmp/pack
$cd /tmp/pack$ tar xf /path/to/pack.tar.gz
\$ unshare -mrf chroot . /opt/gnu/bin/guile --version
guile (GNU Guile) 2.2.0

The good thing with this is that, because Guix captures the complete dependency graph of packages, the tarball contains everything that’s needed to run Guile and is going to work in exactly the same way on any system that runs the kernel Linux!

#### Bells and whistles

Of course a popular approach to run such “application bundles” is Docker. Since the image format for Docker is documented and fairly easy to produce, we added an option to produce images in this format (Ricardo Wurmus initially contributed Docker support for the low-level guix archive tool but we found that it made more sense to have it in guix pack):

guix pack -f docker -S /opt/gnu=/ guile emacs geiser

The resulting tarball can be passed to docker load, and people can then use docker run to actually run the application.

One of the goodies that comes for free is cross-compilation: Guix supports cross-compilation, so you can create a pack consisting of software cross-compiled for a given platform, specified by the usual GNU triplet. For example, the following command creates a pack with binaries for GNU/Linux on ARMv7:

guix pack --target=arm-linux-gnueabihf guile

… while the command below creates a pack with Windows binaries using the MinGW cross-compiler:

guix pack --target=i686-w64-mingw32 guile

All the package transformation options that Guix supports are available to guix pack. Let’s say you’re a developer of a large piece of software such as a web browser like IceCat and you’d like your users to test whether the current master branch actually fixes the bug you attempted to fix. In this case, you can build a pack of IceCat, but replace the source that’s specified in the distribution with the snapshot of master you’re interested in:

guix pack icecat --with-source=./icecat-48.8.0.master.tar.gz

Of course the resulting pack is going to be pretty big in this case, but I’m sure the general pattern can be useful.

#### Wait, didn’t you say that “app bundles get it wrong”?

It turns out that we Guix developers have been saying that binary “application bundles” à la Docker are problematic for a number of reasons:

1. Composability: each bundle comes with a complete operating system, minus the kernel, and there is little or no sharing happening among bundles, notably in terms of disk space and memory usage.
2. Security updates: since an “app bundle” is essentially a complete operating system, one has to be careful and apply security updates to all the software in each bundle. Unfortunately, that doesn’t always happen as has been famously reported on several occasions.
3. Reproducibility: Docker images, for instance, are often hardly “reproducible” in the sense of a reproducible build process. First, Dockerfiles start out with a “base layer” that is typically a huge binary blob of some major distro. On top of that, they run a number of commands such as apt-get install whose result likely depends on the time at which they are run. Docker’s best practices document suggests ways to mitigate the problem, such as “version pinning”, but the whole approach remains rather brittle.
4. Experimentation: Once you have this big binary blob, sure you can run the application you wanted, but you can do little more than that—you may or may not be able to find the corresponding source code, and you’d have a hard time fiddling with one of the components of the software stack.

We pride ourselves with having a tool set that caters to some of the use cases that “app bundles” and “containerization” try to address while having none of these drawbacks. So how do Guix packs fit into that picture?

First of all, the intended use case is different: we view guix pack as a tool that makes it easy to try out a piece of software on a non-Guix machine. But it is clear that for production, our recommendation is to use Guix directly, to get security updates and generally address all the above issues. :-)

That said, let’s see how these issues affect Guix packs. First, composability of Guix packs turns out to be pretty good. If you receive two different Guix packs for different pieces of software, you can unpack both in your root directory (or union-mount them in the same place): packages that differ have a different /gnu/store file name with a different hash, so they won’t collide; packages that are identical (say the C library or GTK+) will have the same /gnu/store file name so they’ll actually be shared.

That means that for security updates, you could always fetch a new pack of your application with the security updates and extract it in place. However, that requires you as a user to manually pay attention to vulnerabilities in all the software that comes with the pack, so clearly you’re better off using Guix instead and regularly upgrading. No wonders.

Packs themselves are reproducible bit-by-bit. If you know the Guix commit that was used to build a given pack, you can thus run the same guix pack command on another machine and verify that you get the exact same tarball. Currently not 100% of the packages Guix provides are reproducible bit-by-bit; we’re getting closer to that goal though, in part due to the fact that Guix builds are isolated by default, and also thanks to the efforts of everyone in the Reproducible Builds project to address sources of non-determinism in free software.

Because Guix packs are reproducible, you can not only reproduce the exact same pack but also create packs with variants of the software—for instance, changing the version of one of the packages in the stack. Of course this part requires you to have Guix installed somewhere, but at least you can easily fiddle with the software stack and “compile” your own variant of the software stack down to a new pack.

We hope you’ll enjoy packs and Guix, and would welcome your feedback on the guix-devel mailing list and on #guix on Freenode!

#### About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user&aposs freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on an i686 or x86_64 machine. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el and armv7.