Planet Scheme

June 26, 2019

Andy Wingo

fibs, lies, and benchmarks

Friends, consider the recursive Fibonacci function, expressed most lovelily in Haskell:

fib 0 = 0
fib 1 = 1
fib n = fib (n-1) + fib (n-2)

Computing elements of the Fibonacci sequence ("Fibonacci numbers") is a common microbenchmark. Microbenchmarks are like a Suzuki exercises for learning violin: not written to be good tunes (good programs), but rather to help you improve a skill.

The fib microbenchmark teaches language implementors to improve recursive function call performance.

I'm writing this article because after adding native code generation to Guile, I wanted to check how Guile was doing relative to other language implementations. The results are mixed. We can start with the most favorable of the comparisons: Guile present versus Guile of the past.


I collected these numbers on my i7-7500U CPU @ 2.70GHz 2-core laptop, with no particular performance tuning, running each benchmark 10 times, waiting 2 seconds between measurements. The bar value indicates the median elapsed time, and above each bar is an overlayed histogram of all results for that scenario. Note that the y axis is on a log scale. The 2.9.3* version corresponds to unreleased Guile from git.

Good news: Guile has been getting significantly faster over time! Over decades, true, but I'm pleased.

where are we? static edition

How good are Guile's numbers on an absolute level? It's hard to say because there's no absolute performance oracle out there. However there are relative performance oracles, so we can try out perhaps some other language implementations.

First up would be the industrial C compilers, GCC and LLVM. We can throw in a few more "static" language implementations as well: compilers that completely translate to machine code ahead-of-time, with no type feedback, and a minimal run-time.


Here we see that GCC is doing best on this benchmark, completing in an impressive 0.304 seconds. It's interesting that the result differs so much from clang. I had a look at the disassembly for GCC and I see:

fib:
    push   %r12
    mov    %rdi,%rax
    push   %rbp
    mov    %rdi,%rbp
    push   %rbx
    cmp    $0x1,%rdi
    jle    finish
    mov    %rdi,%rbx
    xor    %r12d,%r12d
again:
    lea    -0x1(%rbx),%rdi
    sub    $0x2,%rbx
    callq  fib
    add    %rax,%r12
    cmp    $0x1,%rbx
    jg     again
    and    $0x1,%ebp
    lea    0x0(%rbp,%r12,1),%rax
finish:
    pop    %rbx
    pop    %rbp
    pop    %r12
    retq   

It's not quite straightforward; what's the loop there for? It turns out that GCC inlines one of the recursive calls to fib. The microbenchmark is no longer measuring call performance, because GCC managed to reduce the number of calls. If I had to guess, I would say this optimization doesn't have a wide applicability and is just to game benchmarks. In that case, well played, GCC, well played.

LLVM's compiler (clang) looks more like what we'd expect:

fib:
   push   %r14
   push   %rbx
   push   %rax
   mov    %rdi,%rbx
   cmp    $0x2,%rdi
   jge    recurse
   mov    %rbx,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   
recurse:
   lea    -0x1(%rbx),%rdi
   callq  fib
   mov    %rax,%r14
   add    $0xfffffffffffffffe,%rbx
   mov    %rbx,%rdi
   callq  fib
   add    %r14,%rax
   add    $0x8,%rsp
   pop    %rbx
   pop    %r14
   retq   

I bolded the two recursive calls.

Incidentally, the fib as implemented by GCC and LLVM isn't quite the same program as Guile's version. If the result gets too big, GCC and LLVM will overflow, whereas in Guile we overflow into a bignum. Also in C, it's possible to "smash the stack" if you recurse too much; compilers and run-times attempt to mitigate this danger but it's not completely gone. In Guile you can recurse however much you want. Finally in Guile you can interrupt the process if you like; the compiled code is instrumented with safe-points that can be used to run profiling hooks, debugging, and so on. Needless to say, this is not part of C's mission.

Some of these additional features can be implemented with no significant performance cost (e.g., via guard pages). But it's fair to expect that they have some amount of overhead. More on that later.

The other compilers are OCaml's ocamlopt, coming in with a very respectable result; Go, also doing well; and V8 WebAssembly via Node. As you know, you can compile C to WebAssembly, and then V8 will compile that to machine code. In practice it's just as static as any other compiler, but the generated assembly is a bit more involved:

fib_tramp:
    jmp    fib

fib:
    push   %rbp
    mov    %rsp,%rbp
    pushq  $0xa
    push   %rsi
    sub    $0x10,%rsp
    mov    %rsi,%rbx
    mov    0x2f(%rbx),%rdx
    mov    %rax,-0x18(%rbp)
    cmp    %rsp,(%rdx)
    jae    stack_check
post_stack_check:
    cmp    $0x2,%eax
    jl     return_n
    lea    -0x2(%rax),%edx
    mov    %rbx,%rsi
    mov    %rax,%r10
    mov    %rdx,%rax
    mov    %r10,%rdx
    callq  fib_tramp
    mov    -0x18(%rbp),%rbx
    sub    $0x1,%ebx
    mov    %rax,-0x20(%rbp)
    mov    -0x10(%rbp),%rsi
    mov    %rax,%r10
    mov    %rbx,%rax
    mov    %r10,%rbx
    callq  fib_tramp
return:
    mov    -0x20(%rbp),%rbx
    add    %ebx,%eax
    mov    %rbp,%rsp
    pop    %rbp
    retq   
return_n:
    jmp    return
stack_check:
    callq  WasmStackGuard
    mov    -0x10(%rbp),%rbx
    mov    -0x18(%rbp),%rax
    jmp    post_stack_check

Apparently fib compiles to a function of two arguments, the first passed in rsi, and the second in rax. (V8 uses a custom calling convention for its compiled WebAssembly.) The first synthesized argument is a handle onto run-time data structures for the current thread or isolate, and in the function prelude there's a check to see that the function has enough stack. V8 uses these stack checks also to handle interrupts, for when a web page is stuck in JavaScript.

Otherwise, it's a more or less normal function, with a bit more register/stack traffic than would be strictly needed, but pretty good.

do optimizations matter?

You've heard of Moore's Law -- though it doesn't apply any more, it roughly translated into hardware doubling in speed every 18 months. (Yes, I know it wasn't precisely that.) There is a corresponding rule of thumb for compiler land, Proebsting's Law: compiler optimizations make software twice as fast every 18 years. Zow!

The previous results with GCC and LLVM were with optimizations enabled (-O3). One way to measure Proebsting's Law would be to compare the results with -O0. Obviously in this case the program is small and we aren't expecting much work out of the optimizer, but it's interesting to see anyway:


Answer: optimizations don't matter much for this benchark. This investigation does give a good baseline for compilers from high-level languages, like Guile: in the absence of clever trickery like the recursive inlining thing GCC does and in the absence of industrial-strength instruction selection, what's a good baseline target for a compiler? Here we see for this benchmark that it's somewhere between 420 and 620 milliseconds or so. Go gets there, and OCaml does even better.

how is time being spent, anyway?

Might we expect V8/WebAssembly to get there soon enough, or is the stack check that costly? How much time does one stack check take anyway? For that we'd have to determine the number of recursive calls for a given invocation.

Friends, it's not entirely clear to me why this is, but I instrumented a copy of fib, and I found that the number of calls in fib(n) was a more or less constant factor of the result of calling fib. That ratio converges to twice the golden ratio, which means that since fib(n+1) ~= φ * fib(n), then the number of calls in fib(n) is approximately 2 * fib(n+1). I scratched my head for a bit as to why this is and I gave up; the Lord works in mysterious ways.

Anyway for fib(40), that means that there are around 3.31e8 calls, absent GCC shenanigans. So that would indicate that each call for clang takes around 1.27 ns, which at turbo-boost speeds on this machine is 4.44 cycles. At maximum throughput (4 IPC), that would indicate 17.8 instructions per call, and indeed on the n > 2 path I count 17 instructions.

For WebAssembly I calculate 2.25 nanoseconds per call, or 7.9 cycles, or 31.5 (fused) instructions at max IPC. And indeed counting the extra jumps in the trampoline, I get 33 cycles on the recursive path. I count 4 instructions for the stack check itself, one to save the current isolate, and two to shuffle the current isolate into place for the recursive calls. But, compared to clang, V8 puts 6 words on the stack per call, as opposed to only 4 for LLVM. I think with better interprocedural register allocation for the isolate (i.e.: reserve a register for it), V8 could get a nice boost for call-heavy workloads.

where are we? dynamic edition

Guile doesn't aim to replace C; it's different. It has garbage collection, an integrated debugger, and a compiler that's available at run-time, it is dynamically typed. It's perhaps more fair to compare to languages that have some of these characteristics, so I ran these tests on versions of recursive fib written in a number of languages. Note that all of the numbers in this post include start-up time.


Here, the ocamlc line is the same as before, but using the bytecode compiler instead of the native compiler. It's a bit of an odd thing to include but it performs so well I just had to include it.

I think the real takeaway here is that Chez Scheme has fantastic performance. I have not been able to see the disassembly -- does it do the trick like GCC does? -- but the numbers are great, and I can see why Racket decided to rebase its implementation on top of it.

Interestingly, as far as I understand, Chez implements stack checks in the straightfoward way (an inline test-and-branch), not with a guard page, and instead of using the stack check as a generic ability to interrupt a computation in a timely manner as V8 does, Chez emits a separate interrupt check. I would like to be able to see Chez's disassembly but haven't gotten around to figuring out how yet.

Haskell's call performance is surprisingly bad here, beaten even by OCaml's bytecode compiler; is this the cost of laziness, or just a lacuna of the implementation? I do not know. I do know I have this mental image that Haskell is a good compiler but apparently if that's the standard, so is Guile :)

Finally, in this comparison section, I was not surprised by cpython's relatively poor performance; we know cpython is not fast. I think though that it just goes to show how little these microbenchmarks are worth when it comes to user experience; like many of you I use plenty of Python programs in my daily work and don't find them slow at all. Think of micro-benchmarks like x-ray diffraction; they can reveal the hidden substructure of DNA but they say nothing at all about the organism.

where to now?

Perhaps you noted that in the last graph, the Guile and Chez lines were labelled "(lexical)". That's because instead of running this program:

(define (fib n)
  (if (< n 2)
      n
      (+ (fib (- n 1)) (fib (- n 2)))))

They were running this, instead:

(define (fib n)
  (define (fib* n)
    (if (< n 2)
        n
        (+ (fib* (- n 1)) (fib* (- n 2)))))
  (fib* n))

The thing is, historically, Scheme programs have treated top-level definitions as being mutable. This is because you don't know the extent of the top-level scope -- there could always be someone else who comes and adds a new definition of fib, effectively mutating the existing definition in place.

This practice has its uses. It's useful to be able to go in to a long-running system and change a definition to fix a bug or add a feature. It's also a useful way of developing programs, to incrementally build the program bit by bit.


But, I would say that as someone who as written and maintained a lot of Scheme code, it's not a normal occurence to mutate a top-level binding on purpose, and it has a significant performance impact. If the compiler knows the target to a call, that unlocks a number of important optimizations: type check elision on the callee, more optimal closure representation, smaller stack frames, possible contification (turning calls into jumps), argument and return value count elision, representation specialization, and so on.

This overhead is especially egregious for calls inside modules. Scheme-the-language only gained modules relatively recently -- relative to the history of scheme -- and one of the aspects of modules is precisely to allow reasoning about top-level module-level bindings. This is why running Chez Scheme with the --program option is generally faster than --script (which I used for all of these tests): it opts in to the "newer" specification of what a top-level binding is.

In Guile we would probably like to move towards a more static way of treating top-level bindings, at least those within a single compilation unit. But we haven't done so yet. It's probably the most important single optimization we can make over the near term, though.

It's true though that even absent lexical optimizations, top-level calls can be made more efficient in Guile. I am not sure if we can reach Chez with the current setup of having a template JIT, because we need two return addresses: one virtual (for bytecode) and one "native" (for JIT code). Register allocation is also something to improve but it turns out to not be so important for fib, as there are few live values and they need to spill for the recursive call. But, we can avoid some of the indirection on the call, probably using an inline cache associated with the callee; Chez has had this optimization since 1984!

what guile learned from fib

This exercise has been useful to speed up Guile's procedure calls, as you can see for the difference between the latest Guile 2.9.2 release and what hasn't been released yet (2.9.3).

To decide what improvements to make, I extracted the assembly that Guile generated for fib to a standalone file, and tweaked it in a number of ways to determine what the potential impact of different scenarios was. Some of the detritus from this investigation is here.

There were three big performance improvements. One was to avoid eagerly initializing the slots in a function's stack frame; this took a surprising amount of run-time. Fortunately the rest of the toolchain like the local variable inspector was already ready for this change.

Another thing that became clear from this investigation was that our stack frames were too large; there was too much memory traffic. I was able to improve this in the lexical-call by adding an optimization to elide useless closure bindings. Usually in Guile when you call a procedure, you pass the callee as the 0th parameter, then the arguments. This is so the procedure has access to its closure. For some "well-known" procedures -- procedures whose callers can be enumerated -- we optimize to pass a specialized representation of the closure instead ("closure optimization"). But for well-known procedures with no free variables, there's no closure, so we were just passing a throwaway value (#f). An unhappy combination of Guile's current calling convention being stack-based and a strange outcome from the slot allocator meant that frames were a couple words too big. Changing to allow a custom calling convention in this case sped up fib considerably.

Finally, and also significantly, Guile's JIT code generation used to manually handle calls and returns via manual stack management and indirect jumps, instead of using the platform calling convention and the C stack. This is to allow unlimited stack growth. However, it turns out that the indirect jumps at return sites were stalling the pipeline. Instead we switched to use call/return but keep our manual stack management; this allows the CPU to use its return address stack to predict return targets, speeding up code.

et voilà

Well, long article! Thanks for reading. There's more to do but I need to hit the publish button and pop this off my stack. Until next time, happy hacking!

by Andy Wingo at June 26, 2019 10:34 AM

June 25, 2019

Programming Praxis

Dividing Without Divide

Today’s task comes from a student programmer:

Given a numerator and divisor of unsigned integers, compute the quotient and remainder. You cannot use divide, cannot use mod, and you want to optimize for speed.

After a while, the student programmer looked up the official solution:

def divide_problem(num, div):
    quotient = 1
    while num - (quotient * div) >= div:
        print(num - (quotient * div), "loop")
        quotient += 1
    remainder = num - (quotient * div)
    return quotient, remainder

Your task is to comment on the official solution and write a better one. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 25, 2019 09:00 AM

June 21, 2019

Programming Praxis

String Comparison

A string is a sequence of characters, perhaps including a backspace character that renders both itself and the previous character as if they were not part of the string. For instance, if we make the backspace character visible using the # symbol, the two strings abcx#de and abcde are identical. Multiple successive backspaces remove multiple successive characters.

Your task is to write a program that compares two strings that may contain backspaces and reports if they are equal. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 21, 2019 09:00 AM

June 18, 2019

Programming Praxis

Latin Squares

A latin square of order n is a square matrix with n rows and n columns, with each entry in the matrix containing an integer from 0 to n − 1, arranged so that no row or column contains duplicate integers. Here is a sample latin square of order 10:

8 3 7 1 5 6 4 2 0 9
4 5 6 2 0 9 3 7 8 1
9 2 3 8 7 5 1 4 6 0
2 6 0 3 9 8 7 5 1 4
0 4 2 9 3 7 8 1 5 6
6 1 4 0 2 3 9 8 7 5
1 7 5 4 6 0 2 3 9 8
3 0 9 7 8 1 5 6 4 2
5 8 1 6 4 2 0 9 3 7
7 9 8 5 1 4 6 0 2 3

Your task is to write a program that generates latin squares of order n. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 18, 2019 09:00 AM

June 17, 2019

GNU Guix

Substitutes are now available as lzip

For a long time, our build farm at ci.guix.gnu.org has been delivering substitutes (pre-built binaries) compressed with gzip. Gzip was never the best choice in terms of compression ratio, but it was a reasonable and convenient choice: it’s rock-solid, and zlib made it easy for us to have Guile bindings to perform in-process compression in our multi-threaded guix publish server.

With the exception of building software from source, downloads take the most time of Guix package upgrades. If users can download less, upgrades become faster, and happiness ensues. Time has come to improve on this, and starting from early June, Guix can publish and fetch lzip-compressed substitutes, in addition to gzip.

Lzip

Lzip is a relatively little-known compression format, initially developed by Antonio Diaz Diaz ca. 2013. It has several C and C++ implementations with surprisingly few lines of code, which is always reassuring. One of its distinguishing features is a very good compression ratio with reasonable CPU and memory requirements, according to benchmarks published by the authors.

Lzlib provides a well-documented C interface and Pierre Neidhardt set out to write bindings for that library, which eventually landed as the (guix lzlib) module.

With this in place we were ready to start migrating our tools, and then our build farm, to lzip compression, so we can all enjoy smaller downloads. Well, easier said than done!

Migrating

The compression format used for substitutes is not a core component like it can be in “traditional” binary package formats such as .deb since Guix is conceptually a “source-based” distro. However, deployed Guix installations did not support lzip, so we couldn’t just switch our build farm to lzip overnight; we needed to devise a transition strategy.

Guix asks for the availability of substitutes over HTTP. For example, a question such as:

“Dear server, do you happen to have a binary of /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2 that I could download?”

translates into prose to an HTTP GET of https://ci.guix.gnu.org/6yc4ngrsig781bpayax2cg6pncyhkjpq.narinfo, which returns something like:

StorePath: /gnu/store/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
URL: nar/gzip/6yc4ngrsig781bpayax2cg6pncyhkjpq-emacs-26.2
Compression: gzip
NarHash: sha256:0h2ibqpqyi3z0h16pf7ii6l4v7i2wmvbrxj4ilig0v9m469f6pm9
NarSize: 134407424
References: 2dk55i5wdhcbh2z8hhn3r55x4873iyp1-libxext-1.3.3 …
FileSize: 48501141
System: x86_64-linux
Deriver: 6xqibvc4v8cfppa28pgxh0acw9j8xzhz-emacs-26.2.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

(This narinfo format is inherited from Nix and implemented here and here.) This tells us we can download the actual binary from /nar/gzip/…-emacs-26.2, and that it will be about 46 MiB (the FileSize field.) This is what guix publish serves.

The trick we came up with was to allow guix publish to advertise several URLs, one per compression format. Thus, for recently-built substitutes, we get something like this:

StorePath: /gnu/store/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
URL: nar/gzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: gzip
FileSize: 30872887
URL: nar/lzip/mvhaar2iflscidl0a66x5009r44fss15-gimp-2.10.12
Compression: lzip
FileSize: 18829088
NarHash: sha256:10n3nv3clxr00c9cnpv6x7y2c66034y45c788syjl8m6ga0hbkwy
NarSize: 94372664
References: 05zlxc7ckwflz56i6hmlngr86pmccam2-pcre-8.42 …
System: x86_64-linux
Deriver: vi2jkpm9fd043hm0839ibbb42qrv5xyr-gimp-2.10.12.drv
Signature: 1;berlin.guixsd.org;KHNpZ25hdHV…

Notice that there are two occurrences of the URL, Compression, and FileSize fields: one for gzip, and one for lzip. Old Guix instances will just pick the first one, gzip; newer Guix will pick whichever supported method provides the smallest FileSize, usually lzip. This will make migration trivial in the future, should we add support for other compression methods.

Users need to upgrade their Guix daemon to benefit from lzip. On a “foreign distro”, simply run guix pull as root. On standalone Guix systems, run guix pull && sudo guix system reconfigure /etc/config.scm. In both cases, the daemon has to be restarted, be it with systemctl restart guix-daemon.service or with herd restart guix-daemon.

First impressions

This new gzip+lzip scheme has been deployed on ci.guix.gnu.org for a week. Specifically, we run guix publish -C gzip:9 -C lzip:9, meaning that we use the highest compression ratio for both compression methods.

Currently, only a small subset of the package substitutes are available as both lzip and gzip; those that were already available as gzip have not been recompressed. The following Guile program that taps into the API of guix weather allows us to get some insight:

(use-modules (gnu) (guix)
             (guix monads)
             (guix scripts substitute)
             (srfi srfi-1)
             (ice-9 match))

(define all-packages
  (@@ (guix scripts weather) all-packages))

(define package-outputs
  (@@ (guix scripts weather) package-outputs))

(define (fetch-lzip-narinfos)
  (mlet %store-monad ((items (package-outputs (all-packages))))
    (return
     (filter (lambda (narinfo)
               (member "lzip" (narinfo-compressions narinfo)))
             (lookup-narinfos "https://ci.guix.gnu.org" items)))))

(define (lzip/gzip-ratio narinfo)
  (match (narinfo-file-sizes narinfo)
    ((gzip lzip)
     (/ lzip gzip))))

(define (average lst)
  (/ (reduce + 0 lst)
     (length lst) 1.))

Let’s explore this at the REPL:

scheme@(guile-user)> (define lst
                       (with-store s
                         (run-with-store s (fetch-lzip-narinfos))))
computing 9,897 package derivations for x86_64-linux...
updating substitutes from 'https://ci.guix.gnu.org'... 100.0%
scheme@(guile-user)> (length lst)
$4 = 2275
scheme@(guile-user)> (average (map lzip/gzip-ratio lst))
$5 = 0.7398994395478715

As of this writing, around 20% of the package substitutes are available as lzip, so take the following stats with a grain of salt. Among those, the lzip-compressed substitute is on average 26% smaller than the gzip-compressed one. What if we consider only packages bigger than 5 MiB uncompressed?

scheme@(guile-user)> (define biggest
                       (filter (lambda (narinfo)
                                 (> (narinfo-size narinfo)
                                    (* 5 (expt 2 20))))
                               lst))
scheme@(guile-user)> (average (map lzip/gzip-ratio biggest))
$6 = 0.5974238562384483
scheme@(guile-user)> (length biggest)
$7 = 440

For those packages, lzip yields substitutes that are 40% smaller on average. Pretty nice! Lzip decompression is slightly more CPU-intensive than gzip decompression, but downloads are bandwidth-bound, so the benefits clearly outweigh the costs.

Going forward

The switch from gzip to lzip has the potential to make upgrades “feel” faster, and that is great in itself.

Fundamentally though, we’ve always been looking in this project at peer-to-peer solutions with envy. Of course, the main motivation is to have a community-supported and resilient infrastructure, rather than a centralized one, and that vision goes hand-in-hand with reproducible builds.

We started working on an extension to publish and fetch substitutes over IPFS. Thanks to its content-addressed nature, IPFS has the potential to further reduce the amount of data that needs to be downloaded on an upgrade.

The good news is that IPFS developers are also interested in working with package manager developers, and I bet there’ll be interesting discussions at IPFS Camp in just a few days. We’re eager to pursue our IPFS integration work, and if you’d like to join us and hack the good hack, let’s get in touch!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Ludovic Courtès at June 17, 2019 12:30 PM

June 14, 2019

Programming Praxis

Van Eck Sequence

Neil Sloane is on Numberphile again, discussing the Van Eck sequence (A181391):

The first item in the sequence is 0. Compute the next item as follows: If the previous item has not previously appeared in the sequence, add 0 to the sequence, otherwise add to the sequence the number of steps back in the sequence the previous item previously appeared. For instance, the first item is 0. Since 0 has not previously appeared in the sequence, the next item is 0. Now 0 has previously appeared, and the previous 0 was one back in the sequence, so add 1 to the sequence. Since 1 has not previously appeared, add 0. But 0 appeared previously, two back in the sequence, so add 2. Since 2 has not previously appeared, add 0. But 0 appeared previously, two items back, so add 2 to the sequence. Since 2 previously appeared in the sequence, two terms back, add another 2 to the sequence. The next item in the sequence is 1, because 2 also appeared as the previous number. Since 1 appeared in the sequence, count back to the previous 1, and add 6 to the sequence. And so on. The sequence begins 0, 0, 1, 0, 2, 0, 2, 2, 1, 6, 0, 5, 0, 2, 6, 5, 4, 0, ….

Your task is to write a program that generates the Van Eck sequence and investigate the properties of the sequence. When you are finished, you are welcome to ,a href=”https://programmingpraxis.com/2019/06/14/van-eck-sequence/2/”>read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 14, 2019 09:00 AM

June 11, 2019

Programming Praxis

Maximum Product

I think today’s exercise comes from one of those hacker testing sites, but I’m not sure:

Given three arrays of integers, both positive and negative, find the maximum product that can be formed by taking one element from each array. For instance, if A = [10,-10,15,-2], B = [10,-12,13,-2], and C = [-11,-10,9,-12], the maximum product is 2160 using 15, -12 and -12.

Your task is to write a program that finds the maximum product of three integers, taking one each from three arrays. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 11, 2019 09:00 AM

June 07, 2019

Programming Praxis

Primitive Roots

In modular arithmetic, a number g is a primitive root of n if, for every a coprime to n, there exists some k such that gka (mod n). The concept of a primitive root was developed by Gauss and pops up from time to time in work on cryptography and number theory.

There is no formula for computing a primitive root, but they are common enough that it is easy to just randomly search for them; more commonly, people just start at 2 and work their way up. Once a primitive root has been found, the remaining primitive roots of n can be found by exponentiating mod n.

Your task is to write a program that computes the primitive roots of prime n. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 07, 2019 09:00 AM

June 04, 2019

Programming Praxis

Reverse Vowels

Now that school is finished for the year, I’m catching up on the homework questions that people have sent me over the last several months. This one is fun:

Given a string, write it with the vowels reversed, maintaining the original capitalization of the vowels. For instance, “HELLO world” becomes “HOLLO werld” and “Programming PRAXIS” becomes “Prigramming PRAXOS” (I kinda like that one).

Your task is to write a program that returns an input string with vowels in reverse order, respecting the original capitalization. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at June 04, 2019 09:00 AM

May 31, 2019

Programming Praxis

Brush Strokes

Today’s problem must be someone’s homework from an algorithms course; it makes a fine opportunity for some code golfing:

An array of positive integers represents the heights of a series of buildings; for instance, the array [1, 4, 3, 2, 3, 1] represents the six buildings that, placed in a row, look like this:

     XXX
     XXX  XXX       XXX
     XXX  XXX  XXX  XXX
XXX  XXX  XXX  XXX  XXX  XXX

A painter drawing a picture of the buildings paints the buildings by making a single horizontal brush stroke from left to right as far as possible, working from bottom to top. The first stroke covers the first floor of all six buildings. The second stroke covers the second floor of four buildings. The third stroke covers the third floor of two buildings, and the fourth stroke covers the third floor of one building. Finally, the fifth stroke covers the fourth floor of one building. The painter requires five brush strokes to cover all six buildings.

Your task is to write a program that takes an input array representing building heights and calculates the number of horizontal brush strokes required to cover all floors on all buildings. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 31, 2019 09:00 AM

May 29, 2019

Programming Praxis

Logistic Map

[ I apologize for posting this exercise on Wednesday morning instead of Tuesday morning. I got mixed up because of the three-day holiday weekend. ]

I recently came across the logistic map which is an example of how complex, chaotic behavior can arise from simple non-linear equations. The logistic map is written

xn+1 = rxn (1 − xn

where 0 < xn < 1 represents the ratio of the existing population to the maximum possible and 0 < r < 4 represents the stability in the model of population density. I can't do the logistic map justice here; look at the Wikipedia page and follow the references it gives for a fascinating introduction to chaos theory.

Your task is to write a program that computes sequences of the logistic map, and experiment with it. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 29, 2019 09:00 AM

May 24, 2019

Programming Praxis

Sum Of Squares Of Divisors Is Square

A few days ago I answered this question on Stack Overflow:

Divisors of 42 are : 1, 2, 3, 6, 7, 14, 21, 42. These divisors squared are: 1, 4, 9, 36, 49, 196, 441, 1764. The sum of the squared divisors is 2500 which is 50 * 50, a square!

Given two integers m, n (1 ≤ mn) we want to find all integers between m and n whose sum of squared divisors is itself a square. 42 is such a number.

The result will be an array of arrays, each subarray having two elements, first the number whose squared divisors is a square and then the sum of the squared divisors.

Your task is to solve MrOldSir’s problem; he asked for a solution in Python, but you are free to use any language. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 24, 2019 09:00 AM

Andy Wingo

lightening run-time code generation

The upcoming Guile 3 release will have just-in-time native code generation. Finally, amirite? There's lots that I'd like to share about that and I need to start somewhere, so this article is about one piece of it: Lightening, a library to generate machine code.

on lightning

Lightening is a fork of GNU Lightning, adapted to suit the needs of Guile. In fact at first we chose to use GNU Lightning directly, "vendored" into the Guile source respository via the git subtree mechanism. (I see that in the meantime, git gained a kind of a subtree command; one day I will have to figure out what it's for.)

GNU Lightning has lots of things going for it. It has support for many architectures, even things like Itanium that I don't really care about but which a couple Guile users use. It abstracts the differences between e.g. x86 and ARMv7 behind a common API, so that in Guile I don't need to duplicate the JIT for each back-end. Such an abstraction can have a slight performance penalty, because maybe it missed the opportunity to generate optimal code, but this is acceptable to me: I was more concerned about the maintenance burden, and GNU Lightning seemed to solve that nicely.

GNU Lightning also has fantastic documentation. It's written in C and not C++, which is the right thing for Guile at this time, and it's also released under the LGPL, which is Guile's license. As it's a GNU project there's a good chance that GNU Guile's needs might be taken into account if any changes need be made.

I mentally associated Paolo Bonzini with the project, who I knew was a good no-nonsense hacker, as he used Lightning for a smalltalk implementation; and I knew also that Matthew Flatt used Lightning in Racket. Then I looked in the source code to see architecture support and was pleasantly surprised to see MIPS, POWER, and so on, so I went with GNU Lightning for Guile in our 2.9.1 release last October.

on lightening the lightning

When I chose GNU Lightning, I had in mind that it was a very simple library to cheaply write machine code into buffers. (Incidentally, if you have never worked with this stuff, I remember a time when I was pleasantly surprised to realize that an assembler could be a library and not just a program that processes text. A CPU interprets machine code. Machine code is just bytes, and you can just write C (or Scheme, or whatever) functions that write bytes into buffers, and pass those buffers off to the CPU. Now you know!)

Anyway indeed GNU Lightning 1.4 or so was that very simple library that I had in my head. I needed simple because I would need to debug any problems that came up, and I didn't want to add more complexity to the C side of Guile -- eventually I should be migrating this code over to Scheme anyway. And, of course, simple can mean fast, and I needed fast code generation.

However, GNU Lightning has a new release series, the 2.x series. This series is a rewrite in a way of the old version. On the plus side, this new series adds all of the weird architectures that I was pleasantly surprised to see. The old 1.4 didn't even have much x86-64 support, much less AArch64.

This new GNU Lightning 2.x series fundamentally changes the way the library works: instead of having a jit_ldr_f function that directly emits code to load a float from memory into a floating-point register, the jit_ldr_f function now creates a node in a graph. Before code is emitted, that graph is optimized, some register allocation happens around call sites and for temporary values, dead code is elided, and so on, then the graph is traversed and code emitted.

Unfortunately this wasn't really what I was looking for. The optimizations were a bit opaque to me and I just wanted something simple. Building the graph took more time than just emitting bytes into a buffer, and it takes more memory as well. When I found bugs, I couldn't tell whether they were related to my usage or in the library itself.

In the end, the node structure wasn't paying its way for me. But I couldn't just go back to the 1.4 series that I remembered -- it didn't have the architecture support that I needed. Faced with the choice between changing GNU Lightning 2.x in ways that went counter to its upstream direction, switching libraries, or refactoring GNU Lightning to be something that I needed, I chose the latter.

in which our protagonist cannot help himself

Friends, I regret to admit: I named the new thing "Lightening". True, it is a lightened Lightning, yes, but I am aware that it's horribly confusing. Pronounced like almost the same, visually almost identical -- I am a bad person. Oh well!!

I ported some of the existing GNU Lightning backends over to Lightening: ia32, x86-64, ARMv7, and AArch64. I deleted the backends for Itanium, HPPA, Alpha, and SPARC; they have no Debian ports and there is no situation in which I can afford to do QA on them. I would gladly accept contributions for PPC64, MIPS, RISC-V, and maybe S/390. At this point I reckon it takes around 20 hours to port an additional backend from GNU Lightning to Lightening.

Incidentally, if you need a code generation library, consider your choices wisely. It is likely that Lightening is not right for you. If you can afford platform-specific code and you need C, Lua's DynASM is probably the right thing for you. If you are in C++, copy the assemblers from a JavaScript engine -- C++ offers much more type safety, capabilities for optimization, and ergonomics.

But if you can only afford one emitter of JIT code for all architectures, you need simple C, you don't need register allocation, you want a simple library to just include in your source code, and you are good with the LGPL, then Lightening could be a thing for you. Check the gitlab page for info on how to test Lightening and how to include it into your project.

giving it a spin

Yesterday's Guile 2.9.2 release includes Lightening, so you can give it a spin. The switch to Lightening allowed us to lower our JIT optimization threshold by a factor of 50, letting us generate fast code sooner. If you try it out, let #guile on freenode know how it went. In any case, happy hacking!

by Andy Wingo at May 24, 2019 08:44 AM

May 21, 2019

GNU Guix

Creating and using a custom Linux kernel on Guix System

Guix is, at its core, a source based distribution with substitutes, and as such building packages from their source code is an expected part of regular package installations and upgrades. Given this starting point, it makes sense that efforts are made to reduce the amount of time spent compiling packages, and recent changes and upgrades to the building and distribution of substitutes continues to be a topic of discussion within Guix.

One of the packages which I prefer to not build myself is the Linux-Libre kernel. The kernel, while not requiring an overabundance of RAM to build, does take a very long time on my build machine (which my children argue is actually their Kodi computer), and I will often delay reconfiguring my laptop while I want for a substitute to be prepared by the official build farm. The official kernel configuration, as is the case with many GNU/Linux distributions, errs on the side of inclusiveness, and this is really what causes the build to take such a long time when I build the package for myself.

The Linux kernel, however, can also just be described as a package installed on my machine, and as such can be customized just like any other package. The procedure is a little bit different, although this is primarily due to the nature of how the package definition is written.

The linux-libre kernel package definition is actually a procedure which creates a package.

(define* (make-linux-libre version hash supported-systems
                           #:key
                           ;; A function that takes an arch and a variant.
                           ;; See kernel-config for an example.
                           (extra-version #f)
                           (configuration-file #f)
                           (defconfig "defconfig")
                           (extra-options %default-extra-linux-options)
                           (patches (list %boot-logo-patch)))
  ...)

The current linux-libre package is for the 5.1.x series, and is declared like this:

(define-public linux-libre
  (make-linux-libre %linux-libre-version
                    %linux-libre-hash
                    '("x86_64-linux" "i686-linux" "armhf-linux" "aarch64-linux")
                    #:patches %linux-libre-5.1-patches
                    #:configuration-file kernel-config))

Any keys which are not assigned values inherit their default value from the make-linux-libre definition. When comparing the two snippets above, you may notice that the code comment in the first doesn't actually refer to the #:extra-version keyword; it is actually for #:configuration-file. Because of this, it is not actually easy to include a custom kernel configuration from the definition, but don't worry, there are other ways to work with what we do have.

There are two ways to create a kernel with a custom kernel configuration. The first is to provide a standard .config file during the build process by including an actual .config file as a native input to our custom kernel. The following is a snippet from the custom 'configure phase of the make-linux-libre package definition:

(let ((build  (assoc-ref %standard-phases 'build))
      (config (assoc-ref (or native-inputs inputs) "kconfig")))

  ;; Use a custom kernel configuration file or a default
  ;; configuration file.
  (if config
      (begin
        (copy-file config ".config")
        (chmod ".config" #o666))
      (invoke "make" ,defconfig))

Below is a sample kernel package for one of my computers. Linux-Libre is just like other regular packages and can be inherited and overridden like any other:

(define-public linux-libre/E2140
  (package
    (inherit linux-libre)
    (native-inputs
     `(("kconfig" ,(local-file "E2140.config"))
      ,@(alist-delete "kconfig"
                      (package-native-inputs linux-libre))))))

In the same directory as the file defining linux-libre-E2140 is a file named E2140.config, which is an actual kernel configuration file. I left the defconfig keyword of make-linux-libre blank, so the only kernel configuration in the package is the one which I included as a native-input.

The second way to create a custom kernel is to pass a new value to the extra-options keyword of the make-linux-libre procedure. The extra-options keyword works with another function defined right below it:

(define %default-extra-linux-options
  `(;; https://lists.gnu.org/archive/html/guix-devel/2014-04/msg00039.html
   ("CONFIG_DEVPTS_MULTIPLE_INSTANCES" . #t)
   ;; Modules required for initrd:
   ("CONFIG_NET_9P" . m)
   ("CONFIG_NET_9P_VIRTIO" . m)
   ("CONFIG_VIRTIO_BLK" . m)
   ("CONFIG_VIRTIO_NET" . m)
   ("CONFIG_VIRTIO_PCI" . m)
   ("CONFIG_VIRTIO_BALLOON" . m)
   ("CONFIG_VIRTIO_MMIO" . m)
   ("CONFIG_FUSE_FS" . m)
   ("CONFIG_CIFS" . m)
   ("CONFIG_9P_FS" . m)))

(define (config->string options)
  (string-join (map (match-lambda
                      ((option . 'm)
                       (string-append option "=m"))
                      ((option . #t)
                       (string-append option "=y"))
                      ((option . #f)
                       (string-append option "=n")))
                    options)
               "\n"))

And in the custom configure script from the make-linux-libre package:

;; Appending works even when the option wasn't in the
;; file.  The last one prevails if duplicated.
(let ((port (open-file ".config" "a"))
      (extra-configuration ,(config->string extra-options)))
  (display extra-configuration port)
  (close-port port))

(invoke "make" "oldconfig"))))

So by not providing a configuration-file the .config starts blank, and then we write into it the collection of flags that we want. Here's another custom kernel which I have:

(define %macbook41-full-config
  (append %macbook41-config-options
          %filesystems
          %efi-support
          %emulation
          (@@ (gnu packages linux) %default-extra-linux-options)))

(define-public linux-libre-macbook41
  ;; XXX: Access the internal 'make-linux-libre' procedure, which is
  ;; private and unexported, and is liable to change in the future.
  ((@@ (gnu packages linux) make-linux-libre) (@@ (gnu packages linux) %linux-libre-version)
                      (@@ (gnu packages linux) %linux-libre-hash)
                      '("x86_64-linux")
                      #:extra-version "macbook41"
                      #:patches (@@ (gnu packages linux) %linux-libre-5.1-patches)
                      #:extra-options %macbook41-config-options))

From the above example %filesystems is a collection of flags I compiled enabling different filesystem support, %efi-support enables EFI support and %emulation enables my x86_64-linux machine to act in 32-bit mode also. %default-extra-linux-options are the ones quoted above, which had to be added in since I replaced them in the extra-options keyword.

This all sounds like it should be doable, but how does one even know which modules are required for their system? The two places I found most helpful to try to answer this question were the Gentoo Handbook, and the documentation from the kernel itself. From the kernel documentation, it seems that make localmodconfig is the command we want.

In order to actually run make localmodconfig we first need to get and unpack the kernel source code:

tar xf $(guix build linux-libre --source)

Once inside the directory containing the source code run touch .config to create an initial, empty .config to start with. make localmodconfig works by seeing what you already have in .config and letting you know what you're missing. If the file is blank then you're missing everything. The next step is to run:

guix environment linux-libre -- make localmodconfig

and note the output. Do note that the .config file is still empty. The output generally contains two types of warnings. The first start with "WARNING" and can actually be ignored in our case. The second read:

module pcspkr did not have configs CONFIG_INPUT_PCSPKR

For each of these lines, copy the CONFIG_XXXX_XXXX portion into the .config in the directory, and append =m, so in the end it looks like this:

CONFIG_INPUT_PCSPKR=m
CONFIG_VIRTIO=m

After copying all the configuration options, run make localmodconfig again to make sure that you don't have any output starting with "module". After all of these machine specific modules there are a couple more left that are also needed. CONFIG_MODULES is necessary so that you can build and load modules separately and not have everything built into the kernel. CONFIG_BLK_DEV_SD is required for reading from hard drives. It is possible that there are other modules which you will need.

This post does not aim to be a guide to configuring your own kernel however, so if you do decide to build a custom kernel you'll have to seek out other guides to create a kernel which is just right for your needs.

The second way to setup the kernel configuration makes more use of Guix's features and allows you to share configuration segments between different kernels. For example, all machines using EFI to boot have a number of EFI configuration flags that they need. It is likely that all the kernels will share a list of filesystems to support. By using variables it is easier to see at a glance what features are enabled and to make sure you don't have features in one kernel but missing in another.

Left undiscussed however, is Guix's initrd and its customization. It is likely that you'll need to modify the initrd on a machine using a custom kernel, since certain modules which are expected to be built may not be available for inclusion into the initrd.

Suggestions and contributions toward working toward a satisfactory custom initrd and kernel are welcome!

About GNU Guix

GNU Guix is a transactional package manager and an advanced distribution of the GNU system that respects user freedom. Guix can be used on top of any system running the kernel Linux, or it can be used as a standalone operating system distribution for i686, x86_64, ARMv7, and AArch64 machines.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. When used as a standalone GNU/Linux distribution, Guix offers a declarative, stateless approach to operating system configuration management. Guix is highly customizable and hackable through Guile programming interfaces and extensions to the Scheme language.

by Efraim Flashner at May 21, 2019 10:00 AM

Programming Praxis

Fetch OEIS Sequences

I needed to fetch several sequences from the OEIS the other day. We’ve done that in a previous exercise, in Scheme, but I wanted something I could run from the command line. So I wrote a program.

Your task is to write a program, called from a normal shell command line, that fetches a sequence from OEIS. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 21, 2019 09:00 AM