Planet Scheme

February 08, 2016

The Racket Blog

Racket Web Server Security Vulnerability

We recently discovered a serious security vulnerability in the Racket web server, which can lead to unintended disclosure of files on the machine running the web server. This vulnerability is fixed in Racket version 6.4, just released, and we encourage people to upgrade to that version.

The vulnerability affects web servers that serve static files using the #:extra-files-paths option, including the default value of this option. If you do not use the Racket web server to serve static files, or you do so via a mechanism that does not use the make-url->path function, then you are likely not vulnerable. Affected web servers will allow specially-crafted URLs to access files outside of the specified paths, potentially exposing any file that the web server process is able to read.

If you cannot immediately upgrade to version 6.4, we have provided a package catalog with updated versions of the "web-server-lib" package for versions of Racket back to 6.0. That catalog is located at

 http://download.racket-lang.org/patches/web-server-1/

To use it to upgrade your Racket installation, add it as a catalog using raco pkg config. To make this process easier, you can download the Racket script available here. Then run:

  $ racket add-catalog.rkt
  $ raco pkg update -i web-server-lib
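
If you would rather not run the script, the catalog can also be added by hand with raco pkg config. A hedged sketch: the entries after the patch catalog should match your installation's existing catalogs, which you can list first with raco pkg config catalogs:

  $ raco pkg config --set catalogs \
      http://download.racket-lang.org/patches/web-server-1/ \
      http://pkgs.racket-lang.org
  $ raco pkg update -i web-server-lib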

If you need advice on updating Racket installations older than version 6.0, please let us know and we will provide it.

To test that your Racket installation is fixed, you can run the program here. It will print whether your installation is out of date.

by Sam Tobin-Hochstadt (noreply@blogger.com) at February 08, 2016 07:05 PM

Racket v6.4

Racket version 6.4 is now available from http://racket-lang.org/
  • We fixed a security vulnerability in the web server. Web servers in earlier versions are vulnerable to a navigation attack when they are also configured to serve files statically; that is, any file readable by the web-server process is accessible remotely. For more information, see this post.
  • DrRacket's scrolling is faster.
  • Incremental garbage-collection mode can eliminate long pauses in a program. For example, incremental mode is useful for avoiding pauses in games and animations.
    Programs must specifically request incremental mode with (collect-garbage 'incremental), but libraries such as 2htdp/universe include the request as part of the library's implementation.
  • The default package catalog is an HTTPS address instead of HTTP, and package operations properly validate server certificates when using HTTPS.
  • Documentation may define its own categories for the manual top-level page by using strings, rather than only symbols that name pre-defined categories.
  • The Racket cheat sheet is included in the main distribution.
  • DrRacket is available in Bulgarian, thanks to Alexander Shopov.
  • The contract Typed Racket generates for the Any type is more permissive, allowing more typed/untyped programs to work without contract errors.
  • Redex supports binding specifications: describe which variables bind in which expressions, and your metafunctions and reduction relations automatically become scope-sensitive. Thanks to Paul Stansifer for this improvement.
  • All pict functions accept pict-convertibles. This allows for transparent interoperability between pict and libraries like 2htdp/image.
  • The raco profile and raco contract-profile commands provide easy access to profiling tools, without requiring program modifications.
Feedback Welcome

by Ryan Culpepper (noreply@blogger.com) at February 08, 2016 04:04 PM

Andy Wingo

a lambda is not (necessarily) a closure

preface

Greets, folks! Check it out: Guile had a whole track devoted to it at FOSDEM this year. OK, so it was only half a day, but there were like a dozen talks! And the room was full all the morning! And -- get this -- I had nothing to do with its organization! I think we can credit the Guix project with the recent surge of interest in Guile; fully half the talks were from people excited about using Guix to solve their problems. Thanks very, very much to Pjotr Prins for organizing the lovely event.

I gave a talk on how the Guile 2.2 compiler and virtual machine could change the way people program. Happily, the video recording came out OK! Video below (or here if that doesn't work), and slides here.

The time was super-limited though and I wasn't able to go into the detail that I'd like. So, dear readers, here we are, with a deeper look on lambda representation in Guile.

a lambda is not (necessarily) a closure

What is this?

(lambda (a b) (+ a b))

If you answer, "it's a lambda expression", you're right! You're also right if you say it's a function -- I mean, lambda makes a function, right? There are lots of things that you could say that would be right, including silly things like "twenty-two characters set in an awkward typeface".

But if you said "it's a closure" -- well you're right in general I guess, like on a semantic what-does-it-mean level, but as far as how Guile represents this thing at run-time, hoo boy are there a number of possibilities, and a closure is just one of them. This article dives into the possibilities, with the goal being to help you update your mental model of "how much do things cost".

In Guile, a lambda expression can be one of the following things at run-time:

  1. Gone

  2. Inlined

  3. Contified

  4. Code pointer

  5. Closure

Let's look into these one-by-one.

lambda: gone

If Guile can prove that a lambda expression is never reached, it won't be present at run-time. The main way this happens is via partial evaluation, but later passes can do this too. In the most basic example, consider the lambda bound to f by this let expression.

(let ((f (lambda ()
           (launch-the-missiles!))))
  42)

Guile has an ,optimize command that can be run at the REPL to show the effect of partial evaluation on your code. These days it's a bit out of date in a way -- it can't show what CPS-based optimization will do to your code -- but for our purposes here it will transform the expression to the following code:

(let ((f (lambda ()
           (launch-the-missiles!))))
  42)
=> 42

So the lambda is gone, big whoop. The interesting thing though is that this happens concurrently with other things that partial evaluation does, so the lambda goes away in this expression too:

(let ((launch? #f)
      (f (lambda ()
           (launch-the-missiles!))))
  (if launch? (f) 'just-kidding))
=> 'just-kidding

lambda: inlined

The other trick that partial evaluation can do with lambda expressions is inlining. Re-taking the example above, if we change launch? to #t, the branch folds the other way and the application (f) inlines:

(let ((launch? #t)
      (f (lambda ()
           (launch-the-missiles!))))
  (if launch? (f) 'just-kidding))
=> (let ((launch? #t)
         (f (lambda ()
              (launch-the-missiles!))))
     (if #t (f) 'just-kidding))
=> (let ((launch? #t)
         (f (lambda ()
              (launch-the-missiles!))))
     (f))
=> (let ((launch? #t)
         (f (lambda ()
              (launch-the-missiles!))))
     ((lambda () (launch-the-missiles!))))
=> (let ((launch? #t)
         (f (lambda ()
              (launch-the-missiles!))))
     (launch-the-missiles!))
=> (launch-the-missiles!)

Here again the lambda is gone, but not because it was unreachable, but because it was inlined into its use. I showed some intermediate steps as well, just so you get a feel about how partial evaluation works. The inlining step is illustrated by the fourth transformation, where the lambda application went away, replaced by its body.

Partial evaluation can also unroll many kinds of recursion:

(letrec ((lp (lambda (n)
               (if (zero? n)
                   n
                   (+ n (lp (1- n)))))))
  (lp 5))
=> 15

The partial evaluator in Guile 2.2 is more or less unchanged from the one in Guile 2.0, so you get these benefits on old Guile as well. Building a good intuition as to what the partial evaluator will do is important if you want to get the best performance out of Guile. Use the ,optimize command at the REPL to see the effects of partial evaluation on any given expression.

lambda: contified

So, here we step into the unknown, in the sense that from here on out, these optimizations are new in Guile 2.2. Unfortunately, they can be hard to see as they aren't really representable in terms of source-to-source transformations over Scheme programs. Consider this program:

(define (count-down n)
  (define loop
    (lambda (n out)
      (let ((out (cons n out)))
        (if (zero? n)
            out
            (loop (1- n) out)))))
  (loop n '()))

It's a little loop that builds a list of integers. The lambda in this loop, bound to loop, will be contified into the body of count-down.

To see that this is the case, we have to use a new tool, ,disassemble (abbreviated ,x). This takes a procedure and prints its bytecode. It can be hard to understand, so I'm going to just point out some "shapes" of disassembly that you can recognize.

> ,x count-down
Disassembly of #<procedure count-down (n)> at #x9775a8:

[...]
L1:
  10    (cons 2 1 2)
  11    (br-if-u64-=-scm 0 1 #f 5) ;; -> L2
  14    (sub/immediate 1 1 1)
  15    (br -5)                    ;; -> L1
L2:
[...]

I've snipped the disassembly to the interesting part. The first thing to notice is that there's just one procedure here: only one time that ,x prints "Disassembly of ...". That means that the lambda was eliminated somehow, either because it was dead or inlined, as described above, or because it was contified. It wasn't dead; we can see that from looking at the ,optimize output, which doesn't significantly change the term. It wasn't inlined either; again, ,optimize can show you this, but consider that because partial evaluation can't determine when the loop would terminate, it won't find a point at which it can stop unrolling the loop. (In practice what happens though is that it tries, hits an effort or code growth limit, then aborts the inlining attempt.)

However, what we see in the disassembly is the body of the loop: we cons something onto a list (the cons), check if two numbers are equal (br-if-u64-=-scm), and if they are we jump out of the loop (L2). Otherwise we subtract 1 from a number (sub/immediate) and loop (br to L1). That is the loop. So what happened?

Well, if inlining is copying, then contification is rewiring. Guile's compiler was able to see that although it couldn't inline the loop function, it could see all of loop's callers, and that loop always returned to the same "place". (Another way to say this is that loop is always called with the same continuation.) The compiler was then able to incorporate the body of loop into count-down, rewiring calls to loop to continue to loop's beginning, and rewriting returns from loop to proceed to the continuation of the loop call.
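
In source terms, the effect is roughly as if loop had been rewritten into a named let, a local jump target rather than a function value. This is only an intuition, not a transformation the compiler literally performs on Scheme source:

(define (count-down n)
  ;; `loop' is now just a label in count-down's own flow graph.
  (let loop ((n n) (out '()))
    (let ((out (cons n out)))
      (if (zero? n)
          out
          (loop (1- n) out)))))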

a digression on language

These words like "contification" and "continuation" might be unfamiliar to you, and I sympathize. If you know of a better explanation of contification, I welcome any links you might have. The name itself comes from a particular formulation of the intermediate language used in Guile, the so-called "CPS" language. In this language, you convert a program to make it so it never returns: instead, each sub-expression passes its values to its continuation via a tail call. Each continuation is expressed as a lambda expression. See this article for an intro to CPS and how it relates to things you might know like SSA.

Transforming a program into CPS explodes it into a bunch of little lambdas: every subexpression gets its own. You would think this would be a step backwards, if your goal is to eliminate closures in some way. However it's possible to syntactically distinguish between lambda expressions which are only ever used as continuations and those that are used as values. Let's call the former kind of lambda a cont and the latter a function. A cont-lambda can be represented at run-time as a label -- indeed, the disassembly above shows this. It turns out that all lambda expressions introduced by the CPS transformation are conts. Conts form a first-order flow graph, and are basically the same as SSA basic blocks. If you're interested in this kind of thing, see Andrew Kennedy's great paper, Compiling with Continuations, Continued, and see also CPS soup for more on how this evolved in Guile 2.2.
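
To make the distinction concrete, here is a tiny function converted to CPS by hand. This is a sketch of the style only; Guile's actual CPS IR differs in detail:

;; direct style:
(define (add1-of f x) (+ (f x) 1))

;; CPS style: f now also takes a continuation, and every intermediate
;; value is passed along via a tail call.
(define (add1-of-cps f x k)
  (f x (lambda (v)        ; a cont: only ever used as f's return point
         (k (+ v 1)))))   ; pass the final value to the outer continuation

;; example:
(add1-of-cps (lambda (x k) (k (* x x)))   ; a CPS'd squaring function
             4
             (lambda (r) (display r)))    ; prints 17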

I say all this to give you a vocabulary. Functions that are present in the source program start life as being treated as function-lambdas. Contification takes function-lambda values and turns them into cont-lambda labels, if it can. That's where the name "contification" comes from. For more on contification, see MLton's page on its contification pass, linking to the original paper that introduces the concept.

and we're back

Contification incorporates the body of a function into the flow graph of its caller. Unlike inlining, contification is always an optimization: it never causes code growth, and it enables other optimizations by exposing first-order control flow. (It's easier for the compiler to reason about first-order loops than it is to reason about control flow between higher-order functions.)

Contification is a reliable optimization. If a function's callers are always visible to the compiler, and the function is always called with the same continuation, it will be contified. These are two fairly simple conditions that you can cultivate your instincts to detect and construct.

Contification can also apply to mutually recursive functions, if as a group they are all always called with the same continuation. It's also an iterative process, in the sense that contifying one set of functions can expose enough first-order control flow that more contification opportunities become apparent.

It can take a while to get a feel for when this optimization applies. You have to have a feel for what a continuation is, and what it means for a function's callers to all be visible to the compiler. However, once you do internalize these conditions, contification is something you can expect Guile's compiler to do to your code.

lambda: code pointer

The next representation a lambda might have at run-time is as a code pointer. In this case, the function fails the conditions for contification, but we still avoid allocating a closure.

Here's a little example to illustrate the case.

(define (thing)
  (define (log what)
    (format #t "Very important log message: ~a\n" what))
  (log "ohai")
  (log "kittens")
  (log "donkeys"))

In this example, log is called with three different continuations, so it's not eligible for contification. Unfortunately, this example won't illustrate anything for us because the log function is so small that partial evaluation will succeed in inlining it. (You could determine this for yourself by using ,optimize.) So let's make it bigger, to fool the inliner:

(define (thing)
  (define (log what)
    (format #t "Very important log message: ~a\n" what)
    ;; If `log' is too short, it will be inlined.  Make it bigger.
    (format #t "Did I ever tell you about my chickens\n")
    (format #t "I was going to name one Donkey\n")
    (format #t "I always wanted a donkey\n")
    (format #t "In the end we called her Raveonette\n")
    (format #t "Donkey is not a great name for a chicken\n")
    (newline) (newline) (newline) (newline) (newline))
  (log "ohai")
  (log "kittens")
  (log "donkeys"))

Now if we disassemble it, we do get disassembly for two different functions:

,x thing
Disassembly of #<procedure thing ()> at #x97d704:
[...]

Disassembly of log at #x97d754:
[...]

So, good. We defeated the inliner. Let's look closer at the disassembly of the outer function.

,x thing
Disassembly of #<procedure thing ()> at #x97d704:
[...]
  12    (call-label 3 2 8)              ;; log at #x97d754

Here we see that instead of the generic call instruction, we have the specific call-label instruction which calls a procedure whose code is at a known offset from the calling function.

call-label is indeed a cheaper call than the full call instruction that has to check that the callee is actually a function and so on. But that's not the real optimization here. If all callers of a function are known -- and by this time, you're starting to catch the pattern, I think -- if all callers are known, then the procedure does not need to exist as a value at run-time.

This affords a number of optimization opportunities. Theoretically there are many -- all call sites can be specialized to the specific callee. The callee can have an optimized calling convention that doesn't have anything to do with the generic convention. Effect analysis can understand the side effects and dependencies of the callee in a more precise way. The compiler can consider unboxing some arguments and return values, if it finds that useful.

In Guile though, there's only one real optimization that we do, and that is related to free variables. Currently in Guile, all procedures have a uniform calling convention, in which the procedure being called (the callee) is itself passed as the zeroeth argument, and then the arguments follow on the stack. The function being called accesses its free variables through that zeroeth argument. If however there is no need for the procedure to be represented as a value, we are free to specialize that zeroeth argument.

So, consider a well-known procedure like log above. (By "well-known", we mean that all of log's callers are known.) Since log doesn't actually have any lexically bound free variables, we can just pass in anything as argument zero when invoking it. In practice we pass #f, because it happens to be an easy value to make.

(Why isn't format treated as a free variable in log? Because there is special support from the linker for lazily initializing the locations of variables imported from other modules or defined at the top level instead of within a lexical contour. In short: only variables that are (a) used within the lambda and (b) defined within a let or similar count towards a lambda's free variables.)
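
A quick example of the distinction:

(define (adder x)
  ;; x is bound by a lexical contour (the argument list of adder), so
  ;; it counts as a free variable of the inner lambda; + is a top-level
  ;; reference handled by the linker, so it does not.
  (lambda (y) (+ x y)))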

For a well-known procedure with only one free variable, we can pass in that free variable as the zeroeth argument. Internally to the function, we rewrite references to that free variable to reference argument 0 instead. This is a neat hack because we can have a lambda with a free variable but which results in no allocation at run-time.

Likewise if there are two free variables -- and this is starting to sound like Alice's restaurant, isn't it -- well we do have to pass in their values to the procedure, but we don't have to build an actual closure object with a tag and a code pointer and all. Pairs happen to be small and have some fast paths in Guile, so we use that. References to the free variables get internally rewritten to be car or cdr of argument 0.

For three or more free variables, we do the same, but with a vector.
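
As a rough source-level picture of the two-free-variable case (a sketch; the real rewrite happens in the compiler's IR, and these helper names are made up for illustration):

;; A well-known two-free-variable lambda like the one returned here ...
(define (make-adder x y)
  (lambda (a) (+ a x y)))

;; ... can be compiled as if it were this pair of definitions, with no
;; closure object allocated: the "closure" is just a pair.
(define (adder-code env a)              ; env plays the role of argument 0
  (+ a (car env) (cdr env)))
(define (make-adder* x y) (cons x y))

;; (adder-code (make-adder* 1 2) 10) => 13, same as ((make-adder 1 2) 10)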

For a final trick, a set of mutually recursive procedures whose callers are all known can share the object that collects their free variables. We collect the union of the free variables of all of the procedures, and pack them into a specialized representation as above.

Note that for well-known procedures, all variables that are free in the lambda are also free in the caller; that's why the 1-free-variable substitution works. The lambda is bound in a scope that dominates its callers, but its free variables dominate the lambda so they dominate the callers too. For that reason in this case we could choose to do lambda lifting instead, with no penalty: instead of bundling up the free variables in a heap object, we could pass them as arguments. Dybvig claims this is not a great idea because it increases register pressure. That could be true, but I haven't seen the numbers. Anyway, we do the flat closure thing, so we pack the free vars into data.

All these ideas came pretty much straight from the great Optimizing Closures in O(0) Time by Andrew Keep et al.

lambda: closure

OK! So you have a lambda whose callers are not all visible to the compiler. You need to reify the procedure as a value. That reified procedure-as-value is a closure: an object with a tag, a code pointer, and an array of free variables.

Of course, if the procedure has no free variables, you just have the tag and the code pointer... and because Scheme is semantically squirrely when it comes to the result of (eqv? (lambda () 10) (lambda () 10)) (it's unspecified: lambda expressions don't have identity), we can statically allocate the closure in the binary, as a constant.

Otherwise we do allocate the heap object.

Note however that if a group of mutually recursive procedures has just one entry that is not "well-known", then that procedure clique can share one closure object.

lambda: it's complicated

In summary, a lambda is an abstraction that has many concrete representations. Guile will choose the cheapest representation that it can. If you need to eke out even more performance from your program, having a good mental model of how the abstract maps to the concrete will help you know where to focus your efforts, and what changes might be helpful. Good luck, and happy hacking!

by Andy Wingo at February 08, 2016 10:12 AM

February 05, 2016

Programming Praxis

Freq

Every so often I get it in mind to start solving the daily cryptogram in the newspaper. For instance:

[Category: Literature] V MVXEGC NK V RGIIZH HYZ IGXDK PZQ YNK QLMCGIIV HYGX AYG KQX NK KYNXNXO VXD HZXAK NA MVUE AYG LNXQAG NA MGONXK AZ CVNX. — LVCE AHVNX

The first word is either A or I, and the repeated two-letter words NK, NK, NA and NA suggest words like IF, IS, IT or OF, ON, OR. Further, the digraph NK also appears as YNK. So we guess that NK is IS, YNK is HIS, and V is A. Now the author LVCE AHVNX is _A__ __AI_, which rather strongly suggests MARK TWAIN. Now we have the phrase WH_N TH_ S_N IS SHININ_, which is almost certainly WHEN THE SUN IS SHINING, and the rest of the cryptogram is easy: A BANKER IS A FELLOW WHO LENDS YOU HIS UMBRELLA WHEN THE SUN IS SHINING AND WANTS IT BACK THE MINUTE IT BEGINS TO RAIN. — MARK TWAIN.

We’ve written a program to solve cryptograms, a long time ago, using a dictionary attack, but sometimes it’s still fun to solve them by hand. One way a computer can help is by creating frequency tables of single letters, digraphs, and trigraphs; in the puzzle shown above, a table of digraphs leads quickly to NX, which appears six times, along with NK (three times) and NA (two times), which quickly limits the possibilities for the letter N. We didn’t use it in the analysis above, but AYG is the most common trigram, and both times it appears it is a word by itself, so it is almost certainly THE or AND. Thus, frequency tables quickly lead to a solution.

Your task is to write a program that generates frequency tables, including single characters, digraphs, trigraphs, and other lengths. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
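
For a head start, here is a minimal sketch in Scheme. It counts every length-n substring of the input, and assumes a sort procedure of the form (sort list pred), as in MIT/GNU Scheme or Guile; for cryptogram work you would first strip spaces and punctuation:

(define (frequency-table str n)
  ;; Returns an alist of (gram . count), most frequent first.
  (let loop ((i 0) (counts '()))
    (if (> (+ i n) (string-length str))
        (sort counts (lambda (a b) (> (cdr a) (cdr b))))
        (let* ((gram (substring str i (+ i n)))
               (entry (assoc gram counts)))
          (loop (+ i 1)
                (if entry
                    (begin (set-cdr! entry (+ (cdr entry) 1)) counts)
                    (cons (cons gram 1) counts)))))))

(frequency-table "NKNANKNA" 2)
; => (("NK" . 2) ("KN" . 2) ("NA" . 2) ("AN" . 1)), tie order may vary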


by programmingpraxis at February 05, 2016 09:00 AM

February 04, 2016

Andy Wingo

guile compiler tasks

Hey! We released Guile 2.1.2, including the unboxing work, and we fixed the slow bootstrap problem by shipping pre-built bootstraps in tarballs. A pretty OK solution in my opinion; check it out!

future work

At this point I think I'm happy with Guile's compiler and VM, enough for now. There is a lot more work to do but it's a good point at which to release a stable series. There will probably be a number of additional pre-releases, but not any more significant compiler/VM work that must be done before a release.

However, I was talking with Guilers at FOSDEM last weekend and we realized that although we do a pretty good job at communicating the haps in compiler-land, we don't do a good job at sharing a roadmap or making it possible for other folks to join the hack. And indeed, it's been difficult to do so while things were changing so much: I had to get things right in my head before joining in the confusion of other people's heads.

In that spirit I'd like to share a list of improvements that it would be nice to make at some point. If you take one of these tasks, be my guest: find me on IRC (wingo on freenode) and let me know, and I'll help as I am able. You need to be somewhat independent; I'm not offering a proper mentoring or anything, more like office hours or something, where you come with the problem you are having and I commiserate and give context/background/advice as I am able.

So with that out of the way, here's a huge list of stuff! Following this, more details on each one.

  1. stripping binaries

  2. full source in binaries

  3. cps in binaries

  4. linking multiple modules together

  5. linking a single executable

  6. instruction explosion

  7. elisp optimizations

  8. prompt removal

  9. basic register allocation

  10. optimal register allocation

  11. unboxed record fields

  12. textual CPS

  13. avoiding arity checks

  14. unboxed calls and returns

  15. module-level inlining

  16. cross-module inlining

As a bonus, in the end I'll give some notes on native compilation. But first, the hacks!

stripping binaries

Guile uses ELF as its object file format, and currently includes source location information as DWARF data. On space-constrained devices this might be too much. Your task: add a hack to the linker that can strip existing binaries. Read Ian Lance Taylor's linker articles for more background, if you don't know things about linkers yet.

full source in binaries

Wouldn't it be nice if the ELF files that Guile generates actually included the source as well as the line numbers? We could do that, in a separate strippable ELF section. This point is like the reverse of the previous point :)

cps in binaries

We could also include the CPS IR in ELF files too. This would enable some kinds of link-time optimization and cross-module inlining. You'd need to define a binary format for CPS, like LLVM bitcode or so. Neat stuff :)

linking multiple modules together

Currently in Guile, just about every module is a separate .go file. Loading a module will cause a few stat calls and some seeks and reads and all that. Wouldn't it be nice if you could link together all the .go files that were commonly used into one object? Again this is a linker hack, but it needs support from the run-time as well: when the run-time goes to load a file, it should first check in a registry if that file has been logically provided by some other file. We'd be able to de-duplicate constant data from various modules. However there is an initialization phase when loading a .go file which effectively performs all the relocations needed by constants that need a fix-up at load-time; see the ELF article I linked to above for more. For some uses, it would be OK to produce one relocation/initialization procedure. For others, if you expected to only load a fraction of the modules in a .go file, it would be a lose on startup time, so you would probably need to support lazy relocation when a module is first loaded.

Anyway, your task would be to write a linker hack that loads a bunch of .go files, finds the relocations in them, de-duplicates the constants, and writes out a combined .go file that includes a table of files contained in it. Good luck :) This hack would work great for Emacs, where it's effectively a form of unexec that doesn't actually rely on unexec.

linking a single executable

In the previous task, you could end up with the small guile binary that links to libguile (or your binary linking to libguile), and then a .go file containing all the modules you are interested in. It sure would be nice to be able to link those together into just one binary, or at least to link the .go into the Guile binary. If Guile itself is statically linked, you would have a statically linked application. If it's dynamically linked, it would remain dynamically linked. Again, a linker hack, but one that could provide a nicer way to distribute Guile binaries.

instruction explosion

Now we get more to the compiler side of things. Currently in Guile's VM there are instructions like vector-ref. This is a little silly: there are also instructions to branch on the type of an object (br-if-tc7 in this case), to get the vector's length, and to do a branching integer comparison. Really we should replace vector-ref with a combination of these test-and-branches, with real control flow in the function, and then the actual ref should use some more primitive unchecked memory reference instruction. Optimization could end up hoisting everything but the primitive unchecked memory reference, while preserving safety, which would be a win. But probably in most cases optimization wouldn't manage to do this, which would be a lose overall because you have more instruction dispatch.
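
In source terms, the exploded form might look roughly like this. The names below are hypothetical stand-ins; in the real VM the unchecked reference would be a primitive instruction, not a Scheme procedure:

;; Stand-in for the hypothetical lowered primitive:
(define (unchecked-vector-ref v i) (vector-ref v i))

;; (vector-ref v i) would explode into roughly:
(define (exploded-vector-ref v i)
  (if (vector? v)                                 ; the br-if-tc7 type test
      (if (and (>= i 0) (< i (vector-length v)))  ; branching comparisons
          (unchecked-vector-ref v i)              ; raw memory reference
          (error "Index out of range:" i))
      (error "Not a vector:" v)))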

Well, this transformation is something we need for native compilation anyway. I would accept a patch to do this kind of transformation on the master branch, after version 2.2.0 has forked. In theory this would remove most all high level instructions from the VM, making the bytecode closer to a virtual CPU, and likewise making it easier for the compiler to emit native code as it's working at a lower level.

elisp optimizations

Guile implements Emacs Lisp, and does so well. However it hasn't been the focus of a lot of optimization. Emacs has a lot of stuff going on on its side, and so have we, so we haven't managed to replace the Elisp interpreter in Emacs with one written in Guile, though Robin Templeton has brought us a long way forward. We need someone to do both the integration work but also to poke the compiler and make sure it's a clear win.

prompt removal

It's pretty natural to use delimited continuations when compiling some kind of construct that includes a break statement to Guile, whether that compiler is part of Elisp or just implemented as a Scheme macro. But, many instances of prompts can be contified, resulting in no overhead at run-time. Read up on contification and contify the hell out of some prompts!

basic register allocation

Guile usually tries its best to be safe-for-space: only the data which might be used in the future of a program is kept alive, and the rest is available for garbage collection. Notably, this applies to function arguments, temporaries, and lexical variables: if a value is dead, the GC can collect it and re-use its space. However this isn't always what you want. Sometimes you might want all variables that are in scope to remain available, for better debugging. Your task would be to implement a "slot allocator" (which is really register allocation) that keeps values alive in the parts of the program that they dominate.

optimal register allocation

On the other hand, our slot allocator -- which is basically register allocation, but for stack slots -- isn't so great. It does OK but you can often end up shuffling values in a loop, which is the worst. Your task would be to implement a proper register allocator: puzzle-solving, graph-coloring, iterative coalescing, something that really tries to do a good job. Good luck!

unboxed record fields

Guile's "structs", on which records are implemented, support unboxed values, but these values are untyped, not really integrated with the record layer, and always boxed in the VM. Your task would be to design a language facility that allows us to declare records with typed fields, and to store unboxed values in those fields, and to cause access to their values to emit boxing/unboxing instructions around them. The optimizer will get rid of those boxing/unboxing instructions if it can. Good luck!

textual CPS

The CPS language is key to all compiler work in Guile, but it doesn't have a nice textual form like LLVM IR does. Design one, and implement a parser and an unparser!

avoiding arity checks

If you know the procedure you are calling, like if it's lexically visible, then if you are calling it with the right number of arguments you can skip past the argument check and instead do a call-label directly into the body. Would be pretty neat!

unboxed calls and returns

Likewise if a function's callers are all known, it might be able to unbox its arguments or return value, if that's a good idea. Tricky! You could start with a type inference pass or so, and maybe that could produce some good debugging feedback too.

module-level inlining

Guile currently doesn't inline anything that's not lexically visible. Unfortunately this restriction extends to top-level definitions in a module: they are treated as mutable and so never inlined/optimized/etc. Probably we need to change the semantics here such that a module can be compiled as a unit, and all values which are never mutated can be assumed to be constant. Probably you also want a knob to turn off this behavior, but really you can always re-compile and re-load a module as a whole if re-loading a function at run-time doesn't work because it was inlined. Anyway. Some semantic work here, but some peval work as well. Be careful!

cross-module inlining

Likewise Guile currently doesn't inline definitions from other modules. However for small functions this really hurts. Guile should probably serialize tree-il for small definitions in .go files, and allow peval to speculatively inline imported definitions. This is related to the previous point and has some semantic implications.

bobobobobobonus! native compilation

Thinking realistically, native compilation is the next step. We have the object file format, cool. We will need the ability to call out from machine code in .go files to run-time functions, so we need to enhance the linker, possibly even with things like PLT/GOT sections to avoid dirtying too many pages. We need to lower the CPS even further, to get closer to some kind of machine model, then go specific, with an assembler for each architecture. The priority in the beginning will be simplicity and minimal complexity; good codegen will come later. This is obviously the most attractive thing but it's also the most tricky, design-wise. I want to do at least part of this, so though you can't have it all, you are welcome to help :)

That's it for now. I'll amend the post with more things as and when I think of them. Comments welcome too, as always. Happy hacking!

by Andy Wingo at February 04, 2016 09:38 PM

February 03, 2016

Guile News

GNU Guile 2.1.2 released (beta)

We are chuffed to announce GNU Guile release 2.1.2, the next pre-release in what will become the 2.2 stable series. This release features unboxed arithmetic and dramatically faster build times, along with a number of small speed and memory improvements.

See the original announcement at https://lists.gnu.org/archive/html/guile-user/2016-02/msg00022.html for full details.

by Andy Wingo at February 03, 2016 04:42 PM

February 02, 2016

Joshua Herman

Using JS on Arduino (Part 1)

Using Javascript to control an arduino seems far fetched but there is this great library call Johnny Five which allows you to control an arduino (or many other boards see http://johnny-five.io/platform-support . In this article we will go over how to set up using your Arduino with javascript. The first step should be loading the Arduino application on your computer and running the blink app.

by Joshua Herman (noreply@blogger.com) at February 02, 2016 03:57 PM

Programming Praxis

Prime Gaps

We studied maximal prime gaps in a recent exercise. In today’s exercise we will look at minimal prime gaps — the smallest gap of a given size. For instance, the minimal prime gap of 2 is the gap from 3 to 5, the minimal prime gap of 4 is the gap from 7 to 11, the minimal prime gap of 6 is the gap from 13 to 19, and so on.

Your task is to write a program that finds minimal prime gaps. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at February 02, 2016 09:00 AM

January 31, 2016

Joe Marshall

Race results are in

Some people wanted to compare machines, so here is the exact code I used and some sample values I got from running it on my laptop. I'm curious what values other people get.
;;; -*- Mode: scheme; coding:us-ascii -*-

;;; This is code for MIT/GNU Scheme.  We attempt to measure the speed of ASSQ.
;;; The code avoids obvious abstractions because we don't want to
;;; measure the speed of function calling.

;; Inline the standard scheme primitives.
(declare (usual-integrations))

;; We change the size and alignment of the heap by consing a small
;; data structure to this list.
(define *change-heap-size-and-alignment* '())

;;; Creates a test alist that looks like this:
;;; ((13 . "13")
;;;  (23 . "23")
;;;  (25 . "25")
;;;  (18 . "18")
;;;  (0 . "0")
;;;  (19 . "19")
;;;  (5 . "5")
;;;  (4 . "4")
;;;  (6 . "6")
;;;    ...
;;; 
(define (make-test-alist n-elements)
  (let ((alist 
         (fold-left (lambda (alist number)
                      (alist-cons number (number->string number)
                                  alist))
                    '()
                    ;; shuffle the numbers from 1 to n
                    (map car
                         (sort   
                          (map (lambda (n)
                                 (cons n (random 1.0)))
                               (iota n-elements))
                          (lambda (l r) (< (cdr l) (cdr r))))))))
    (gc-flip)
    (set! *change-heap-size-and-alignment* 
          (cons (vector #f) *change-heap-size-and-alignment*))
    alist))

;;; Creates an alist of <size> entries and then measures the time to
;;; perform n-lookups on it.  Specialized to fixnum-only arithmetic.

(define (time-alist size n-lookups)
  (let ((test-alist (make-test-alist size)))
    (show-time
     (lambda ()
       (do ((i   0 (fix:+ i 1))
            (idx 0 (if (fix:> idx size)
                       0
                       (fix:+ idx 1)))
            (answer '() (assq idx test-alist)))
           ((fix:>= i n-lookups) answer))))))
Here's a sample run on my laptop. We make an alist with 10 elements and call ASSQ 100,000,000 times on it, fetching each entry about the same number of times.
1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-alist 10 100000000))

;process time: 2260 (2260 RUN + 0 GC); real time: 2259
;process time: 2260 (2260 RUN + 0 GC); real time: 2265
;process time: 2290 (2290 RUN + 0 GC); real time: 2291
;process time: 2250 (2250 RUN + 0 GC); real time: 2247
;process time: 2260 (2260 RUN + 0 GC); real time: 2259
;process time: 2240 (2240 RUN + 0 GC); real time: 2240
;process time: 2240 (2240 RUN + 0 GC); real time: 2243
;process time: 2250 (2250 RUN + 0 GC); real time: 2258
;process time: 2240 (2240 RUN + 0 GC); real time: 2247
;process time: 2250 (2250 RUN + 0 GC); real time: 2250
Process time is reported in milliseconds, so it took about 2.26 seconds to do 100,000,000 million lookups. This divides out to .0000000225 seconds = 22.5 nanoseconds per lookup.

It should be about linear with the size of the list, so we'd expect a 100 element list to take somewhere around 225 nanoseconds.

1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-alist 100 100000000))

;process time: 20720 (20720 RUN + 0 GC); real time: 20753
;process time: 20700 (20700 RUN + 0 GC); real time: 20733
;process time: 20640 (20640 RUN + 0 GC); real time: 20671
;process time: 20690 (20690 RUN + 0 GC); real time: 20695
;process time: 20670 (20670 RUN + 0 GC); real time: 20690
;process time: 21010 (21010 RUN + 0 GC); real time: 21026
;process time: 20800 (20800 RUN + 0 GC); real time: 20832
;process time: 20760 (20760 RUN + 0 GC); real time: 20747
;process time: 20710 (20710 RUN + 0 GC); real time: 20702
;process time: 20690 (20690 RUN + 0 GC); real time: 20700
;Value: #t
Testing a hash table:
(define (make-test-hash-table n-entries)
  (alist->hash-table (make-test-alist n-entries)))

;;; Creates a hash-table of <size> entries and then measures the time to
;;; perform n-lookups on it.  Specialized to fixnum-only arithmetic.

(define (time-hash-table size n-lookups)
  (let ((test-hash-table (make-test-hash-table size)))
    (show-time
     (lambda ()
       (do ((i   0 (fix:+ i 1))
            (idx 0 (if (fix:> idx size)
                       0
                       (fix:+ idx 1)))
            (answer '() (hash-table/get test-hash-table idx #f)))
           ((fix:>= i n-lookups) answer))))))
Put 10 elements or a thousand in a hash table, it takes a constant amount of time to look things up:
1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-hash-table 10 100000000))

;process time: 8320 (8320 RUN + 0 GC); real time: 8321
;process time: 8300 (8300 RUN + 0 GC); real time: 8304
;process time: 8420 (8420 RUN + 0 GC); real time: 8419
;process time: 8280 (8280 RUN + 0 GC); real time: 8304
;process time: 8380 (8380 RUN + 0 GC); real time: 8387
;process time: 8280 (8280 RUN + 0 GC); real time: 8288
;process time: 8320 (8320 RUN + 0 GC); real time: 8311
;process time: 8330 (8330 RUN + 0 GC); real time: 8327
;process time: 8290 (8290 RUN + 0 GC); real time: 8290
;process time: 8310 (8310 RUN + 0 GC); real time: 8307
;Value: #t

1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-hash-table 1000 100000000))

;process time: 8400 (8400 RUN + 0 GC); real time: 8403
;process time: 8550 (8550 RUN + 0 GC); real time: 8553
;process time: 8620 (8620 RUN + 0 GC); real time: 8639
;process time: 8420 (8420 RUN + 0 GC); real time: 8435
;process time: 8400 (8400 RUN + 0 GC); real time: 8425
;process time: 8460 (8460 RUN + 0 GC); real time: 8455
;process time: 8460 (8460 RUN + 0 GC); real time: 8459
;process time: 8480 (8480 RUN + 0 GC); real time: 8486
;process time: 8500 (8500 RUN + 0 GC); real time: 8502
;process time: 8520 (8520 RUN + 0 GC); real time: 8518
;Value: #t

Testing an rb-tree:
(define (make-test-rb-tree n-entries)
  (alist->rb-tree (make-test-alist n-entries) fix:= fix:<))

;;; Creates a rb-tree of <size> entries and then measures the time to
;;; perform n-lookups on it.  Specialized to fixnum-only arithmetic.

(define (time-rb-tree size n-lookups)
  (let ((test-rb-tree (make-test-rb-tree size)))
    (show-time
     (lambda ()
       (do ((i   0 (fix:+ i 1))
            (idx 0 (if (fix:> idx size)
                       0
                       (fix:+ idx 1)))
            (answer '() (rb-tree/lookup test-rb-tree idx #f)))
           ((fix:>= i n-lookups) answer))))))

1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-rb-tree 10 100000000))

;process time: 3910 (3910 RUN + 0 GC); real time: 3908
;process time: 3810 (3810 RUN + 0 GC); real time: 3805
;process time: 4090 (4090 RUN + 0 GC); real time: 4090
;process time: 3970 (3970 RUN + 0 GC); real time: 3967
;process time: 4060 (4060 RUN + 0 GC); real time: 4051
;process time: 3980 (3980 RUN + 0 GC); real time: 3979
;process time: 4040 (4040 RUN + 0 GC); real time: 4040
;process time: 4090 (4090 RUN + 0 GC); real time: 4094
;process time: 3810 (3810 RUN + 0 GC); real time: 3810
;process time: 4090 (4090 RUN + 0 GC); real time: 4092
;Value: #t

1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-rb-tree 100 100000000))

;process time: 7700 (7700 RUN + 0 GC); real time: 7720
;process time: 7760 (7760 RUN + 0 GC); real time: 7767
;process time: 7700 (7700 RUN + 0 GC); real time: 7710
;process time: 7890 (7890 RUN + 0 GC); real time: 7893
;process time: 7920 (7920 RUN + 0 GC); real time: 7914
;process time: 7650 (7650 RUN + 0 GC); real time: 7646
;process time: 7740 (7740 RUN + 0 GC); real time: 7738
;process time: 7760 (7760 RUN + 0 GC); real time: 7761
;process time: 7670 (7670 RUN + 0 GC); real time: 7671
;process time: 8140 (8140 RUN + 0 GC); real time: 8136
;Value: #t

1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-rb-tree 1000 100000000))

;process time: 13130 (13130 RUN + 0 GC); real time: 13124
;process time: 13160 (13160 RUN + 0 GC); real time: 13153
;process time: 13150 (13150 RUN + 0 GC); real time: 13149
;process time: 13140 (13140 RUN + 0 GC); real time: 13140
;process time: 13310 (13310 RUN + 0 GC); real time: 13304
;process time: 13170 (13170 RUN + 0 GC); real time: 13172
;process time: 13140 (13140 RUN + 0 GC); real time: 13167
;process time: 13250 (13250 RUN + 0 GC); real time: 13238
;process time: 13300 (13300 RUN + 0 GC); real time: 13318
;process time: 13420 (13420 RUN + 0 GC); real time: 13416
;Value: #t
And wt-trees while we're at it:
(define (make-test-wt-tree n-elements)
  (alist->wt-tree number-wt-type (make-test-alist n-elements)))

(define (time-wt-tree size n-lookups)
  (let ((test-wt-tree (make-test-wt-tree size)))
    (show-time
     (lambda ()
       (do ((i   0 (fix:+ i 1))
            (idx 0 (if (fix:> idx size)
                       0
                       (fix:+ idx 1)))
            (answer '() (wt-tree/lookup test-wt-tree idx #f)))
           ((fix:>= i n-lookups) answer))))))
1 ]=> (do ((i 0 (+ i 1))) ((not (< i 10))) (time-wt-tree 100 100000000))

;process time: 6400 (6400 RUN + 0 GC); real time: 6397
;process time: 6740 (6740 RUN + 0 GC); real time: 6736
;process time: 6760 (6760 RUN + 0 GC); real time: 6763
;process time: 6070 (6070 RUN + 0 GC); real time: 6068
;process time: 6450 (6450 RUN + 0 GC); real time: 6461
;process time: 6800 (6800 RUN + 0 GC); real time: 6812
;process time: 6330 (6330 RUN + 0 GC); real time: 6346
;process time: 6060 (6060 RUN + 0 GC); real time: 6066
;process time: 6050 (6050 RUN + 0 GC); real time: 6039
;process time: 6300 (6300 RUN + 0 GC); real time: 6303
;Value: #t

by Joe Marshall (noreply@blogger.com) at January 31, 2016 05:28 AM

January 30, 2016

Joe Marshall

Alist vs. hash-table

An alist is a simple data structure that holds key-value pairs in a linked list. When a key is looked up, the list is searched to find it. The time it takes is proportional to the length of the list, or the number of entries.

A hash-table is a more complex data structure that holds key-value pairs in a set of "hash buckets". When a key is looked up, it is first "hashed" to find the correct bucket, then that bucket is searched for the entry. The time it takes depends on a number of things, the hash algorithm, the number of buckets, the number of entries in the bucket, etc. A hash-table can be faster than an alist because the hashing step is quick and the subsequent search step will have very few entries to search.

In theory, an alist takes time proportional to the number of entries, but a hash-table takes constant time independent of the number of entries. Let's find out if this is true for MIT/GNU Scheme.

I wrote a little program that measures how long it takes to look things up in an alist vs. a hash table. Here's what I measured:
It does indeed seem that alists are linear and hash tables are constant in lookup time. But the hashing step of a hash table does take time, so short alists end up being faster that hash tables. The breakeven point looks like a tad over 25 elements. So if you expect 25 or fewer entries, an alist will perform better than a hash table. (Of course different implementations will have different break even points.)

A tree data structure is slightly more complex than an alist, but simpler than a hash table. Looking up an entry in a tree takes time proportional to the logarithm of the number of entries. The logarithm function grows quite slowly, so a tree performs pretty well over a very large range of entries. A tree is slower than an alist until you have about 15 entries. At this point, the linear search of an alist cannot compete with the logarithmic search of a tree. The time it takes to search a tree grows, but quite slowly. It takes more than 100 entries before a tree becomes as slow as a hash table.
With a big enough tree, the growth is so slow that you can pretend it is constant.

by Joe Marshall (noreply@blogger.com) at January 30, 2016 05:08 AM

January 29, 2016

Programming Praxis

Compare Strings With One Error

I don’t know if today’s exercise comes from an interview question or from somebody’s homework, but it’s a good exercise:

Given two strings, determine if they are the same except for exactly one difference. Two identical strings fail to match. Two strings that differ in two or more characters fail to match. Two strings with lengths that differ by one match if and only if they are identical except for one additional character. Indicate the index where the difference occurs, or report failure if the two input strings do not differ in exactly one character.

Your task is to write a program that compares strings with one error. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at January 29, 2016 04:24 PM

January 28, 2016

Joe Marshall

Latest amusement

Here's a procedure that CDRs down a list until it hits the end:
(define (last-cdr list)
  (let loop ((tail list))
    (if (pair? tail)
        (loop (cdr tail))
        tail)))
The inner loop of the procedure compiles to this:
loop-3:
 (cmp q (r 7) (@r 6))         ;; Interrupt check
 (jge (@pcr interrupt-12))

 (mov q (r 0) (@r 4))         ;; Fetch 'tail'

 (mov q (r 1) (r 0))          ;; Extract the tag
 (shr q (r 1) (&u #x3a))

 (cmp b (r 1) (&u 1))         ;; Check for pair?, jump if not
 (jne (@pcr label-14))

 (and q (r 0) (r 5))          ;; Mask the tag

 (mov q (r 1) (@ro 0 8))      ;; Fetch the CDR

 (mov q (@r 4) (r 1))         ;; Assign 'tail'

 (jmp (@pcr loop-3))          ;; Do it again.
On my laptop, each iteration of this loop takes a few nanoseconds.

I noticed that there seems to be a lot more going on than CDRing down a list. The interrupt check is responsible for a lot of the overhead. MIT/GNU Scheme polls for interrupts. The compiler inserts an interrupt check at every procedure entry and backwards branch. This ensures that only a small, bounded number of instructions can run between interrupt polls. The interrupt poll and the heap check are simultaneous because of a hack. To interrupt Scheme, you save the "Top of Heap" pointer and set it to zero. This causes the heap check to fail as if there were an out of memory condition. The out of memory handler first checks to see if this was actually an interrupt masquerading as an out of memory, and restores the heap pointer if so.

The two instructions,

(cmp q (r 7) (@r 6))         ;; Interrupt check
 (jge (@pcr interrupt-12))
perform the check. The first compares register 7 with the contents of memory that register 6 points at. The convention for the compiler is that register 7 contains the free pointer and register 6 holds the address of MEMTOP. The interrupt check does a memory cycle.

The reason you poll for interrupts is that you need the virtual machine to be in a coherent state because the interrupt could attempt to run arbitrary code. The garbage collector in particular has to be able to parse the machine state
at an interrupt. Interrupts are polled at apply time. Before application, the interrupts are checked. If an interrupt is to be processed, a continuation is pushed that will do the original application, and an interrupt handler is applied instead.

In MIT-Scheme, the virtual machine is a stack machine, and at apply time the entire state of the virtual machine is pointed to by the stack pointer. This is a good state to be in when you handle interrupts or garbage collect. But this means that arguments are passed on the stack. This instruction:

(mov q (r 0) (@r 4))
loads register 0 with the contents of memory at register 4. (The compiler convention is that register 4 is the stack pointer.) In other words, it is fetching the value of the argument tail from the stack. This is the second memory cycle.

Between interrupt polls, the compiled code can freely manipulate Scheme objects without worrying about being embarrassed by the garbage collector. But at apply time, when a possible interrupt poll could happen, the compiled code must put the virtual machine back into a coherent state. In particular, modified values are stored back on the stack.I This instruction just before the jump puts the value of tail back on the stack before jumping to the top of the loop.

(mov q (@r 4) (r 1)) 
That's three memory cycles in addition to the actual fetching of the CDR in this instruction:
(mov q (r 1) (@ro 0 8))
Of course these are quite inexpensive memory cycles because the top of stack is cached, but there is at least the time time it takes to validate the cache entry.

The interrupt poll occurs every time around the loop, so we copy the arguments back and forth between the stack and registers on each iteration. If we unroll the loop we can avoid some of the copying:

(define (last-cdr list)
  (let loop ((tail list))
    (if (pair? tail)
        (loop (cdr tail))
        tail)))

(define (last-cdr2 list)
  (let loop ((tail list))
    (if (not (pair? tail))
        tail
        (let ((tail (cdr tail)))
          (if (not (pair? tail))
              tail
              (loop (cdr tail)))))))
The loop in last-cdr2 compiles to this:
loop-6:
 (cmp q (r 7) (@r 6))           ;; Interrupt check
 (jge (@pcr interrupt-15))  

 (mov q (r 0) (@r 4))           ;; Fetch 'tail'

 (mov q (r 1) (r 0))            ;; Extract the tag
 (shr q (r 1) (&u #x3a))

 (cmp b (r 1) (&u 1))           ;; Check for pair?
 (jne (@pcr label-17))

 (and q (r 0) (r 5))            ;; Mask the tag

 (mov q (r 1) (@ro 0 8))        ;; Fetch the CDR

 (mov q (@r 4) (r 1))           ;; Assign 'tail'

 (mov q (r 0) (r 1))
 (shr q (r 0) (&u #x3a))        ;; Extract the tag

 (cmp b (r 0) (&u 1))           ;; Check for pair?
 (jne (@pcr label-19))

 (and q (r 1) (r 5))            ;; Mask the tag

 (mov q (r 0) (@ro 1 8))        ;; Fetch the CDR

 (mov q (@r 4) (r 0))           ;; Assign 'tail'

 (jmp (@pcr loop-6))            ;; Do it again
If I count correctly, there are six memory cycles per iteration, two of which are CDRs. We also avoid a single jmp instruction per iteration.

Here's the timing on my machine:

(test-last-cdr)
3.643 nanoseconds per cdr
3.643 nanoseconds per cdr
3.711 nanoseconds per cdr
3.643 nanoseconds per cdr
3.576 nanoseconds per cdr

(test-last-cdr2)
2.766 nanoseconds per cdr
2.699 nanoseconds per cdr
2.834 nanoseconds per cdr
2.699 nanoseconds per cdr


by Joe Marshall (noreply@blogger.com) at January 28, 2016 05:55 AM

January 26, 2016

Programming Praxis

Higher-Order String Functions

Scheme provides the higher-order functions map, for-each and fold that operate on lists. In today’s exercise we will write versions of those functions that operate on strings:

(string-map proc str) applies proc to each character in str, replacing the character with the output of proc. proc takes a single character and returns a single character.

(string-for-each proc str) applies proc to each character in str; the function is executed only for its side-effects. proc takes a single character and returns nothing.

(string-fold proc base str) calls function (proc base c) on each character of str, working left to right, accumulating the result in base as it goes. proc takes a base of any type, and a character, and returns a value of the same type as base. (string-fold-right proc base str) is the same, except that it works from right to left.

Your task is to implement these four functions in the language of your choice. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at January 26, 2016 09:00 AM

January 25, 2016

Joe Marshall

Locality and GC

I was curious about how long it took to CDR down a list. I wrote a little program that adds up the elements in a list
and tried it out.

(define test-list (iota (expt 10 6)))

(define (sum-list list)
  (let loop ((total 0)
	     (tail  list))
    (if (pair? tail)
	(loop (+ total (car tail)) (cdr tail))
	(begin (if (not (null? tail))
		   (error:not-list list 'sum-list))
	       total))))

(define (test-sum-list n)
   (do ((i 0 (+ i 1))
        (answer '() (sum-list test-list)))
       ((not (< i n)) answer)))

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4280 (4280 RUN + 0 GC); real time: 4281
;Value: 499999500000
But after garbage collection, the performance dropped a bit.
1 ]=> (gc-flip)

;GC #18 15:18:59: took:   0.13   (0%) CPU,   0.13   (0%) real; free: 246114893
;Value: 246114893

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4490 (4490 RUN + 0 GC); real time: 4483
;Value: 499999500000
Now it's taking 4.49 ms.

The garbage collector in MIT Scheme is a very simple, stop the world, copying collector. The GC algorithm traverses the heap in breadth-first order. An undesired effect of this is data structures being spread about the heap. When I first built the test-list, the cons cells were close together. Here are the first few addresses:
1 ]=> (do ((tail test-list (cdr tail))
                (i 0 (+ i 1)))
               ((not (< i 10)) #f)
             (display (object-datum tail))
             (newline))
253936080
253936064
253936048
253936032
253936016
253936000
253935984
253935968
253935952
253935936
;Value: #f
But after garbage collection...
1 ]=> (gc-flip)

;GC #19 15:33:28: took:   0.15   (1%) CPU,   0.16   (0%) real; free: 246114593
;Value: 246114593

1 ]=> (do ((tail test-list (cdr tail))
              (i 0 (+ i 1)))
             ((not (< i 10)) #f)
           (display (object-datum tail))
           (newline))
194190168
194326376
194416512
194499936
194555336
194584992
194609544
194631760
194652360
194669208
;Value: #f
All the cons cells are on different pages.

This is annoying, so I hacked the GC so that it transports chains of cons-cells eagerly. If we do the same experiment, before the flip,

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4280 (4280 RUN + 0 GC); real time: 4280
;Value: 499999500000
And after.
1 ]=> (gc-flip)

;GC #5 15:38:53: took:   0.23   (2%) CPU,   0.24   (0%) real; free: 249429521
;Value: 249429521

1 ]=> (show-time (lambda () (test-sum-list 1000)))

;process time: 4100 (4100 RUN + 0 GC); real time: 4100
;Value: 499999500000
Now we see a performance increase. If we look at the test list,
1 ]=> (do ((tail test-list (cdr tail))
           (i 0 (+ i 1)))
          ((not (< i 10)) #f)
        (display (object-datum tail))
        (newline))
193516792
193516808
193516824
193516840
193516856
193516872
193516888
193516904
193516920
193516936
;Value: #f

You can see that the cons-cells addresses are next to each other.

by Joe Marshall (noreply@blogger.com) at January 25, 2016 01:17 AM

January 22, 2016

GNU Guix

Meet Guix at FOSDEM!

One week to FOSDEM! This year, there will be no less than six Guix-related talks. This and the fact that we are addressing different communities is exciting.

First, on Saturday morning, in the GNU Guile track (room K.3.201):

On Saturday afternoon:

On Sunday noon:

See you there!

About GNU Guix

GNU Guix is a functional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user's freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on an i686 or x86_64 machine. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el and armv7.

by Ludovic Courtès at January 22, 2016 09:33 AM