Planet Scheme

May 22, 2018

Programming Praxis

Floyd’s Triangle

The other day I noticed this list of the Top 75 Programming Interview Questions. I’m not sure of the origin of the list, and it does have a Java feel to it, but I was happy to see that we’ve covered almost all the questions on the list. One of the questions we missed is Number 57:

Write a program to print Floyd’s Triangle.

I wasn’t familiar with Floyd’s Triangle, so I had to peek at the solution. Floyd’s Triangle lists the numbers in order, in lines of increasing length, so a Floyd’s Triangle of size 5 looks like this:

1
2 3
4 5 6
7 8 9 10
11 12 13 14 15

Your task is to write two programs that print Floyd’s Triangle; you must write two different programs that use two fundamentally different methods. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 22, 2018 09:00 AM

May 18, 2018

Programming Praxis

Billing Period

I’m not sure of the origin of today’s exercise, but given the contrived nature of the calculation, I suspect it’s a programming exercise for beginning programming students:

Our merchants receive “weekly” invoices, following these rules:

  • Each Saturday marks the beginning of a new billing period.
  • Each 1st of a month marks the beginning of a new billing period.
  • Within a year, billing periods are numbered consecutively, starting with billing period 1 on January 1st.

Thus, a billing period can be referenced by a year and period number.

Your task is to write a program that calculates the billing period for a given date. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 18, 2018 09:00 AM

May 16, 2018

Andy Wingo

lightweight concurrency in lua

Hello, all! Today I'd like to share some work I have done recently as part of the Snabb user-space networking toolkit. Snabb is mainly about high-performance packet processing, but it also needs to communicate with management-oriented parts of network infrastructure. These communication tasks are handled by a dedicated manager process, but that process has many things to do, and can't afford to perform blocking operations.

Snabb is written in Lua, which doesn't have built-in facilities for concurrency. What we'd like is to have fibers. Fortunately, Lua's coroutines are powerful enough to implement fibers. Let's do that!

fibers in lua

First we need a scheduling facility. Here's the smallest possible scheduler: simply a queue of tasks and a function to run those tasks.

local task_queue = {}

function schedule_task(thunk)
   table.insert(task_queue, thunk)
end

function run_tasks()
   local queue = task_queue
   task_queue = {}
   for _,thunk in ipairs(queue) do thunk() end
end

For our purposes, a task is just a function that will be called with no arguments.
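
To see these two functions in action, here is a minimal sketch (my example, not part of the original text): schedule a couple of thunks, then run one turn of the scheduler.

schedule_task(function () print('hello') end)
schedule_task(function () print('world') end)
run_tasks()   -- prints "hello", then "world"

Note that run_tasks swaps in a fresh queue before running the old one, so any task scheduled while the queue is draining will only run on the next turn.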

Now let's build fibers. This is easier than you might think!

local current_fiber = false

function spawn_fiber(fn)
   local fiber = coroutine.create(fn)
   schedule_task(function () resume_fiber(fiber) end)
end

function resume_fiber(fiber, ...)
   current_fiber = fiber
   local ok, err = coroutine.resume(fiber, ...)
   current_fiber = nil
   if not ok then
      print('Error while running fiber: '..tostring(err))
   end
end

function suspend_current_fiber(block, ...)
   -- The block function should arrange to reschedule
   -- the fiber when it becomes runnable.
   block(current_fiber, ...)
   return coroutine.yield()
end

Here, a fiber is simply a coroutine underneath. Suspending a fiber suspends the coroutine. Resuming a fiber runs the coroutine. If you're unfamiliar with coroutines, or coroutines in Lua, maybe have a look at the lua-users wiki page on the topic.

The difference between a fibers facility and just coroutines is that with fibers, you have a scheduler as well. Very much like Scheme's call-with-prompt, coroutines are one of those powerful language building blocks that should rarely be used directly; concurrent programming needs more structure than what Lua offers.

If you're following along, it's probably worth it here to think how you would implement yield based on these functions. A yield implementation should yield control to the scheduler, and resume the fiber on the next scheduler turn. The answer is here.
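
(If you'd rather not chase the link, here is one way to write it that is consistent with the functions above; the name yield_current_fiber is mine.)

function yield_current_fiber()
   -- Suspend, arranging for the scheduler to resume this
   -- fiber on its next turn.
   return suspend_current_fiber(function (fiber)
      schedule_task(function () resume_fiber(fiber) end)
   end)
end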

communication

Once you have fibers and a scheduler, you have concurrency, which means that if you're not careful, you have a mess. Here I think the Go language got the essence of the idea exactly right: Do not communicate by sharing memory; instead, share memory by communicating.

Even though Lua doesn't support multiple machine threads running concurrently, concurrency between fibers can still be fraught with bugs. Tony Hoare's Communicating Sequential Processes showed that we can avoid a class of these bugs by treating communication as a first-class concept.

Happily, the Concurrent ML project showed that it's possible to build these first-class communication facilities as a library, provided the language you are working in has threads of some kind, and fibers are enough. Last year I built a Concurrent ML library for Guile Scheme, and when in Snabb we had a similar need, I ported that code over to Lua. As it's a new take on the problem in a different language, I think I've been able to simplify things even more.

So let's take a crack at implementing Concurrent ML in Lua. In CML, the fundamental primitive for communication is the operation. An operation represents the potential for communication. For example, if you have a channel, it would have methods to return "get operations" and "put operations" on that channel. Actually receiving or sending a message on a channel occurs by performing those operations. One operation can be performed many times, or not at all.

Compared to a system like Go, for example, there are two main advantages of CML. The first is that CML allows non-deterministic choice between a number of potential operations in a generic way. For example, you can construct an operation that, when performed, will either get on one channel or wait for a condition variable to be signalled, whichever comes first. In Go, you can only select between operations on channels.

The other interesting part of CML is that operations are built from a uniform protocol, and so users can implement new kinds of operations. Compare again to Go where all you have are channels, and nothing else.

The CML operation protocol consists of three related functions: try, which attempts to directly complete an operation in a non-blocking way; block, which is called after a fiber has suspended, and which arranges to resume the fiber when the operation completes; and wrap, which is called on the result of a successfully performed operation.

In Lua, we can call this an implementation of an operation, and create it like this:

function new_op_impl(try, block, wrap)
   return { try=try, block=block, wrap=wrap }
end

Now let's go ahead and write the guts of CML: the operation implementation. We'll represent an operation as a Lua object with two methods. The perform method will attempt to perform the operation, and return the resulting value. If the operation can complete immediately, the call to perform will return directly. Otherwise, perform will suspend the current fiber and arrange to continue only when the operation completes.

The wrap method "decorates" an operation, returning a new operation that, if and when it completes, will "wrap" the result of the completed operation with a function, by applying the function to the result. It's useful to distinguish the sub-operations of a non-deterministic choice from each other.

Here our new_op function will take an array of operation implementations and return an operation that, when performed, will synchronize on the first available operation. As you can see, it already has the equivalent of Go's select built in.

function new_op(impls)
   local op = { impls=impls }
   
   function op.perform()
      for _,impl in ipairs(impls) do
         local success, val = impl.try()
         if success then return impl.wrap(val) end
      end
      local function block(fiber)
         local suspension = new_suspension(fiber)
         for _,impl in ipairs(impls) do
            impl.block(suspension, impl.wrap)
         end
      end
      local wrap, val = suspend_current_fiber(block)
      return wrap(val)
   end

   function op.wrap(f)
      local wrapped = {}
      for _, impl in ipairs(impls) do
         local function wrap(val)
            return f(impl.wrap(val))
         end
         local impl = new_op_impl(impl.try, impl.block, wrap)
         table.insert(wrapped, impl)
      end
      return new_op(wrapped)
   end

   return op
end

There's only one thing missing there, which is new_suspension. When a fiber suspends because none of the operations it's trying to perform can complete directly (i.e. all of the try functions of its impls returned false), the corresponding block functions publish the fact that the fiber is waiting. However, the fiber only waits until the first operation is ready; subsequent operations becoming ready should be ignored. The suspension is the object that manages this state.

function new_suspension(fiber)
   local waiting = true
   local suspension = {}
   function suspension.waiting() return waiting end
   function suspension.complete(wrap, val)
      assert(waiting)
      waiting = false
      local function resume()
         resume_fiber(fiber, wrap, val)
      end
      schedule_task(resume)
   end
   return suspension
end

As you can see, the suspension's complete method is also the bit that actually arranges to resume a suspended fiber.
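
To make the try/block/wrap protocol a little more concrete before we get to channels, here is a tiny operation of my own devising: it is always ready, so its try always succeeds and its block function is never reached.

-- Sketch: an operation that completes immediately with VAL.
function always_op(val)
   local function try() return true, val end
   local function block(suspension, wrap)
      error('unreachable: try always succeeds')
   end
   return new_op({new_op_impl(try, block, function (v) return v end)})
end

print(always_op(42).perform())                                     --> 42
print(always_op(42).wrap(function (v) return v + 1 end).perform()) --> 43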

Finally, just to round out the implementation, here's a function implementing non-deterministic choice from among a number of sub-operations:

function choice(...)
   local impls = {}
   for _, op in ipairs({...}) do
      for _, impl in ipairs(op.impls) do
         table.insert(impls, impl)
      end
   end
   return new_op(impls)
end

channels

OK, I'm sure this seems a bit abstract at this point. Let's implement something concrete in terms of these primitives: channels.

Channels expose two similar but different kinds of operations: put operations, which try to send a value, and get operations, which try to receive a value. If there's a sender already waiting to send when we go to perform a get_op, the operation continues directly, and we resume the sender; otherwise the receiver publishes its suspension to a queue. The put_op case is similar.

Finally we add some synchronous put and get convenience methods, in terms of their corresponding CML operations.

function new_channel()
   local ch = {}
   -- Queues of suspended fibers waiting to get or put values
   -- via this channel.
   local getq, putq = {}, {}

   local function default_wrap(val) return val end
   local function is_empty(q) return #q == 0 end
   local function peek_front(q) return q[1] end
   local function pop_front(q) return table.remove(q, 1) end
   local function push_back(q, x) q[#q+1] = x end

   -- Since a suspension could complete in multiple ways
   -- because of non-deterministic choice, it could be that
   -- suspensions on a channel's putq or getq are already
   -- completed.  This helper removes already-completed
   -- suspensions.
   local function remove_stale_entries(q)
      local i = 1
      while i <= #q do
         if q[i].suspension.waiting() then
            i = i + 1
         else
            table.remove(q, i)
         end
      end
   end

   -- Make an operation that if and when it completes will
   -- rendezvous with a receiver fiber to send VAL over the
   -- channel.  Result of performing operation is nil.
   function ch.put_op(val)
      local function try()
         remove_stale_entries(getq)
         if is_empty(getq) then
            return false, nil
         else
            local remote = pop_front(getq)
            remote.suspension.complete(remote.wrap, val)
            return true, nil
         end
      end
      local function block(suspension, wrap)
         remove_stale_entries(putq)
         push_back(putq, {suspension=suspension, wrap=wrap, val=val})
      end
      return new_op({new_op_impl(try, block, default_wrap)})
   end

   -- Make an operation that if and when it completes will
   -- rendezvous with a sender fiber to receive one value from
   -- the channel.  Result is the value received.
   function ch.get_op()
      local function try()
         remove_stale_entries(putq)
         if is_empty(putq) then
            return false, nil
         else
            local remote = pop_front(putq)
            remote.suspension.complete(remote.wrap)
            return true, remote.val
         end
      end
      local function block(suspension, wrap)
         remove_stale_entries(getq)
         push_back(getq, {suspension=suspension, wrap=wrap})
      end
      return new_op({new_op_impl(try, block, default_wrap)})
   end

   function ch.put(val) return ch.put_op(val).perform() end
   function ch.get()    return ch.get_op().perform()    end

   return ch
end
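
Here is a little sketch (my own, not from the original text) showing choice and wrap working together with channels: one fiber sends, another performs a choice over two channels and tags the result with where it came from.

local a, b = new_channel(), new_channel()
spawn_fiber(function () a.put(42) end)
spawn_fiber(function ()
   local op = choice(a.get_op().wrap(function (v) return 'a', v end),
                     b.get_op().wrap(function (v) return 'b', v end))
   print(op.perform())   -- prints 'a' and 42
end)
run_tasks()   -- the putter suspends, then the chooser rendezvous with it
run_tasks()   -- the resumed putter finishes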

a wee example

You might be wondering what it's like to program with channels in Lua, so here's a little example that shows a prime sieve based on channels. It's not a great example of concurrency in that it's not an inherently concurrent problem, but it's cute to show computations in terms of infinite streams.

function prime_sieve(count)
   local function sieve(p, rx)
      local tx = new_channel()
      spawn_fiber(function ()
         while true do
            local n = rx.get()
            if n % p ~= 0 then tx.put(n) end
         end
      end)
      return tx
   end

   local function integers_from(n)
      local tx = new_channel()
      spawn_fiber(function ()
         while true do
            tx.put(n)
            n = n + 1
         end
      end)
      return tx
   end

   local function primes()
      local tx = new_channel()
      spawn_fiber(function ()
         local rx = integers_from(2)
         while true do
            local p = rx.get()
            tx.put(p)
            rx = sieve(p, rx)
         end
      end)
      return tx
   end

   local done = false
   spawn_fiber(function()
      local rx = primes()
      for i=1,count do print(rx.get()) end
      done = true
   end)

   while not done do run_tasks() end
end

Here you also see an example of running the scheduler in the last line.
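
For instance, this is all it takes to print the first ten primes with the code above:

prime_sieve(10)   -- prints 2, 3, 5, 7, 11, 13, 17, 19, 23, 29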

where next?

Let's put this into perspective: in a couple hundred lines of code, we've gone from minimal Lua to a language with lightweight multitasking, extensible CML-based operations, and CSP-style channels; truly a delight.

There are a number of possible ways to extend this code. One of them is to implement true multithreading, if the language you are working in supports that. In that case there are some small protocol modifications to take into account; see the notes on the Guile CML implementation and especially the Manticore Parallel CML project.

The implementation above is pleasantly small, but it could be faster with the choice of more specialized data structures. I think interested readers probably see a number of opportunities there.

In a library, you might want to avoid the global task_queue and implement nested or multiple independent schedulers, and of course in a parallel situation you'll want core-local schedulers as well.

The implementation above has no notion of time. What we did in the Snabb implementation of fibers was to implement a timer wheel, inspired by Juho Snellman's Ratas, and then add that timer wheel as a task source to Snabb's scheduler. In Snabb, every time the equivalent of run_tasks() is called, a scheduler asks its sources to schedule additional tasks. The timer wheel implementation schedules expired timers. It's straightforward to build CML timeout operations in terms of timers.
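
As a sketch of what such a timeout operation could look like in terms of the protocol above; note that now and schedule_at are assumed stand-ins here, not Snabb's actual timer API:

-- Sketch only: assumes a clock `now()` and a timer facility
-- `schedule_at(t, thunk)`, neither of which is defined above.
function timeout_op(secs)
   local deadline = now() + secs
   local function try() return now() >= deadline, nil end
   local function block(suspension, wrap)
      schedule_at(deadline, function ()
         -- The fiber may have completed another operation in the
         -- meantime; only complete it if it is still waiting.
         if suspension.waiting() then suspension.complete(wrap, nil) end
      end)
   end
   return new_op({new_op_impl(try, block, function (v) return v end)})
end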

Additionally, your system probably has other external sources of communication, such as sockets. The trick to integrating sockets into fibers is to suspend the current fiber whenever an operation on a file descriptor would block, and arrange to resume it when the operation can proceed. Here's the implementation in Snabb.

The only difficult bit with getting nice nonblocking socket support is that you need to be able to suspend the calling thread when you see the EWOULDBLOCK condition, and for coroutines that is often only possible if you implemented the buffered I/O yourself. In Snabb that's what we did: we implemented a compatible replacement for Lua's built-in streams, in Lua. That lets us handle EWOULDBLOCK conditions in a flexible manner. Integrating epoll as a task source also lets us sleep when there are no runnable tasks.

Likewise in the Snabb context, we are also working on a TCP implementation. In that case you want to structure TCP endpoints as fibers, and arrange to suspend and resume them as appropriate, while also allowing timeouts. I think the scheduler and CML patterns are going to allow us to do that without much trouble. (Of course, the TCP implementation will give us lots of trouble!)

Additionally your system might want to communicate with fibers from other threads. It's entirely possible to implement CML on top of pthreads, and it's entirely possible as well to support communication between pthreads and fibers. If this is interesting to you, see Guile's implementation.

When I talked about fibers in an earlier article, I built them in terms of delimited continuations. Delimited continuations are fun and more expressive than coroutines, but it turns out that for fibers, all you need is the expressive power of coroutines -- multi-shot continuations aren't useful. Also I think the presentation might be more straightforward. So if all your language has is coroutines, that's still good enough.

There are many more kinds of standard CML operations; implementing those is also another next step. In particular, I have found semaphores and condition variables to be quite useful. Also, standard CML supports "guards", invoked when an operation is performed, and "nacks", invoked when an operation is definitively not performed because a choice selected some other operation. These can be layered on top; see the Parallel CML paper for notes on "primitive CML".

Also, the choice operator above is left-biased: it will prefer earlier impls over later ones. You might want to not always start with the first impl in the list.
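
One cheap way to do that, sketched here, is to rotate the list of impls by a random offset each time a choice is built:

-- Sketch: like `choice`, but starts from a random impl so no
-- sub-operation is systematically preferred over the others.
function fair_choice(...)
   local impls = {}
   for _, op in ipairs({...}) do
      for _, impl in ipairs(op.impls) do table.insert(impls, impl) end
   end
   local n, k = #impls, math.random(#impls)
   local rotated = {}
   for i = 1, n do rotated[i] = impls[(i + k - 2) % n + 1] end
   return new_op(rotated)
end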

The scheduler shown above is the simplest thing I could come up with. You may want to experiment with other scheduling algorithms, e.g. capability-based scheduling, or kill-safe abstractions. Do it!

Or, it could be you already have a scheduler, like some kind of main loop that's already there. Cool, you can use it directly -- all that fibers needs is some way to schedule functions to run.

godspeed

In summary, I think Concurrent ML should be better-known. Its simplicity and expressivity make it a valuable part of any concurrent system. Already in Snabb it helped us solve some longstanding gnarly issues by making the right solutions expressible.

As Adam Solove says, Concurrent ML is great, but it has a branding problem. Its ideas haven't penetrated the industrial concurrent programming world to the extent that they should. This article is another attempt to try to get the word out. Thanks to Adam for the observation that CML is really a protocol; I'm sure the concepts could be made even more clear, but at least this is a step forward.

All the code in this article is up on a gitlab snippet along with instructions for running the example program from the command line. Give it a go, and happy hacking with CML!

by Andy Wingo at May 16, 2018 03:17 PM

GNU Guix

Tarballs, the ultimate container image format

A year ago we introduced guix pack, a tool that allows you to create “application bundles” from a set of Guix package definitions. On your Guix machine, you run:

guix pack -S /opt/gnu/bin=bin guile gnutls guile-json

and you get a tarball containing your favorite programming language implementation and a couple of libraries, where /opt/gnu/bin is a symlink to the bin directory containing, in this case, the guile command. Add -f docker and, instead of a tarball, you get an image in the Docker format that you can pass to docker load on any machine where Docker is installed. Overall that’s a relatively easy way to share software stacks with machines that do not run Guix.

The tarball format is plain and simple, it’s the one we know and love, and it’s been there “forever” as its name suggests. The tarball that guix pack produces can be readily extracted on another machine, one that doesn’t run Guix, and you’re done. The problem, though, is that you’ll need either to unpack the tarball in the root file system or to play tricks with the unshare command, as we saw in the previous post. Why can’t we just extract such a tarball in our home directory and directly run ./opt/gnu/bin/guile for instance?

Relocatable packages

The main issue is that, except in the uncommon case where developers went to great lengths to make it possible (as with GUB, see the *-reloc*.patch files), packages built for GNU/Linux are not relocatable. ELF files embed things like the absolute file name of the dynamic linker, directories where libraries are to be searched for (they can be relative file names with $ORIGIN but usually aren’t), and so on; furthermore, it’s very common to embed things like the name of the directory that contains locale data or other application-specific data. For Guix-built software, all these are absolute file names under /gnu/store so Guix-built binaries won’t run unless those /gnu/store files exist.

On machines where support for “user namespaces” is enabled, we can easily “map” the directory where users unpacked the tarball that guix pack produced to /gnu/store, as shown in the previous post:

$ tar xf /path/to/pack.tar.gz
$ unshare -mrf chroot . /opt/gnu/bin/guile --version
guile (GNU Guile) 2.2.0

It does the job but remains quite tedious. Can’t we automate that?

guix pack --relocatable

The --relocatable (or -R) option of guix pack, which landed a few days ago, produces tarballs with automatically relocatable binaries. Back to our earlier example, let’s say you produce a tarball with this new option:

guix pack --relocatable -S /bin=bin -S /etc=etc guile gnutls guile-json

You can send the resulting tarball to any machine that runs the kernel Linux (it doesn’t even have to be GNU/Linux) with user namespace support—which, unfortunately, is disabled by default on some distros. There, as a regular user, you can run:

$ tar xf /path/to/pack.tar.gz
$ source ./etc/profile    # define ’GUILE_LOAD_PATH’, etc.
$ ./bin/guile
guile: warning: failed to install locale
GNU Guile 2.2.3
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> ,use(json)
scheme@(guile-user)> ,use(gnutls)

We were able to run Guile and to use our Guile libraries since sourcing ./etc/profile augmented the GUILE_LOAD_PATH environment variable that tells Guile where to look for libraries. Indeed we can see it by inspecting the value of %load-path at the Guile prompt:

scheme@(guile-user)> %load-path
$1 = ("/gnu/store/w9xd291967cvmdp3m0s7739icjzgs8ns-profile/share/guile/site/2.2" "/gnu/store/b90y3swxlx3vw2yyacs8cz59b8cbpbw5-guile-2.2.3/share/guile/2.2" "/gnu/store/b90y3swxlx3vw2yyacs8cz59b8cbpbw5-guile-2.2.3/share/guile/site/2.2" "/gnu/store/b90y3swxlx3vw2yyacs8cz59b8cbpbw5-guile-2.2.3/share/guile/site" "/gnu/store/b90y3swxlx3vw2yyacs8cz59b8cbpbw5-guile-2.2.3/share/guile")

Wait, it’s all /gnu/store! As it turns out, guix pack --relocatable created a wrapper around guile that populates /gnu/store in the mount namespace of the process. Even though /gnu/store does not exist on that machine, our guile process “sees” our packages under /gnu/store:

scheme@(guile-user)> ,use(ice-9 ftw)
scheme@(guile-user)> (scandir "/gnu/store")
$2 = ("." ".." "0249nw8c7z626fw1fayacm160fpd543k-guile-json-0.6.0R" "05dvazr5wfh7lxx4zi54zfqnx6ha8vxr-bash-static-4.4.12" "0jawbsyafm93nxf4rcmkf1rsk7z03qfa-libltdl-2.4.6" "0z1r7ai6syi2qnf5z8w8n25b1yv8gdr4-info-dir" "1n59wjm6dbvc38b320iiwrxra3dg7yv8-libunistring-0.9.8" "2fg01r58vv9w41kw6drl1wnvqg7rkv9d-libtasn1-4.12" "2ifmksc425qcysl5rkxkbv6yrgc1w9cs-gcc-5.5.0-lib" "2vxvd3vls7c8i9ngs881dy1p5brc7p85-gmp-6.1.2" "4sqaib7c2dfjv62ivrg9b8wa7bh226la-glibc-2.26.105-g0890d5379c" "5kih0kxmipzjw10c53hhckfzkcs7c8mm-gnutls-3.5.13" "8hxm8am4ll05sa8wlwgdq2lj4ddag464-zlib-1.2.11" "90vz0r78bww7dxhpa7vsiynr1rcqhyh4-nettle-3.4" "b90y3swxlx3vw2yyacs8cz59b8cbpbw5-guile-2.2.3" "c4jrwbv7qckvnqa7f3h7bd1hh8rbg72y-libgc-7.6.0" "f5lw5w4nxs6p5gq0c2nb3jsrxc6mmxbi-libgc-7.6.0" "hjxic0k4as384vn2qp0l964isfkb0blb-guile-json-0.6.0" "ksyja5lbwy0mpskvn4rfi5klc00c092d-libidn2-2.0.4" "l15mx9lrwdflyvmb4a05va05v5yqizg5-libffi-3.2.1" "mm0zclrzj3y7rj74hzyd0f224xly04fh-bash-minimal-4.4.12" "vgmln3b639r68vvy75xhcbi7d2w31mx1-pkg-config-0.29.2" "vz3zfmphvv4w4y7nffwr4jkk7k4s0rfs-guile-2.2.3" "w9xd291967cvmdp3m0s7739icjzgs8ns-profile" "x0jf9ckd30k3nhs6bbhkrxsjmqz8phqd-nettle-3.4" "x8z6cr7jggs8vbyh0xzfmxbid63z6y83-guile-2.2.3R" "xbkl3nx0fqgpw2ba8jsjy0bk3nw4q3i4-gnutls-3.5.13R" "xh4k91vl0i8nlyrmvsh01x0mz629w5a9-gmp-6.1.2" "yx12x8v4ny9f6fipk8285jgfzqavii83-manual-database" "zksh1n0p9x903kqbvswgwy2vsk2b7255-libatomic-ops-7.4.8")

The wrapper is a small statically-linked C program. (Scheme would be nice and would allow us to reuse call-with-container, but it would also take up more space.) All it does is create a child process with separate mount and user namespaces, which in turn mounts the tarball’s /gnu/store to /gnu/store, bind-mounts other entries from the host root file system, and chroots into that. The result is a binary that sees everything a “normal” program sees on the host, but with the addition of /gnu/store, with minimal startup overhead.

In a way, it’s a bit of a hack: for example, what gets bind-mounted in the mount namespace of the wrapped program is hard-coded, which is OK, but some flexibility would be welcome (things like Flatpak’s sandbox permissions, for instance). Still, that it Just Works is a pretty cool feature.

Tarballs vs. Snap, Flatpak, Docker, & co.

Come to think of it: if you’re a developer, guix pack is probably one of the easiest ways to create an “application bundle” to share with your users; and as a user, these relocatable tarballs are about the simplest thing you can deal with since you don’t need anything but tar—well, and user namespace support. Plus, since they are bit-reproducible, anyone can rebuild them to ensure they do not contain malware or to check the provenance and licensing of their contents.

Application bundles cannot replace full-blown package management, which allows users to upgrade, get security updates, use storage and memory efficiently, and so on. For the purposes of quickly sharing packages with users or with Guix-less machines, though, you might find Guix packs to be more convenient than Snap, Flatpak, or Docker. Give it a spin and let us know!

About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user's freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on i686, x86_64, and armv7 machines. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el and aarch64.

by Ludovic Courtès at May 16, 2018 09:00 AM

May 15, 2018

Programming Praxis

Sum Embedded Numbers

It’s mid-May, so at most universities the semester has ended and it is safe for me to solve some of the exercises that students sent to me over the last few months. Here’s a fun little one:

Given a string containing embedded numbers, find the sum of the embedded numbers. For instance, given the string “11aa22bb33cc44”, the desired sum is 11 + 22 + 33 + 44 = 110. You may not use regular expressions.

Although the problem statement doesn’t say so, you may assume that the numbers of interest are non-negative integers. Thus, the purpose of the exercise is for students to iterate through a string, identify the digits in the string, and manipulate them numerically.

Your task is to write a program that sums the numbers embedded in a string. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 15, 2018 09:00 AM

May 11, 2018

Programming Praxis

Help Wanted: Report Generator

Regular readers of this blog know that in my day job I am a programmer charged with maintaining an enterprise-wide database system, running on HP-UX (we’re switching to Linux next year) and Oracle 12c. We are in the process of replacing our existing report-writing software with a new, user-friendly product. Personally, I think it’s crazy to train non-technical people how to join tables and produce aggregates (the very first example in the training manual tripled my age to 183 because a one-to-many join was done incorrectly), but the users are excited because they will be able to get their data themselves instead of relying on the IT department to produce reports. I won’t tell you which reporting software we bought (you would recognize the name) or how much it cost (you will cry).

Some of the reports that we produce (probably a quarter to a third of them) can’t be done using the new software because the necessary fields aren’t in the data warehouse. The IT department (I am one of six programmers) will have to rewrite those reports using some combination of Oracle SQL*Plus and PL/SQL; the result will be plain-ascii reports delivered as LIS files in our standard enterprise software. Yes, 1960s-era reports, 132 columns by 66 rows per page, printed on a modern laser printer instead of green-bar fan-fold paper — that’s called progress where I work.

I’ve been looking at report-writing software. SQL*Plus actually isn’t bad — it must have been written in the 1960s for 1960s-era reports — but I still have to count the print columns to specify the linesize. SQL*Plus provides report headers and footers, page headers and footers that can include values drawn from the data, and control-breaks with record counts and totals. The one feature that is missing is proper headers and footers on each control-break; the field that caused the break must be included in the body of the report (on each detail line) rather than on a break header (so it takes horizontal space on the report, which may be in short supply), and there is no provision to include the break value in the total line of the break.

So I am in the market for a simple report generator. It must work with Oracle and produce plain-ascii output. Features include report headers and footers, page headers and footers, and break headers and footers, all of which must be able to include values from the output. It has to be callable from a Unix shell script so it will fit with our enterprise system.

I’ve been looking for something suitable, and am surprised I can’t find it. All of the solutions I have found are some combination of expensive, overly feature-full, or hard to use. I don’t want to use MS-Access with an ODBC connection. I looked at Jasper and BIRT; neither is suitable. I looked at every old Unix book in my library; I figured Don Libes would have something in his useful book, but he doesn’t. I’m looking for something at the level of a Unix shell script, something like Awk but with more of a report-writing flavor. On the next page you will see a sample report specification for a real report that I will need to write; SQL*Plus can handle most of that, but not the breakheader and breakfooter elements.

So I come to my readers and ask for help. Does anyone know of a simple 1960s-era Unix report generator?

Maybe I’ll write one.

Many thanks.

by programmingpraxis at May 11, 2018 09:00 AM

May 09, 2018

GNU Guix

Paper on reproducible bioinformatics pipelines with Guix

I’m happy to announce that the bioinformatics group at the Max Delbrück Center that I’m working with has released a preprint of a paper on reproducibility with the title Reproducible genomics analysis pipelines with GNU Guix.

We built a collection of bioinformatics pipelines called "PiGx" ("Pipelines in Genomix") and packaged them as first-class packages with GNU Guix. Then we looked at the degree to which the software achieves bit-reproducibility, analysed sources of non-determinism (e.g. time stamps), discussed experimental reproducibility at runtime (e.g. random number generators, the interface provided by the kernel and the GNU C library, etc) and commented on the practice of using “containers” (or application bundles) instead.

Reproducible builds are a crucial foundation for computational experiments. We hope that PiGx and the reproducibility analysis we presented in the paper can serve as a useful case study demonstrating the importance of a principled approach to computational reproducibility and the effectiveness of Guix in the pursuit of reproducible software management.

by Ricardo Wurmus at May 09, 2018 10:00 AM

May 08, 2018

Programming Praxis

An Array Exercise

We worked on linked lists in the previous exercise, so today we will work on arrays:

Given an array of distinct integers, replace each element of the array with its corresponding rank in the array. For instance, the input array [10,8,15,12,6,20,1] is replaced by the output array [4,3,6,5,2,7,1] because element 1 has rank 1, element 6 has rank 2, element 8 has rank 3, element 10 has rank 4, element 12 has rank 5, element 15 has rank 6, and element 20 has rank 7.

Your task is to write a program to replace array elements by their rank. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 08, 2018 09:00 AM

May 04, 2018

Programming Praxis

Linked List Exercises

I like to do exercises on linked lists. In languages like C, linked lists provide good drill for students who are uncertain about structures and pointers; in any language, lists provide good drill on recursion. Today’s exercise is about linked lists; we’ve seen some of these before, but it’s good to review:

  1. Take a list of integers and rearrange it so all the even integers appear before all the odd integers, with both evens and odds appearing in the output in the same order as the input.
  2. Take a list of integers, split it into two lists each containing alternate elements from the input list, then join the two lists back together.
  3. Take a list of integers and rearrange it so alternate nodes are each greater than their two adjacent nodes; in other words, the integers are in alternating high-low order (for instance, 1 → 2 → 3 → 4 → 5 could become 1 → 3 → 2 → 5 → 4).

Your task is to perform the three linked list exercises described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 04, 2018 09:00 AM

Greg Hendershott

Extramaze LLC: Using Racket, PostgreSQL, AWS (but no ads or JS)

For Extramaze LLC I’m using Racket in a commercial project — a search engine with email alerts for deals on music gear — deals.extramaze.com.

This blog post is a sort of whirlwind tour of the use case and business model, as well as how I use things like Racket, PostgreSQL, and AWS — but don’t use advertising or JavaScript.

Use case

Who needs this?

Let’s say I play guitar. In fact, I seem to collect guitars. My partner is skeptical of this use of financial and space resources. I have my eye on a certain 7-string guitar. I don’t need it. But I want it. If it goes on sale in the months ahead, I’d like to grab it. I can point out that I saved (say) $500. Although my partner will see right through this pathetic justification, they will appreciate that at least I am hearing them and making an effort. I hope.

Or more virtuously: I’m a parent and my child needs a trombone when school starts this autumn. I create an alert in case something goes on sale over the summer.

Business model

Currently you must join (create an account) to use the site. It’s free to search deals that are at least a day old. If you subscribe (pay), you can search the very newest deals — and save searches to get alerted when matching deals appear.

Why ask people to pay? I’m disenchanted with services that survive on money from advertising — where users are the product being sold, and the actual customers are advertisers.

Instead I want to do something where users = customers.

  • I like making things that people like enough to pay for.

  • As a practical matter, it can be hard enough to make one constituency happy. Two is double the difficulty — even more when interests conflict. A small company has enough challenges; why build more into the foundation?

If it turns out that not enough people want to pay for this service? I’d rather discontinue it than turn to advertising.

Picking a name

The last time I picked a company name, the name was “Cakewalk”1 and the year was 1987. Nearly a decade passed, the internet grew popular, and we registered cakewalk.com. Easy.

The experience now is… different. You may be surprised to learn that many desirable domain names are already registered. I know, right? After too many days agonizing with dictionaries and a thesaurus, I settled on a contraction of “extra” and “amaze” — Extramaze.

It turns out that “extramaze” is sometimes used to describe cues outside a lab maze. Although that’s a bit creepy, I’m relieved the cues are outside the maze — this alludes to beacons of hope shining on those of us stuck inside our own mazes! Or something. Moving on….

Trying to do the right thing: Privacy

The web site does not use third-party JavaScript. No Google Analytics. No social buttons. Nothing. The one exception — if you subscribe — is Stripe for payments.

If you don’t believe me, look at the relevant parts of the Content-Security-Policy response header value:

default-src 'none'; script-src https://checkout.stripe.com;

(Indeed notice that script-src doesn’t include 'self' — the site itself supplies no JavaScript. If you ask “Why not?” I can only respond with another question: “Why?”. The UX doesn’t require it. Maybe someday. Meanwhile it is one less facet to develop, test, debug, and examine for vulnerabilities.)

Recently I experimented with Google and Twitter ads. Click-throughs were OK but conversion was poor. I wondered if account-creation was a speed bump, especially on mobile. So I added “Login with {Google, Twitter}” buttons. Of course if you use those, you are sharing some information with {Google, Twitter}. However you can still create and use a native Extramaze account.

So much for third-party data collection. How about first-party?

  • The web site does basic web server request and response logging.
  • When people create an account:
    • The service asks for:
      • an email address
      • a how-would-you-like-to-be-addressed name
      • a password
    • Successful and failed logins are stored — the latter to do an exponential back-off on subsequent attempts.
  • When people also pay to subscribe:
    • Stripe asks for credit card and zip code information. Extramaze does not see or store this.
    • They gain the ability to create saved searches, and get email alerts. Of course we need to store the search phrase, e.g. 7-string left hand guitar.

Beyond that, the service collects no personal information.

Trying to do the right thing: Security

I’ve spent a lot of time thinking through the privacy and security implications of account sign-up and password resets. This was the scariest part for me. I found Troy Hunt’s Everything you ever wanted to know about building a secure password reset feature incredibly helpful and follow its recommendations. Also useful: The definitive guide to form-based website authentication.

I already mentioned Content-Security-Policy. Information about this and other security headers is at securityheaders.com. Some headers let browsers report problems and https://report-uri.com is helpful here.

I won’t say, “We take security very seriously”, because that’s become a cliché. (Is there a web site that does for incident disclosures what OurIncredibleJourney.com does for acquihire notices?)

But. I think about security frequently. I read as much as I can about other people’s experiences and recommendations. I assume that vulnerabilities exist and the only question is who will find them; I need to exercise a commercially reasonable effort to find them first.

Ingredients

There are three main pieces:

  • Crawl certain music retailer web sites looking for deals (daily).
  • Send search alert emails (daily).
  • Run a web site (24/7).

All use some mix of:

  • Racket
  • PostgreSQL
  • AWS (EC2, ELB, ECS, SES, SNS, RDS, CloudWatch)

Racket: Crawling and scraping

Most music product web stores have a section — variously called “outlet”, “clearance”, “deal zone”, or “blowout” — where the deals are featured. Deals include demo units, price drops, and so on. For demo units the product might be “like new” condition, or there might be a cosmetic flaw which is described.

Currently we crawl these areas of Sweetwater and Zzounds.

More would be better. On the other hand, one step at a time. Furthermore, these days many web sites block crawlers by default. It doesn’t matter if you GET /robots.txt and follow its rules scrupulously. Instead you must ask a human for permission and get white-listed. When you ask, the human might say no, or simply not reply at all.

Racket has great “batteries included” for making HTTP requests, and parsing HTML responses into x-expressions. That is, HTML such as

<p class="class">
  Some
  <em>awesome</em>
  HTML
  &tm;
</p>

is equivalent to the x-expression

'(p ([class "class"]) 
   "Some " 
   (em () "awesome")
   " HTML"
   tm)

As the Lisp Evangelism Task Force will point out on Hacker News every few weeks, x-expressions are what XML would be if it weren’t produced by the Department of Redundancy Department.

Finding the “interesting bits” within an x-expression can be done in various ways. In simple cases, you can use Racket’s match, but for complicated HTML that can be tedious and/or brittle. se-path*/list is a better idea, but you can only select a direct path of element tags. You can’t express CSS-selector-style queries like “find all p elements somewhere under divs having a "container" class”.

So I wrote a function to do a depth-first folding walk of an x-expression. In addition to accumulating a result value, it accumulates a “path” — a list of x-expressions from “this” one up through its ancestors to the root x-expression:

(define path? (listof xexpr/c))

;; A depth-first folding walk of the xexpr. The "path" is a list of
;; xexpr from the current one to its ancestors. You can `match` on
;; this to do the equivalent of `se-path*/list`, but with the full
;; power of `match`.
(define (walk f v x)
  (let recur ([path '()]
              [v v]
              [x x])
    (define this-path (cons x path))
    (f (match x
         [(list* (? symbol? tag) (? list?) xs)
          (for/fold ([v v]) ([x xs])
            (recur this-path v x))]
         [_ v])
       this-path)))

I wrapped this in a simple select function that conjoins one or more predicates:

;; Using quasiquoted match expressions with `walk` can be tedious and
;; error-prone. Often you end up specifying the match in more detail
;; than is really necessary. A sometimes friendlier way is to use
;; `select` with selector combinators grouped using `conjoin` (and
;; maybe `disjoin`).
(define (select x . fs) ;(-> xexpr/c (-> path? any) ...+ list?)
  (walk (λ (vs path)
          (match ((apply conjoin fs) path)
            [#f vs]
            [v (cons v vs)]))
        '()
        x))

I wrote a few functions intended to be used as combinators with conjoin, to do roughly CSS selector style matching. A few example pieces:

(define (this path)
  (match path
    [(cons v _) v]
    [_          #f]))

(define (tag path)
  (match (this path)
    [(list (? symbol? v) _ ...) v]
    [_                          #f]))

(define ((tag? sym) path)
  (equal? (tag path) sym))

For example, to extract the value of the href attribute for all a.some-class elements whose immediate parent is a div.box-class:

(select xexpr
        (tag? 'a)
        (class? "some-class")
        (parent? (conjoin (tag? 'div)
                          (class? "some-box")))
        (attr-val 'href))

Of course you can layer on some shorthand compositions like tag.class?:

(define (tag.class? sym class)
  (conjoin (tag? sym)
           (class? class)))

And the example becomes:

(select xexpr
        (tag.class? 'a "some-class")
        (parent? (tag.class? 'div "some-box"))
        (attr-val 'href))

You could also write a parser from CSS selector syntax to these select expressions — maybe even define a #lang css-selector. But I like s-expressions. Even if I didn’t, I’m not using this extensively enough to warrant that work.

Likewise, although I’d like to open-source this as a distinct, complete project, I just haven’t had time to review it thoroughly, write documentation, and so on.

Racket: web server

Racket has great “batteries included” for making web site servers. In addition to Racket’s documentation for this, I can recommend Jesse Alama’s resources including his blog lisp.sh and his book Server: Racket.

Racket provides a macro called dispatch-rules to define bi-directional “routes”. You give it rules, each of which is a URI pattern and a handler function. It defines two functions covering all of your rules:

  • A request? -> response? dispatcher.
  • A procedure? -> string? URI path maker.

(define-values (dispatch handler->path)
  (dispatch-rules
   [("")                  home]
   [("about")             about]
   [("path" "to" "foo")   path/to/foo]
   [("user" (string-arg)) user/id]
   [else                  not-found]))

Because I needed to do authorization, I wrapped this in my own macro, dispatch-rules+roles. This also defines a third function, request? -> roles?: Given a request that matches one of the patterns, what roles is a user required to have to be authorized to access it?

In the following example, we use three roles:

  • 'anon means an anonymous user.

  • 'free is a user who is authenticated (logged in); certain routes should only be available to them.

  • 'paid is an authenticated user who has also subscribed (paid) and therefore is authorized for even more routes.

(define-values (dispatch handler->path request->roles)
  (dispatch-rules+roles
   ;; Routes requiring 'anon or 'free or 'paid roles
   [(anon free paid)
    [("")      home]
    [("about") about]
    [("join")  join]]
   ;; Routes requiring 'free or 'paid roles
   [(free paid)
    [("logout")      logout]
    [("preferences") preferences]
    [("subscribe")   subscribe]]
   ;; Routes requiring 'paid role
   [(paid)
    [("payment")     payment]
    [("unsubscribe") unsubscribe]]
   [else not-found]))

My dispatch-rules+roles macro doesn’t itself do authorization — it defines the same old dispatch function that plain dispatch-rules does. You need to wrap that dispatch with something that consults request->roles and calls dispatch — or returns a 403 Forbidden response (for an API) or redirects to a login or subscribe page (web app).

Of course, that in turn should be wrapped in something that sets the current user (so we know their roles) from e.g. a session key — or returns a 401 response (for an API) or redirects to a login page (web app).

Speaking of multiple wrappers around dispatch, this is a nice way to compose functionality, which I’ve seen in the Clojure Ring community. It’s cleaner to have one wrapper per bit of functionality, as opposed to one handler with a monolithic hairball of conditionals.

So a handler? is a function from request? to response?, like dispatch. A wrapper? is a function that takes a handler?, and returns a new handler?.

(define handler? (-> request? response?))
(define wrapper? (-> handler? handler?))

For instance, a wrapper to enforce https:

;; This assumes we're behind ELB or nginx which gets both http and
;; https, talks to us only via http, setting an X-Forwarded-Proto
;; header to say the original protocol and an X-Forwarded-For header
;; to say the original IP.

(define/contract ((wrap-http->https handler) req) wrapper?
  (match (headers-assq* #"x-forwarded-proto" (request-headers/raw req))
    [(header _ #"http")
     (redirect-to (path->external-uri
                   (url->string (struct-copy url (request-uri req)
                                             [scheme #f]
                                             [port   #f])))
                  permanently)]
    [_ (handler req)]))

A whole chain of such wrappers can be composed — using compose or the ~> threading macro — to wrap the original dispatch function when we start the Racket web server:

(serve/servlet (~> ;Note: requests go UP this chain, responses DOWN
                dispatch
                wrap-gzip
                wrap-not-modified
                wrap-authorize
                wrap-authenticate
                wrap-http->https
                wrap-timed-and-logged)
               #:servlet-path      "/"
               #:servlet-regexp    #px""
               #:listen-ip         #f
               #:port              (current-internal-port)
               #:launch-browser?   (not (current-production))
               #:servlet-responder error-responder)

PostgreSQL

I was skeptical about using PostgreSQL: Is it web scale?

Seriously, I don’t have anything very exciting to describe about using PostgreSQL for this — which is wonderful. Currently:

  • I’m hosting at AWS RDS.

  • Initial capture from crawling goes into a simple star schema. The fact table has information about each deal, including its first-seen and last-seen times. When we already have a row for a deal, we just update the last-seen time. Dimension tables are what you’d expect — brand, product, and so on.

  • A simple view joins to denormalize, fitting the kind of query done by the web site’s search page. This is already quite fast; a materialized view makes it even faster.

  • Full-text search is delicious.

Toward the end of my time at Cakewalk, I got some experience with Microsoft SQL Server, including optimizations. As a result, I’m aware that I can likewise do much more with PostgreSQL — looking at query execution plans, tuning indexes and queries, and so on. For now I’m satisfied I know roughly what I can do if/as/when necessary.

Racket has an excellent db package. It also has a sql package that lets you write SQL as s-expressions rather than blobs of text, for example:

SELECT first, last
FROM tribbles
WHERE id = $1

which as an s-expression becomes:

(select first last
  #:from tribbles
  #:where (= id ,id))

AWS

I’m using Amazon Web Services because I feel badly that it is such an unpopular choice and want to see them get at least a little business.

Seriously, this has been a reasonable choice to get going quickly and it is affordable initially within the free tier.

One of my earliest Racket packages is aws. Although I’ve used it intermittently, lately I’m really eating my own dog food.

I made a local change to support getting AWS credentials from EC2 instance meta-data. After living with that in production for a couple months, I shared that back in commit 84c28ba.

Just a quick overview of other parts:

  • ELB: This can distribute load among multiple web servers. Even with just one server (to start) it is a convenient way to handle SSL.

  • SES: I’m only sending “transactional” emails (for account creation, password reset, and search alerts) so it’s been easy so far to maintain a good reputation.

  • Docker and ECS. Very helpful: Running Docker on AWS from the ground up.

  • CloudWatch Logs:

    • It is pretty easy to make a Racket log receiver that accumulates things into batches and does PutLogEvents.

    • It is handy to use JSON format for request/response logs. Not only does this display nicely, there’s a decent query feature, e.g. {$.response.duration > 100} or {$.request.headers.Host = "deals.extramaze.com"}.

Conclusion

I hope this helps give a taste of what it’s like to start a small SaaS business c. 2018 using Racket, PostgreSQL, and AWS — but without using advertising or JavaScript.

I realize this post has a somewhat uneven level of detail, so maybe I will loop back later and drill down on some parts.


  1. I’m simplifying for narrative flow. Day zero, the product name was Cakewalk. The company name was Twelve Tone Systems. Later we adopted Cakewalk as the company name, too. The point is, it was much easier to pick domain names in ye olden times. 

by Greg Hendershott at May 04, 2018 04:00 AM

May 01, 2018

Programming Praxis

Gauss’s Criterion

Yesterday was the 241st anniversary of the birth of Carl Gauss, who is perhaps the greatest mathematician who ever lived. In his honor, I picked a small task based on some mathematics that he discovered. Gauss’s Criterion states:

Let p be an odd prime and b a positive integer not divisible by p. Then for each positive integer k with 2k − 1 < p, let r_k ≡ (2k − 1) b (mod p) with 0 < r_k < p, and let t be the number of even r_k. Then (b/p) = (−1)^t, where (b/p) is the Legendre symbol.
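
(A quick worked check, with numbers of my own choosing: take p = 7 and b = 3. The odd numbers below 7 are 1, 3, 5, giving r_1 = 3, r_2 = 9 mod 7 = 2, and r_3 = 15 mod 7 = 1. Exactly one of these, 2, is even, so t = 1 and (3/7) = (−1)^1 = −1; and indeed 3 is not a quadratic residue mod 7, since the squares mod 7 are 1, 2, and 4.)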

Your task is to write a program that computes Gauss’s criterion and confirms that it matches the Legendre symbol. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at May 01, 2018 09:00 AM

April 27, 2018

Ben Simon

ElsieFour - Strong, Human-Powered Encryption

If you want to understand a subject, teach it to someone. If you want to truly and completely understand a subject, teach it to a computer. That is, program it.

This idea of coding to reach understanding is a lesson I've learned countless times. Most recently, I experienced this adage while tackling the ElsieFour exercise from Programming Praxis. The task was to implement the ElsieFour (aka LC4) encryption algorithm. This approach to encryption is especially interesting because of its competing goals. First, it attempts to provide for strong encryption. Secondly, it's designed to be run offline by human power alone. (Not unlike this solution)

See the contradiction there? All the tools one thinks of when considering strong encryption, such as terrifically hard calculations and massively large numbers, are off the table. LC4 therefore has to choose clever conventions over raw processing power. The author solves this challenge by constructing a grid of Scrabble-like tiles and re-arranging them using a simple set of rules. While I'm interested in implementing an offline version of the algorithm, for now, I've coded the algorithm in Scheme.

Here's the algorithm at work:

;; Example 1
Key:  xv7ydq #opaj_ 39rzut 8b45wc sgehmi knf26l 
Nonce:  solwbf 
Text:  im_about_to_put_the_hammer_down 
Encrypted:  i2zqpilr2yqgptltrzx2_9fzlmbo3y8_9pyssx8nf2 
Decrypted:  im_about_to_put_the_hammer_down#rubberduck 

;; Example 2
Key:  xv7ydq #opaj_ 39rzut 8b45wc sgehmi knf26l 
Nonce:  argvpx 
Text:  hurrrraaaaaaaaaaaaaaaaaaaaaay 
Encrypted:  3bcxnt57hus6accn97v4iie__hjbml8wr 
Decrypted:  hurrrraaaaaaaaaaaaaaaaaaaaaay#ben 

The source code for my solution can be found in the usual spot.

Example 1 comes directly from the paper describing the algorithm and was my lifeline for implementing and verifying the algorithm. You can see that to communicate with the LC4 algorithm, both users need to know the key value ahead of time. The key will always be an arrangement of the 36 different characters in the alphabet.

Encryption and decryption also depend on a nonce value, which is used to inject additional noise into the system. The nonce value is shared with the recipient, and no harm comes from it being exposed to an attacker.
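To make the key and nonce shapes concrete, here is my own hedged sketch (not the code from my linked solution) of generating them in Scheme, assuming SRFI 27 for randomness and the 36-character alphabet used in the LC4 paper:

(import (scheme base) (srfi 27))

;; The LC4 alphabet: #, _, the digits 2-9, and a-z (36 characters).
(define alphabet "#_23456789abcdefghijklmnopqrstuvwxyz")

;; A key is a random permutation of the alphabet (Fisher-Yates shuffle).
(define (make-key)
  (let ((v (string->vector alphabet)))
    (do ((i (- (vector-length v) 1) (- i 1)))
        ((zero? i) (vector->string v))
      (let* ((j (random-integer (+ i 1)))
             (tmp (vector-ref v i)))
        (vector-set! v i (vector-ref v j))
        (vector-set! v j tmp)))))

;; A nonce is n characters drawn uniformly from the same alphabet.
(define (make-nonce n)
  (let ((s (make-string n)))
    (do ((i 0 (+ i 1)))
        ((= i n) s)
      (string-set! s i (string-ref alphabet (random-integer 36))))))

Calling (make-nonce 6) yields six-character values shaped like the solwbf and argvpx nonces in the examples above.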

Example 2 emphasizes that even text with a telling shape comes out as random gibberish when encrypted, a sign that this is indeed a secure algorithm.

LC4 is an elegant solution to an unusual encryption challenge. On one level, I knew this by browsing the source paper on the topic. But now that I've struggled through an implementation, I know this on a completely different level. And when I say struggle, I mean it. Encryption algorithms are tricky to implement because the goal is to take text and turn it into garbage, but very specific garbage. So when I had variable c where y belonged, or neglected to reset r and y, the algorithm still produced jumbled text, just the wrong jumbled text.

At least four times I was absolutely sure I had coded the encryption and decryption algorithm exactly as described, yet my code was failing. I did what any smart programmer does in these situations: I stepped away and re-attacked the problem after some rest. With some time away from the code, the bugs were obvious and easy to address.

When I finished coding the algorithm, I had far more than just working code. I had an intimate appreciation for key generation, nonce usage, dictionary selection, the authentication operation, and a half dozen other subtle details that work together to make LC4 function. Oh, and the dopamine rush from seeing my text get encrypted and then decrypted was nice too.

by Ben Simon (noreply@blogger.com) at April 27, 2018 09:30 PM

Programming Praxis

Sum Square Digits Sequence

Regular readers of this blog know of my affinity for recreational mathematics, and today’s exercise is an example of that.

We looked at happy numbers in a previous exercise. Recently, Fermat’s Library re-published a proof by Arthur Porges, first published in the American Mathematical Monthly in 1945, that the trajectory of the sequence of summing the squares of the digits of a number always ends in 1 (a Happy Number) or in the repeating cycle of eight numbers 4, 16, 37, 58, 89, 145, 42, 20 (a Sad Number). So, we look at this task again:

You are given a positive integer. Split the number into its base-ten digits, square each digit, and sum the squares. Repeat until you reach 1 (a Happy Number) or enter a loop (a Sad Number). Return the sequence thus generated.

For instance, 19 is a happy number, with sequence 19, 82, 68, 100, 1, while 18 is a sad number, with sequence 18, 65, 61, 37, 58, 89, 145, 42, 20, 4, 16, 37, …

Your task is to compute the sequence described above for a given n. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
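As an aside, the whole task fits in a dozen lines of Scheme; here is a minimal sketch using nothing beyond the standard library:

;; Sum of the squares of the base-ten digits of n.
(define (sum-square-digits n)
  (let loop ((n n) (s 0))
    (if (zero? n)
        s
        (let ((d (remainder n 10)))
          (loop (quotient n 10) (+ s (* d d)))))))

;; Collect the trajectory, stopping at 1 or at the first repeated value.
(define (trajectory n)
  (let loop ((n n) (seen '()))
    (cond ((= n 1) (reverse (cons 1 seen)))
          ((member n seen) (reverse seen))
          (else (loop (sum-square-digits n) (cons n seen))))))

(trajectory 19)  ; => (19 82 68 100 1)
(trajectory 18)  ; => (18 65 61 37 58 89 145 42 20 4 16),
                 ;    stopping just before 37 repeats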

by programmingpraxis at April 27, 2018 09:00 AM

April 26, 2018

GNU Guix

Guix welcomes Outreachy, GSoC, and Guix-HPC interns

We are thrilled to announce that five people will join Guix as interns over the next few months! As part of Google’s Summer of Code (GSoC), under the umbrella of the GNU Project, three people are joining us:

  • Tatiana Sholokhova will work on a Web interface for the Guix continuous integration (CI) tool, Cuirass, similar in spirit to that of Hydra. Cuirass was started as part of GSoC 2016.
  • uniq10 will take over the build daemon rewrite in Scheme, a project started as part of last year's GSoC by reepca. The existing code lives in the guile-daemon branch. Results from last year already got us a long way towards a drop-in replacement of the current C++ code base.
  • Ioannis P. Koutsidis will work on implementing semantics similar to that of systemd unit files in the Shepherd, the “init system” (PID 1) used on GuixSD.

Through Outreachy, the inclusion program for groups underrepresented in free software and tech, one person will join:

Finally, we are welcoming one intern as part of the Guix-HPC effort:

  • Pierre-Antoine Rouby arrived a couple of weeks ago at Inria for a four-month internship on improving the user experience of Guix in high-performance computing (HPC) and reproducible scientific workflows. Pierre-Antoine has already contributed a couple of HPC package definitions and will next look at tools such as hpcguix-web, guix pack, and more.

Gábor Boskovits, Ricardo Wurmus, and Ludovic Courtès will be their primary mentors, and the whole Guix crowd will undoubtedly help and provide guidance as it has always done. Welcome to all of you!

About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user's freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on i686, x86_64, and armv7 machines. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el and aarch64.

by Ludovic Courtès at April 26, 2018 03:00 PM

April 24, 2018

Ben Simon

Lightweight, R7RS-compliant, and in the palm of my hand - Compiling chibi-scheme on Android

I'm eyeing this Programming Praxis exercise as one to tackle next, but there's a catch before I can start programming. TinyScheme, my current Scheme interpreter of choice, doesn't have support for generating random numbers, and the exercise requires it.

I could hack something together using TinyScheme (and probably the Programming Praxis Standard Prelude). Or, I could do as John Cowan suggested and try out a different implementation: Chibi-Scheme. Chibi-Scheme is lightweight, but unlike TinyScheme, it's also quite modern. This means, among other things, that it should give me easy access to SRFI 27: Sources of Random Bits. Longer term, it would be ideal to have access to R7RS features and other modern niceties.

I busted out my cell phone and hardware keyboard. I opened up Termux and went to work. Here's my attempt at getting chibi-scheme running on Android:

Let's be clever and build this on the SD card.

[localhost]{~}(1)$ cd storage/external-1/src

[localhost]{src}(1)$ git clone https://github.com/ashinn/chibi-scheme.git
Cloning into 'chibi-scheme'...
remote: Counting objects: 18101, done.
remote: Compressing objects:  ....
Receiving objects: 100% 
Resolving deltas:   0% 

[localhost]{src}(1)$ cd chibi-scheme

[localhost]{chibi-scheme}(1)$ ./configure
bash: ./configure: /bin/sh: bad interpreter: Permission denied

D'oh. OK, don't panic.

[localhost]{chibi-scheme}(1)$ sh configure
Autoconf is an evil piece bloatware encouraging cargo-cult programming.
Make, on the other hand, is a beautiful little prolog for the filesystem.
Just run 'make'.

Oooh, sassy software. I like it.

[localhost]{chibi-scheme}(1)$ make
The program 'make' is not installed. Install it by executing:
 pkg install make

No problem, I'll just do what Termux tells me to do...

[localhost]{chibi-scheme}(1)$ pkg install make
Building dependency tree...
Reading state information...
All packages are up to date.
The following packages were automatically installed and are no longer required:
  binutils libandroid-support-dev libffi libllvm ndk-stl ndk-sysroot
Use 'apt autoremove' to remove them.
The following NEW packages will be installed:
  make
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 76.5 kB of archives.
After this operation, 246 kB of additional disk space will be used.
Get:1 https://termux.net stable/main aarch64 make aarch64 4.2.1 [76.5 kB]
Preparing to unpack .../make_4.2.1_aarch64.deb ...



[localhost]{chibi-scheme}(1)$ make
echo '#define sexp_so_extension "'.so'"' > include/chibi/install.h
echo '#define sexp_default_module_path "'/data/data/com.termux/files/usr/share/chibi:/data/data/com.termux/files/usr/lib/chibi'"' >> include/chibi/install.h
echo '#define sexp_platform "'android'"' >> include/chibi/install.h
echo '#define sexp_version "'0.8.0'"' >> include/chibi/install.h
echo '#define sexp_release_name "'`cat RELEASE`'"' >> include/chibi/install.h
cc -c  -Iinclude  -Wall -g -g3 -O3  -o main.o main.c
make: cc: Command not found
make: *** [Makefile:149: main.o] Error 127

Whoops. No compiler. After attempting to install 'gcc' and searching for a compiler
package in general, I finally tripped over 'clang'. That will apparently install
a C compiler.

[localhost]{chibi-scheme}(1)$ pkg install clang
The following NEW packages will be installed:
  clang
0 upgraded, 1 newly installed, 0 to remove and 0 not upgraded.
Need to get 12.9 MB of archives.
After this operation, 61.2 MB of additional disk space will be used.
Get:1 https://termux.net stable/main aarch64 clang aarch64 6.0.0-1 [12.9 MB]
Preparing to unpack .../clang_6.0.0-1_aarch64.deb ...

[localhost]{chibi-scheme}(1)$ make
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -o main.o main.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o gc.o gc.c


It's working! It's really working!

cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o sexp.o sexp.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o bignum.o bignum.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o gc_heap.o gc_heap.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o opcodes.o opcodes.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o vm.o vm.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o eval.o eval.c
cc -c -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3  -fPIC -o simplify.o simplify.c
cc -fPIC -shared -Wl,-soname,libchibi-scheme.so.0 -o libchibi-scheme.so.0.8.0 gc.o sexp.o bignum.o gc_heap.o opcodes.o vm.o eval.o simplify.o    -ldl -lm
ln -sf libchibi-scheme.so.0.8.0 libchibi-scheme.so.0
ln: libchibi-scheme.so.0: Operation not permitted
make: *** [Makefile:162: libchibi-scheme.so.0] Error 1

And it broke. Ugh. After much debugging I realized that my issue
was permission-based. Apparently Android doesn't allow you to perform
certain operations on your SD card. I moved the source tree to internal
storage and tried again.

[localhost]{chibi-scheme}(1)$ cd ..
[localhost]{src}(1)$ mv chibi-scheme ~/

[localhost]{src}(1)$ cd

[localhost]{~}(1)$ cd chibi-scheme
[localhost]{chibi-scheme}(1)$ make
ln -sf libchibi-scheme.so.0.8.0 libchibi-scheme.so.0
ln -sf libchibi-scheme.so.0 libchibi-scheme.so

It's working again. Whooo!

cc -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o chibi-scheme main.o -L. -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/filesystem.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/filesystem.so lib/chibi/filesystem.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/weak.so lib/chibi/weak.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/heap-stats.so lib/chibi/heap-stats.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/disasm.so lib/chibi/disasm.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/ast.so lib/chibi/ast.c  -L. -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/emscripten.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/emscripten.so lib/chibi/emscripten.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/process.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/process.so lib/chibi/process.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/time.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/time.so lib/chibi/time.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/system.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/system.so lib/chibi/system.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/stty.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/stty.so lib/chibi/stty.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/net.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/net.so lib/chibi/net.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/io/io.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/io/io.so lib/chibi/io/io.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/optimize/rest.so lib/chibi/optimize/rest.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/optimize/profile.so lib/chibi/optimize/profile.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/chibi/crypto/crypto.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/chibi/crypto/crypto.so lib/chibi/crypto/crypto.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/27/rand.so lib/srfi/27/rand.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/151/bit.so lib/srfi/151/bit.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/39/param.so lib/srfi/39/param.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/69/hash.so lib/srfi/69/hash.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/95/qsort.so lib/srfi/95/qsort.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/98/env.so lib/srfi/98/env.c -L.  -lchibi-scheme
LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme -q tools/chibi-ffi lib/srfi/144/math.stub
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/144/math.so lib/srfi/144/math.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/scheme/time.so lib/scheme/time.c -L.  -lchibi-scheme
cc -fPIC -shared -DSEXP_USE_INTTYPES -Iinclude  -Wall -g -g3 -O3   -o lib/srfi/18/threads.so lib/srfi/18/threads.c -L.  -lchibi-scheme
echo "# pkg-config" > chibi-scheme.pc
echo "prefix=/data/data/com.termux/files/usr" >> chibi-scheme.pc
echo "exec_prefix=\${prefix}" >> chibi-scheme.pc
echo "libdir=/data/data/com.termux/files/usr/lib" >> chibi-scheme.pc
echo "includedir=\${prefix}/include" >> chibi-scheme.pc
echo "version=0.8.0" >> chibi-scheme.pc
echo "" >> chibi-scheme.pc
cat chibi-scheme.pc.in >> chibi-scheme.pc
find lib/chibi/ -name \*.sld | \
 LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme tools/generate-install-meta.scm 0.8.0 > lib/.chibi.meta
find lib/srfi/ -name \*.sld | \
 LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme tools/generate-install-meta.scm 0.8.0 > lib/.srfi.meta
find lib/scheme/ -name \*.sld | \
 LD_LIBRARY_PATH=".:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH=".:" CHIBI_MODULE_PATH=lib ./chibi-scheme tools/generate-install-meta.scm 0.8.0 > lib/.scheme.meta

Amazing - it built without issue. How about installing? Will that work?

[localhost]{chibi-scheme}(1)$ make install
install -d /data/data/com.termux/files/usr/bin
install -m0755 chibi-scheme /data/data/com.termux/files/usr/bin/
install -m0755 tools/chibi-ffi /data/data/com.termux/files/usr/bin/
install -m0755 tools/chibi-doc /data/data/com.termux/files/usr/bin/
install -m0755 tools/snow-chibi /data/data/com.termux/files/usr/bin/
install -m0755 tools/snow-chibi.scm /data/data/com.termux/files/usr/bin/

many boring 'install' lines removed. You're welcome.

if type ldconfig >/dev/null 2>/dev/null; then ldconfig; fi
echo "Generating images"
Generating images
cd / && LD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:" /data/data/com.termux/files/usr/bin/chibi-scheme -d /data/data/com.termux/files/usr/share/chibi/chibi.img
cd / && LD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:" /data/data/com.termux/files/usr/bin/chibi-scheme -xscheme.red -d /data/data/com.termux/files/usr/share/chibi/red.img
cd / && LD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:/data/data/com.termux/files/usr/lib" DYLD_LIBRARY_PATH="/data/data/com.termux/files/usr/lib:" /data/data/com.termux/files/usr/bin/chibi-scheme -mchibi.snow.commands -mchibi.snow.interface -mchibi.snow.package -mchibi.snow.utils -d /data/data/com.termux/files/usr/share/chibi/snow.img

Holy smokes, it's done. Let's see if it really worked.

[localhost]{chibi-scheme}(1)$ cd

[localhost]{~}(1)$ chibi-scheme
> (+ 1 2 3)
6

We have Scheme! Let's try something a bit more advanced:

> (import (srfi 27) (srfi 1))
> (random-integer 100)
56
> 

The R7RS library functionality is working and SRFIs are installed by default. We have success!
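Closing the loop on the original motivation, here is a quick sketch of the SRFI 27 calls the Praxis exercise needs, using only the standard SRFI 27 interface:

(import (scheme base) (scheme write) (srfi 27))

;; Re-seed the default source so each run differs.
(random-source-randomize! default-random-source)

(define (roll-die) (+ 1 (random-integer 6)))   ; uniform integer in 1..6

(display (roll-die)) (newline)
(display (random-real)) (newline)              ; uniform real in (0,1)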

And here are some screenshots to show this really was all executed on my Galaxy S9 Plus:

by Ben Simon (noreply@blogger.com) at April 24, 2018 06:16 PM