Planet Scheme

September 18, 2018

Programming Praxis

1-800-PPRAXIS

I hate businesses with telephone numbers that spell words; it’s awkward to type letters at a telephone, and often the words are abbreviated or spelled wrong so they are no longer mnemonic, which defeats the purpose. So this morning when I had to call one of those businesses, I wrote a program to translate letters to numbers after I had been waiting on hold long enough to get really annoyed.

Your task is to write a program to translate telephone numbers specified as words. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at September 18, 2018 09:00 AM

September 11, 2018

Peter Bex

An appeal to the WHATWG

As you may know, I co-maintain the uri-generic egg, together with Ivan Raikov. We had just been working on fixing a bug and porting it to CHICKEN 5 when I stumbled across the WHATWG URL specification, an evolution over RFC 3986. I found it hard to believe they dropped the formal grammar from the RFC, so I checked the issue queue and found a closed ticket from 2015.

They replaced the BNF with a series of steps which is several pages long and overly concerned with implementation-specific details.

It really got to me that such an important and basic part of the web stack is so informally specified. So I wrote an appeal to them to restore a formal grammar in this ticket. I think the reasons are worth being spread more widely, so I'm reproducing it here on my blog.

My request

I would like to offer my opinion from an implementor's perspective and hopefully convince the WG to restore a formal grammar. Let me start by providing some background on where I'm coming from. Feel free to skip this next section.

My background

I am the co-maintainer of the uri-generic egg for CHICKEN Scheme. This implementation attempts to follow RFC 3986 to the letter, and this has resulted in what IMO is a very high-quality implementation (at least, as far as parsing is concerned; URL construction still has some known issues). Oftentimes when we ran into issues, we've compared it with other implementations. It turns out that many of these are lacking in some way or another. I think the main reason is that they're not attempting to really implement the formal grammar (even if they claim to be RFC compliant), while we do. We even have a growing repository of alternative implementations using different parser generators which all pass the same test suite! (feel free to now call me a smug Lisp/Scheme weenie :) )

I wasn't aware of the WHATWG spec until I saw it mentioned in a libcurl post. It piqued my interest because I'm always looking for more test cases. The web platform test suite looks like a big, juicy set to start using in our egg's tests. I'd also consider implementing the WHATWG spec if this increases compatibility with other implementations.

What I expect from a spec

As an implementor, I routinely check the RFC's ABNF as a guide to determine what a valid URL should look like. If someone finds a certain URL our implementation doesn't parse, or if it parses an URL that it shouldn't, the first thing I do is go back to the ABNF in the RFC to verify the behaviour. It is compact, to the point and, for a trained eye, it is trivial to quickly determine if a parser should accept a given (sub)string or not.

The collected ABNF of RFC 3986 is a brief three screenful. In contrast, the algorithm in the WHATWG spec is roughly eighteen screenful. It is an overly detailed and nonstandard way of defining a grammar. This makes it harder to determine which language is accepted by this algorithm. It also makes it hard for me to determine what the changes are, compared to the RFC. Implementing the WHATWG spec would (for me) involve a complete rewrite.

The specification is so focused on the mechanics of a specific manual parsing technique that it almost precludes parser generators or other implementations. Parser generators have a long tradition in theory and practice, and can generate **efficient** language recognisers. Even today, it is an active research field; PEG grammars for example have been "discovered" as recently as 2004.

The way I think about it is that the purpose of this spec is to define what a URL "officially" looks like. So, as an implementor, I don't understand the hesitation to supply a formal grammar. Not having one will likely result in different people interpreting the spec differently. This results in _less_ interoperability, which defeats the point of a spec.

Other reasons why I think a formal grammar is important

Finally, I would like to emphasise the importance of parsers based on formal grammars over ad hoc ones for security reasons. Let's say you have a pipeline of multiple processors which use different URL parsers. For example, you might have a HTML parser on a comment form which cleans URLs by dropping JavaScript and data URLs, among other things, or a mail client which blocks intranet or file system-local URLs before invoking an HTML viewer. If these are all ad hoc "informal" parsers that try to "fix" syntactically invalid URLs, it is nigh-impossible to verify that filtering them for "safe" URLs is correct. That's because it's impossible to decide which language is really accepted by an ad hoc implementation. An implementation further down the stack might interpret an URL (radically) different from one up the stack and you have a nice little exploit in the making.

If you're not convinced by my measly attempts at explaining this idea, please watch the talk "The Science of Insecurity". Meredith Patterson states the case much more eloquently than I ever could. This talk was an absolute eye-opener for me.

With this context, it baffled me to read the statement that "there are several large parts of the spec that cannot be captured by any kind of grammar". This is literally equivalent to saying "we can't know if an URL will be valid without evaluating the algorithm". This means you cheerfully drag the halting problem into what should be a simple, straightforward notation (come on, URLs aren't **that** ill-defined!). As far as I can tell, the RFC defines a regular grammar. The decision to go from a regular to an unrestricted grammar should not be taken lightly!

by Peter Bex at September 11, 2018 08:09 PM

Programming Praxis

Nth From The End

Counting from the end of the list, the third-last item in the list (1 2 3 4 5 6 7 8 9) is 7.

Your task is to write a program to find the nth-last item in a list; you must provide at least three fundamentally different solutions. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at September 11, 2018 09:00 AM

September 07, 2018

Programming Praxis

List Homework

For many students, school started a few weeks ago, so today’s exercise is typical of homework:

  1. Write a program to determine the length of a linked list.
  2. Write a program to reverse a linked list.

Your task is to write the two list exercises described above; write them as if you are three weeks into your first data structures class. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at September 07, 2018 11:24 AM

September 04, 2018

Programming Praxis

Mind-Boggling Card Trick

Today’s exercise is a mind-boggling card trick:

Create a pack of 52 cards, half red, half black, and shuffle them. With the cards face down, turn over the top card. If it is black, add the next card, unseen, to the black stack; if it is red, add the next card, unseen, to the red stack. Add the original top card to the discard stack. Repeat these steps for all the cards in the pack. Now, randomly choose a number less than the size of the smaller of the black and red stacks, choose that many cards randomly from each of the two stacks, and swap those randomly-chosen cards from one stack to the other. The number of black cards in the black stack will equal the number of red cards in the red stack.

Your taks is to write a program to simulate the card trick and confirm that the number of black cards in the black stack equals the number of red cards in the red stack. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at September 04, 2018 09:00 AM

August 31, 2018

Programming Praxis

Sum Of Perfect Powers

I think this must be somebody’s homework:

Write a program to determine if a number x can be written as the sum of two perfect powers. This is, given x determine if there exist non-negative integers a, b, m and n such that am + bn = x where 1 ≤ x ≤ 106 and m > 1 and n > 1.

Your task is to write a program that determines if a number is a sum of two perfect powers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 31, 2018 09:00 AM

August 28, 2018

Programming Praxis

Array Of Integers

Today’s exercise is an interview question from Amazon:

You are given an array of integers, not necessarily unique. Your goal is to transform the array of integers so that the smallest number does not differ from the largest number by more than two times by removing integers from the array; thus, if x is the smallest element in the array, and y is the largest, then y ≤ 2x. You need only find the number of items to be removed from the array, not make a list of the items. As an example, given the array [4,5,3,8,3,7], you can remove the 2 smallest integers, leaving [4,5,7,8], so that 8 ≤ 2 × 4, or you can remove the 2 largest integers, leaving [3,4,5,6], so that 5 ≤ 2 × 3.

Your task is to write a function to determine how many items to remove from the array. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 28, 2018 09:00 AM

August 24, 2018

Programming Praxis

Data Structures Homework

Today’s exercise is from a programming student taking a Data Structures course, using Java.

Given a text file, print a list of the ten most common words in the file, along with their frequency of occurrence. You may not use a HashMap or ArrayList, but you may use regular expressions.

Your task is to write a program to find the ten most common words in an input file. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 24, 2018 09:00 AM

August 21, 2018

Programming Praxis

Mars Rover

Today’s exercise is a Microsoft interview question:

Write code using the commands below for two rovers to meet. Two rovers are dropped on Mars. Imagine Mars to be a straight infinite plane. When the rovers are dropped on Mars they are dropped with a parachute, so their initial position on Mars is on the parachute:

The only commands available are:

  1. Go left.
  2. Go right.
  3. No operation.
  4. If on parachute go to label.

A label points to a piece of code with a name where it is possible to transfer execution.

Using ONLY the commands above, write code to enable the rovers to meet.

Your task is to write a program to help the Mars rovers meet. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 21, 2018 09:00 AM

August 17, 2018

Programming Praxis

Matrix Search

Today’s exercise is a simple task in matrix manipulation. Given a matrix in which the rows are ordered but the columns or not, efficiently search for a specific item in the matrix. For instance, given the matrix

2 5 8
1 4 7
3 6 9

 

a search for 6 finds it in row 2 column 1, a search for 8 finds it in row 0 column 2, and a search for 0 fails.

Your task is to write a program that searches a matrix. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 17, 2018 09:00 AM

August 14, 2018

Programming Praxis

A Simple Interview Question

Today’s interview question comes from Apple:

Write a program to add two integers. You may not use the + (addition) operator, but may use the ++ (increment by 1) or -- (decrement by 1) operators. Be sure your solution works for both positive and negative inputs.

Your task is to write a program to add two integers. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 14, 2018 01:00 PM

August 13, 2018

GNU Guix

GSoC 2018 report: Cuirass Web interface

For the last three months I have been working with the Guix team as a Google Summer of Code intern. The title of my project is "GNU Guix (Cuirass): Adding a web interface similar to the Hydra web interface".

Cuirass is a continuous integration system which monitors the Guix git repository, schedules builds of Guix packages, and presents the build status of all Guix packages. Before my project, Cuirass did not have a web interface. The goal of the project was to implement an interface for Cuirass which would allow a user to view the overall build progress, details about evaluations, build failures, etc. The web interface of Hydra is a good example of such a tool.

In this post, I present a final report on the project. The Cuirass repository with the changes made during the project is located at http://git.savannah.gnu.org/cgit/guix/guix-cuirass.git. A working instance of the implemented interface is available at https://berlin.guixsd.org/. You can find more examples and demonstrations of the achieved results below.

About Cuirass

Cuirass is designed to monitor a git repository containing Guix package definitions and build binaries from these package definitions. The state of planned builds is stored in a SQLite database. The key concepts of the Cuirass internal state are:

  • Job specification. Specifications state what has actually to be done by Cuirass. A specification is defined by a Scheme data structure (an association list) which includes a job name, repository URL, as well as the branch and a procedure proc that specifies how this is to be built.

  • Evaluation. An evaluation is a high-level build actcion related to a certain revision of a repository of a given specification. For each specification, Cuirass continuously produces new evaluations which build different versions of the project represented by revisions of the corresponding repository. Derivations and builds (see below) each belong to a specific evaluation.

  • Derivation. Derivations represent low-level build actions. They store such information as name of a build script and its arguments, input and output of a build action, target system type, and necessary environment variables.

  • Build. A build is a result of build actions that are prescribed by a derivation. This could be a failed build or a directory containing the files that were generated by compiling a package.

Besides the core which executes build actions and records their results in the database, Cuirass includes a web server which previously only responded to a handful of API requests with JSON containing information about the current status of builds.

Web interface

The Cuirass web interface implemented during the project is served by the Cuirass web server whose functionality has been extended to generating HTML responses and serving static files. General features of the interface are listed below.

  • The backend is written in Guile and implements request processing procedures which parse request parameters and extract specific data to be displayed from the database.

  • The frontend consists of HTML templates represented with Guile SXML and the Bootstrap 4 CSS library.

  • The appearance is minimalistic. Every page includes only specific content information and basic navigation tools.

  • The interface is lightweight and widely accessible. It does not use JavaScript which makes it available to users who do not want to have JavaScript running in the browser.

Structure

Let's review the structure of the interface and take a look at the information you can find in it. Note that the web-interface screenshots presented below were obtained with synthetic data loaded into Cuirass database.

Main page

The main page is accessible on the root request endpoint (/). The main page displays a list of all the specifications stored in the Cuirass database. Each entry of the list is a clickable link which leads to a page about the evaluations of the corresponding specification (see below).

Here is an example view of the main page.

Main page screenshot

Evaluations list

The evaluations list of a given specification with name <name> is located at /jobset/<name>/. On this page, you can see a list of evaluations of the given project starting from the most recent ones. You can navigate to older evaluations using the pagination buttons at the bottom of the page. In the table, you can find the following information:

  • The ID of the evaluation which is clickable and leads to a page with information about all builds of the evaluation (see below).

  • List of commits corresponding to the evaluation.

  • Build summary of the evaluation: number of succeeded (green), failed (red), and scheduled (grey) builds of this evaluation. You can open the list of builds with a certain status by clicking on one of these three links.

Here is a possible view of the evaluations list page:

Screenshoot of evaluations list

Builds list

The builds list of a evaluation with ID <id> is located at /eval/<id>/. On this page, you can see a list of builds of the given evaluation ordered by their stop time starting from the most recent one. Similarly to the evaluation list, there are pagination buttons located at the bottom of the page. For each build in the list, there is information about the build status (succeeded, failed, or scheduled), stop time, nixname (name of the derivation), system, and also a link to the corresponding build log. As said above, it is possible to filter builds with a certain status by clicking on the status link in the evaluations list.

Screenshot of builds list

Summary

Cuirass now has the web interface which makes it possible for users to get an overview on the status of Guix package builds in a user-friendly way. As the result of my GSoC internship, the core of the web interface was developed. Now there are several possibilities for future improvements and I would like to welcome everyone to contribute.

It was a pleasure for me to work with the Guix team. I would like to thank you all for this great experience! Special thanks to my GSoC mentors: Ricardo Wurmus, Ludovic Courtès, and Gábor Boskovits, and also to Clément Lassieur and Danny Milosavljevic for their guidance and help throughout the project.

by Tatiana Sholokhova at August 13, 2018 03:00 PM

August 11, 2018

Peter Bex

What to expect from CHICKEN 5

We're getting close to a CHICKEN 5 release, so let's take a look at the cool new stuff!

Overhaul of built-in modules

The biggest change you'll notice when you fire up CHICKEN and start to use it is that the modules that come shipped with core are completely different from CHICKEN 4. The functionality is mostly the same, but we moved things around (a lot!) to make things more logical.

This is also the main reason we decided to bump the major version number: the modules have different names, procedures have been renamed, merged or dropped.

You can take a look at the complete list in the CHICKEN 5 manual. We've taken the module layout from R7RS small as inspiration, but since CHICKEN is still an R5RS Scheme first (with r7rs being an optional extension) we had to make some changes.

So, we define a scheme module which contains the entire R5RS language. For everything that is a CHICKEN-specific extension to standard R5RS Scheme, we put it under a (chicken ...) name, which tries to follow the R7RS naming conventions.

For example, R7RS defines a (scheme process-context) module with the following procedures:

  • command-line
  • exit
  • emergency-exit
  • get-environment-variable
  • get-environment-variables

Likewise, CHICKEN defines a (chicken process-context) module, which is a superset of the corresponding R7RS module. Take a look at its manual page; you can see that it defines many more procedures, but it includes all the standard ones too.

By using the R7RS names but with scheme replaced by chicken, the new modules should be easy to remember for anyone used to R7RS. Of course, you can still write portable standard R7RS programs via the r7rs egg, which defines a 100% compatible (scheme process-context) module with only the R7RS identifiers.

There is one important caveat: Because our scheme modules exports everything from R5RS Scheme, we don't provide, say, a (chicken cxr) module for all the cadadr, caddar and so on, because those are all in scheme. This also means that the (chicken load) module does not export load; that's already in scheme. Instead, it defines various non-standard CHICKEN extensions like load-relative and such.

Saner module imports

Speaking of modules, we've improved the way modules are linked into user code. In CHICKEN 4, there's a very strict distinction between modules and (compilation) units. This was an endless source of confusion for beginners. For example, why did (import foo) give an error when you tried to actually refer to an identifier from the foo module? That's because import didn't actually load the code, just the import library. To actually load the code and import the library, you needed (use foo). You could also load the code without importing it via (require-library foo). This should help with cross-compilation. The idea was that you would only need to load the import library on the host, and have the library itself compiled for the target, but in practice you needed to compile the library twice anyway (once on the host, once for the target).

We got rid of this mess: now the canonical way to import the foo library is simply (import foo). For more info, see this post by Felix outlining how to improve imports.

Full numeric tower

Of course, support for the full numeric tower is a personal favorite of mine, having spent a lot of time to perfect this stuff!

Most importantly, this means you no longer need to worry about integer computations over- or underflowing into a flonum and all the weird floating-point problems that entails. Bignums are also a necessity when dealing with 64-bit numeric C types in the FFI. For example, we finally support the size_t type correctly. To me, complex numbers and exact fractions (aka rational numbers) are a nice added bonus, as you could already get them before with the numbers egg. However, by having these types built-in, they're more efficient and you don't have to worry about passing these numbers to code that can't handle them because support happened not to be compiled in.

Take some time to read my blog series about the numeric tower if you're interested in the details.

Declarative egg description language

The chicken-install program to install eggs was rewritten along with all the surrounding tools. The main reason to do this was to make the life of package maintainers easier.

The old version of chicken-install would download, build, install and (optionally) run the unit tests as part of one command. If any dependencies were missing, it would also recursively download, build, install and run tests for those as well. The new version cleanly separates these steps, by generating shell scripts (batch files on Windows) that can do the necessary actions to build and install.

To make this easier, we also had to re-think the egg "language". In CHICKEN 4, a .setup-file was simply a Scheme program in which a few helper procedures were available for calling the compiler. This means it's impossible to create a simple shell script that will separate the build and install steps. That's why we now have a separate, declarative file which describes the components of an egg. See the .egg file documentation for a concrete example.

The rewritten chicken-install will now also cache eggs to avoid re-downloading the same eggs again and again. By default the cache is stored in a dot-directory under the user's home directory. This can be overridden with the CHICKEN_EGG_CACHE environment variable, which might also help package maintainers take the distributed files from another location.

See these design notes for more information about the goals and motivations behind the rewrite.

Improved support for static compilation

In principle, CHICKEN 4 has good support for static compilation. In practice, egg authors would not include the necessary commands for building their libraries statically. Most people don't have a real need for static linking, which means they tend not to make an effort to support it just in case someone else might need it.

The upshot of this was that you could only really compile programs statically when they didn't use any eggs, or if you created a custom build script that would compile the eggs manually with the required -static option. With the new chicken-install, you get static compilation support automatically, for free.

Note that in CHICKEN 4, you could also build eggs and programs using the so-called deployment mode. This allowed shipping a program with all its libraries in one directory. This worked quite well if your target platform supported it, but not all platforms did. Static compilation covers all the use cases that deployment supported and works reliably on all platforms, so we decided to drop deployment mode with all the complexity it brings.

Other noteworthy things

But wait, there's more!

  • Code generation is now fully deterministic, making builds reproducible. This allows you to verify that any given file of generated C code corresponds to the Scheme source code by recompiling it with the same CHICKEN version, both for user code and for CHICKEN core itself. As an added bonus, because the generated C output is deterministic, ccache can be used to get much faster builds (before, it would invalidate the cache as each file would be different).
  • We've improved how symbols are garbage collected, which was optional and somewhat broken in CHICKEN 4. This will speed up code that generates many symbols, and stops symbol table stuffing attacks from being a threat.
  • We have removed quite a bit of bloat: The srfi-1, srfi-13, srfi-14, srfi-69 and srfi-18 libraries have been removed from core! Not to worry though; they are now available as eggs. This will both allow faster development and encourage innovation and competition from alternatives to these non-essential libraries (especially R7RS-large seems to be geared towards renewal of some of these). We've also moved several non-SRFI procedures from core: object-evict, compile-file, binary-search, procedures for dealing with queues, scan-input-lines and POSIX group-information have all been moved to eggs. Support for SWIG has been removed, as it was bit-rotting and nobody seemed to be using it anyway.
  • Ports can now be bi-directional, so there's no more unnecessary distinction between input-ports and output ports. This maps more cleanly to file descriptor semantics, which can also be opened for both reading and writing.
  • Random number generation has been completely replaced. Before, we used libc's rand(), which produces very low quality random numbers. CHICKEN 5 uses the WELL512 PRNG to generate random integers, and it provides access to the system entropy pool for generating cryptographically secure streams of random bytes (using /dev/urandom on *nix, and RtlGenRandom on Windows).

Conclusion

There's a lot to like about the new CHICKEN, so go ahead and give it a spin! Release candidate 1 was made available today for you to try. The full list of changes can of course be found in the NEWS file. If you're already a happy CHICKEN 4 user, we've created a porting guide for you, to make it easier to make the transition from 4 to 5. If you need more help, you can of course contact the always friendly CHICKEN community.

by Peter Bex at August 11, 2018 09:44 AM

August 10, 2018

Programming Praxis

Penniless Pilgrim

Today’s exercise comes to us courtesy of long-time reader and contributor Ben Simon; you might like to look at his version of the task:

You must travel from the top-left corner of a five-by-five grid of points to the bottom-right corner, stepping from one point to another without retracing any edge. The first two steps must be eastward through the grid. Each step has a cost. An eastward step adds 2 to the current accumulated cost (so you start with an accumulated cost of 4), a westward step subtracts 2 from the current accumulated cost, a northward step halves the current accumulated cost, and a southward step doubles the current accumulated cost. It is possible to reach the bottom-right corner with an accumulated cost of zero.

Ben’s version of the task has a pretty map of Duonia, and a story about a penniless pilgrim who longs to visit the temple.

Your task is to write a program to find the zero-cost path through the grid. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 10, 2018 09:00 AM

August 03, 2018

Programming Praxis

Odometer

Today’s exercise is an Amazon interview question:

An odometer variously uses letters and digits in its reading; for instance, an odometer might have a reading of G7TS39. Incrementing the odometer means adding 1 to a digit or increasing to the next letter to the next letter, with a carry to the next column when appropriate. Thus, G7TS39 increments to G7TS40 and G7TZ99 increments to G7UA00. As a special case, the odometer never increases in size, but rolls over, so the next odometer reading after Z9ZZ99 is A0AA00. An odometer position that initially contains a digit always contains a digit, and an odometer position that initially contains a letter always contains a letter, so 1Z9 increments to 2A0.

Your task is to write a program to increment an odometer. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at August 03, 2018 09:00 AM