Planet Scheme

November 28, 2014

Programming Praxis

A Prime Number Puzzle

Today’s exercise comes from the world of recreational mathematics.

For each number n from 1 to 9 inclusive, find a number m that begins with the digit n, has n digits, has each two-digit sequence within m a different prime number, and is as small as possible.

When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
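
To make the search concrete, here is one possible sketch in Scheme (an illustration, not the suggested solution): grow the number one digit at a time, requiring each new two-digit window to be a previously unused prime, and try candidate digits in ascending order so that the first complete number found is the smallest.

(define (prime? k)
  ; trial division is plenty for two-digit numbers
  (let loop ((d 2))
    (cond ((> (* d d) k) (> k 1))
          ((zero? (remainder k d)) #f)
          (else (loop (+ d 1))))))

(define (puzzle n)
  ; smallest n-digit number starting with digit n whose consecutive
  ; two-digit windows are distinct primes, or #f if none exists
  (let grow ((last n) (len 1) (used '()) (m n))
    (if (= len n)
        m
        (let try ((d 0))
          (and (< d 10)
               (let ((two (+ (* 10 last) d)))
                 (or (and (prime? two)
                          (not (member two used))
                          (grow d (+ len 1) (cons two used) (+ (* 10 m) d)))
                     (try (+ d 1)))))))))

; (puzzle 1) => 1, (puzzle 2) => 23, (puzzle 3) => 311

For n = 1 the two-digit condition is vacuous, so the answer is simply 1.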


by programmingpraxis at November 28, 2014 09:00 AM

November 27, 2014

Andy Wingo

scheme workshop 2014

I just got back from the US, and after sleeping for 14 hours straight I'm in a position to type about stuff again. So welcome back to the solipsism, France and internet! It is good to see you on a properly-sized monitor again.

I had the enormously pleasurable and flattering experience of being invited to keynote this year's Scheme Workshop last week in DC. Thanks to John Clements, Jason Hemann, and the rest of the committee for making it a lovely experience.

My talk was on what Scheme can learn from JavaScript, informed by my work in JS implementations over the past few years; you can download the slides as a PDF. I managed to record audio, so here goes nothing:


55 minutes, vorbis or mp3

It helps to follow along with the slides. Some day I'll augment my slide-rendering stuff to synchronize a sequence of SVGs with audio, but not today :)

The invitation to speak meant a lot to me, for complicated reasons. See, Scheme was born out of academic research labs, and to a large extent that's been its spiritual core for the last 40 years. My way to the temple was as a money-changer, though. While working as a teacher in northern Namibia in the early 2000s, fleeing my degree in nuclear engineering, trying to figure out some future life for myself, for some reason I was recording all of my expenses in Gnucash. Like, all of them, petty cash and all. 50 cents for a fat-cake, that kind of thing.

I got to thinking "you know, I bet I don't spend any money on Tuesdays." See, there was nothing really to spend money on in the village besides fat cakes and boiled eggs, and I didn't go into town to buy things except on weekends or later in the week. So I thought that it would be neat to represent that as a chart. Gnucash didn't have such a chart but I knew that they were implemented in Guile, as part of this wave of Scheme consciousness that swept the GNU project in the nineties, and that I should in theory be able to write it myself.

Problem was, I also didn't have internet in the village, at least then, and I didn't know Scheme and I didn't really know Gnucash. I think what I ended up doing was just monkey-typing out something that looked like the rest of the code, getting terrible errors but hey, it eventually worked. I submitted the code, many years ago now, some of the worst code you'll read today, but they did end up incorporating it into Gnucash and to my knowledge that report is still there.

I got more into programming, but still through the back door, so to speak. I had done some free software work before going to Namibia, on GStreamer, and wanted to build a programmable modular synthesizer with it. I read about Supercollider, and decided I wanted to do something like that but with the "unit generators" defined in GStreamer and orchestrated with Scheme. If I knew then that Scheme could be fast, I probably would have started on an entirely different course of things, but that did at least result in gainful employment doing unrelated GStreamer things, if not a synthesizer.

Scheme became my dominant language for writing programs. It was fun, and the need to re-implement a bunch of things wasn't a barrier at all -- rather a fun challenge. After a while, though, speed was becoming a problem. It became apparent that the only way to speed up Guile would be to replace its AST interpreter with a compiler. Thing is, I didn't know how to write one! Fortunately there was previous work by Keisuke Nishida, jetsam from the nineties wave of Scheme consciousness. I read and read that code, mechanically monkey-typed it into compilation, and slowly reworked it into Guile itself. In the end, around 2009, Guile was faster and I found myself its co-maintainer to boot.

Scheme has been a back door for me for work, too. I randomly met Kwindla Hultman-Kramer in Namibia, and we found Scheme to be a common interest. Some four or five years later I ended up working for him with the great folks at Oblong. As my interest in compilers grew, and it grew as I learned more about Scheme, I wanted something closer there, and that's what I've been doing in Igalia for the last few years. My first contact there was a former Common Lisp person, and since then many contacts I've had in the JS implementation world have been former Schemers.

So it was a delight when the invitation came to speak (keynote, no less!) at the Scheme Workshop, behind the altar instead of in the foyer.

I think it's clear by now that Scheme as a language and a community isn't moving as fast now as it was in 2000 or even 2005. That's good because it reflects a certain maturity, and makes the lore of the tribe easier to digest, but bad in that people tend to ossify and focus on past achievements rather than future possibility. Ehud Lamm quoted Nietzsche earlier today on Twitter:

By searching out origins, one becomes a crab. The historian looks backward; eventually he also believes backward.

So it is with Scheme and Schemers, to an extent. I hope my talk at the conference inspires some young Schemer to make an adaptively optimized Scheme, or to solve the self-hosted adaptive optimization problem. Anyway, as users I think we should end the era of contorting our code to please compilers. Of course some discretion in this area is always necessary but there's little excuse for actively bad code.

Happy hacking with Scheme, and au revoir!

by Andy Wingo at November 27, 2014 05:48 PM

November 25, 2014

Programming Praxis

Thou Impertinent Urchin-Faced Miscreant!

In 1968, Newsweek published an article “How to Win at Wordsmanship” by Philip Broughton. You pick a random number from 000 to 999, then look up a three word phrase in a table coded by the three digits of the random number:

Column 1 Column 2 Column 3
0. integrated 0. management 0. options
1. total 1. organizational 1. flexibility
2. systematized 2. monitored 2. capability
3. parallel 3. reciprocal 3. mobility
4. functional 4. digital 4. programming
5. responsive 5. logistical 5. concept
6. optional 6. transitional 6. time-phase
7. synchronized 7. incremental 7. projection
8. compatible 8. third-generation 8. hardware
9. balanced 9. policy 9. contingency

For instance, the random number 031 leads to the phrase “integrated reciprocal flexibility,” which you can use in a technical conversation or report. Broughton said “No one will have the remotest idea of what you are talking about, but the important thing is that they’re not about to admit it.”

Since then, many buzz-phrase generators have been developed, including corporate-speak “holistically embrace customer-directed imperatives” and the Shakespearean insult generator that gave us the title of our exercise. We have a couple of word lists on the next page, or you can Google for “buzz-phrase generator” to find lots of them on the web, but it’s most fun to build your own.

Your task is to write a program that generates random buzz-phrases. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
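
As an illustration, here is a minimal sketch in Scheme driven by Broughton's table above (random-integer is SRFI-27's):

(define column1 '("integrated" "total" "systematized" "parallel" "functional"
                  "responsive" "optional" "synchronized" "compatible" "balanced"))
(define column2 '("management" "organizational" "monitored" "reciprocal" "digital"
                  "logistical" "transitional" "incremental" "third-generation" "policy"))
(define column3 '("options" "flexibility" "capability" "mobility" "programming"
                  "concept" "time-phase" "projection" "hardware" "contingency"))

(define (pick column) (list-ref column (random-integer 10)))

(define (buzz-phrase)
  (string-append (pick column1) " " (pick column2) " " (pick column3)))

; (buzz-phrase) => e.g. "integrated reciprocal flexibility"  ; random number 031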


by programmingpraxis at November 25, 2014 09:00 AM

November 23, 2014

The Racket Blog

800!

800
(Racket Tasks On Rosetta Code)

Since (and even before) Asumu Takikawa’s post “200!” at the beginning of March 2013, folk have been beavering away, implementing tasks on Rosetta Code. And on November 15th 2014:

800 tasks have now been implemented1 in Racket on the Rosetta Code website!

Before I go any further it must be said that, without a doubt… this is awesome! This achievement represents a lot of work, and a lot of code. And everyone who has participated should be thanked and congratulated for getting this far.

So thank you. And well done!

What is Rosetta Code?

Rosetta Code (RC) describes itself as:

… “a programming chrestomathy site. The idea is to present solutions to the same task in as many different languages as possible, to demonstrate how languages are similar and different, and to aid a person with a grounding in one approach to a problem in learning another. Rosetta Code currently has 758 tasks, 134 draft tasks, and is aware of 560 languages, though we do not (and cannot) have solutions to every task in every language.”2

Of these tasks, 800 have been implemented in Racket… some tasks, like Hello World/Text, have been implemented in hundreds of languages. Some, however, like Time-based One-time Password Algorithm, have only been implemented in 3 (including Racket).

If you haven’t already, I suggest you take a quick look about the site to get a feel of what that means in practice.

WARNING: Rosetta Code is a wiki. Like any wiki it will steal your time from you as you browse tasks, algorithms, languages and the occasional link to Wikipedia. Don’t say I didn’t warn you.

What Can You Do With Rosetta Code?

Learn From It

Rosetta Code is a valuable resource with plenty of material to absorb and ideas to be had from. If you’re new to Racket, there are tasks like Loops/For which will get you on your way with fundamental programming tasks.

If you want something juicier there are other tasks (like Nonogram solver which runs to over 400 lines of Racket) for you to pick over.

And there’s everything in between.

Write Code!

Each task gives you a chance to think, “Is this how I would do this?”

Even if I don’t submit something, I find it’s fun to write some code around the task. In fact, I don’t even have to type code into a REPL; the thought exercise is often fun enough!

Some tasks, like “Chess Player”, are, shall we say, very challenging. But don’t let even that put you off thinking about, tinkering with, or coding a solution to them.

If there isn’t a Racket implementation for a task you like the looks of, have a go.

Someone might have a better idea of how to do it in Racket later. But if there isn’t an implementation now — change that now!

Remember that others will be reading your code to understand Racket all the better. So please try to adhere to the Racket Style Guide as best you can3. Again, don’t worry about getting that perfect. Like anything, learning Racket style takes practice, and nobody expects perfection. And the experienced contributors/documenters are always at hand to help correct style.

Hints:

  • I always have a Wikimedia Cheatsheet to hand. I can never remember its markdown syntax (which unfortunately doesn’t really support <code/>, either)
  • Pick whatever task you want… but it would be good to clear all the “Complete Tasks” (as opposed to “Draft Tasks”) if you have a choice.
  • Don’t add code until it’s running and producing the output you expect. This isn’t Project Euler; in most cases you don’t have to guess the answer — it’s likely someone has some sample output to compare to.4
  • If you can’t get your head around the algorithm in the task description then try to translate another language into Racket. You’ll learn Racket, you’ll learn the other language, and in working it through for yourself you’ll also see how the algorithm takes shape and works.
  • My workflow for posting a new solution is this: Once I have something to submit, I add a “Stub” Racket implementation. I edit the whole Task page (because that is all that is available to edit at that point), find whatever comes after Racket alphabetically, and insert the boilerplate code below. I check this template code in as a “minor edit” described as “Racket stub added — implementation later”. This then allows me to edit the Racket section in isolation and keeps the rest of the task (everyone else’s hard work) safe from any, er, silliness.
{{header|Racket}}
<lang racket>
</lang>
{{out}}
<pre>
</pre>
  • I have started to make it my habit (especially when showcasing one or two functions) to use #lang racket/base and requireing the salient individual functions.
  • It is often not appropriate to fully document the functionality of Racket functions in the RC task implementation. You can, however, point to the canonical documentation on the Racket website. So I also include a link to http://docs.racket-lang.org/reference/... when I need to.
  • The RC administrators have switched off image uploading (or I, at least, cannot find out how). Even though Racket can produce images as results, think hard about whether you want the hassle of trying to present images to the reader. If you find out a method that works for you please tell me, I’d love to know.

I suppose you could also extend all of the above to coding in another language — if you really have to.

Improve What’s Already There

Rosetta Code is a wiki.

It is open to anyone to edit.

Don’t be afraid to.

If you see something that could be implemented, styled or documented better — work to improve it.

Once you have your improved entry together, show it to the author of the original post. Besides being courteous, this gives him or her a chance to share an opinion on what else you might do. Often, there is something bugging them, and you are scratching an itch of theirs!

I have never had anyone react badly to a change request. Everyone appreciates that you have made an effort to produce your change (and that you’re not just standing in the aisles complaining that it doesn’t look right).

Teach Through It

If there is an aspect of Racket, algorithm or other “CS task” (in the broadest sense) that you want to share: see to it that it is adequately illustrated on Rosetta Code. If it is not, then create a task to demonstrate it. Not only will you show how something is done properly (i.e. in Racket), but you will also be inviting others to implement the task in their own favourite languages.

The Competition5

Back at 200, Racket was the 54th most popular language. It has now been sitting at #2 in the popularity1 ranking for quite some time, a long way behind TCL and hotly pursued by Python (never more than 10 tasks behind).

One of my motivators is that, having seen Racket get to #2, I don’t want to see it any lower in the rankings. I’m sure there’s someone in the Python lot who wants to overtake us! This healthy competition has kept both communities pushing ahead with implementing the tasks.

The Intro Projects page of the Racket wiki has “Implement a Rosetta code task” as a “Small Project”. I think of it as slightly more of a “Recreational” project (this at least justifies to myself the element of competition that has crept in).

The Rallying Call

or “What Specifically Would Help Racket on Rosetta Code?”

RC is a good way to present Racket as a most general programming language. So as a tool for Racket advocacy, as well as for the purposes of RC, we need to:

  • Implement more tasks in Racket to keep a high profile: 800 tasks, #2 in the popularity stakes. This keeps Racket visible; and proves it capable of (almost) anything. I would so love to give TCL a run for its money — so there’s 41 tasks to go before we can even think of taking a breather!
    • We have implemented 800 tasks in Racket. The quote above says there are 892 (758+134) tasks in total. That means that there are 92 more tasks to get to grips with.

  • Suggest new tasks: Especially tasks that will demonstrate the latest shiny feature of the latest shiny versions of Racket!

    Personally, I can’t believe that there are fewer than 900 things that you would want to do with a programming language. If you think of a task, add it. Even impossible tasks provoke thought and imagination — and interesting solutions!

  • Improve those tasks that have been implemented in Racket: We want to maintain a body of good, useful code, to allow us to teach and demonstrate Racket. There are a number of reasons why existing tasks need revisiting:
    • Racket technology has moved on (and moves on) apace. What was unavailable and experimental even 18 months ago is now available and reliable. This new technology needs to be demonstrated.
    • Code that is even older is very “schemey” (I have in some cases simply copied the Scheme implementation and stuck a #lang racket tag on the front). Although compatible, Racket has moved quite a way on from Scheme.
    • Some implementors (not a million miles from where I’m standing, for example), were not as au fait with the language and/or style guide as they might be now. It’s a housekeeping job, I know, but giving the examples as consistent a style as possible will help satisfy this aspiration from the Racket Style Guide:

    “Doing so will help us … and our users, who use the open source code … as an implicit guide to Racket programming.”

  • Document tasks: see my hint about documentation and links above for what I now think is good practice. If some code seems utterly hieroglyphic, see if it can be made clearer. Remember this is Racket, not APL.
  • General tidying up never goes amiss.
  • RC is run by someone outside the Racket community. At the bottom of the “Small Projects” section of the Racket wiki is a suggestion to collect the RC examples into something “owned” by the Racket community. I’ve been thinking about this… if anyone has suggestions, let me know. There are limits to what we can put on RC (defined by the purpose of RC itself). It would be good to remove those limits by implementing something along RC’s lines ourselves.
  • Very specifically… anyone with a joystick, drivers and some spare time - please could you do “Joystick Position”. The possession of a joystick puts you in a position of great power with respect to that task. Exercise your responsibility.

And Finally…

Well done everyone again! Keep up the good work. And see you at 1000!



  1. You can track Racket (and everyone else’s) progress on the Popular Programming Languages report, which is updated hourly or so. 


  2. Rosetta Code’s Front Page 


  3. The style guide is actually the chapter called “How to Program Racket” in the main Racket documentation. One of the RC “style” rules is that code should be 80 characters wide. Personally, I ignore that in favour of Racket’s more generous 102. Sometimes someone on RC objects. Sometimes I then care enough to put the required newlines in. 


  4. Even if there are example results, don’t necessarily trust them. E.g. in The ISAAC Cipher, the cipher engine isn’t reset between test runs in the Pascal implementation. That error is propagated through all other implementations. Mine (Racket) conforms, to show that I’m doing the same thing as everyone else; but I also do what I think to be a more correct test later. 


  5. Hold on a mo… this is meant to be a pedagogical exercise, not a competition 

by Tim Brown (noreply@blogger.com) at November 23, 2014 03:40 PM

November 21, 2014

Programming Praxis

An Array Of Zeroes

Beware! Today’s exercise, which derives from an interview question asked at Facebook, is trickier than it looks:

You are given an array of integers. Write a program that moves all non-zero integers to the left end of the array, and all zeroes to the right end of the array. Your program should operate in place. The order of the non-zero integers doesn’t matter. As an example, given the input array [1,0,2,0,0,3,4], your program should permute the array to [1,4,2,3,0,0,0] or something similar, and return the value 4.

Your task is to write the indicated program. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
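
One classic in-place approach, sketched below in Scheme (an illustration, not necessarily the suggested solution): walk two indices inward from the ends, copying non-zero values left and writing zeroes right. Since the order of the non-zero integers doesn’t matter, a swap-free copy is fine.

(define (partition-zeroes! v)
  ; returns the number of non-zero elements
  (let loop ((lo 0) (hi (- (vector-length v) 1)))
    (cond ((> lo hi) lo)
          ((not (zero? (vector-ref v lo))) (loop (+ lo 1) hi))
          ((zero? (vector-ref v hi)) (loop lo (- hi 1)))
          (else (vector-set! v lo (vector-ref v hi))
                (vector-set! v hi 0)
                (loop (+ lo 1) (- hi 1))))))

; (define v (vector 1 0 2 0 0 3 4))
; (partition-zeroes! v) => 4,  v => #(1 4 2 3 0 0 0)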


by programmingpraxis at November 21, 2014 09:00 AM

November 18, 2014

Jeremy Kun

Learning a single-variable polynomial, or the power of adaptive queries

Problem: Alice chooses a secret polynomial p with nonnegative integer coefficients. Bob wants to discover this polynomial by querying Alice for the value of p(x) for some integer x of Bob’s choice. What is the minimal number of queries Bob needs to determine p exactly? Solution: Two queries. The first is p(1), and if we call p(1) = s, then the second query […]
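
The excerpt is cut short, but the trick it sketches is the classic one; here is a small Scheme illustration of the full reconstruction (the procedure names are mine): since the coefficients are nonnegative integers, p(1) bounds every coefficient, so querying p at b = p(1) + 1 and reading off the result's base-b digits recovers the coefficients exactly.

(define (recover-coefficients query)  ; query: integer -> value of the secret p there
  (let* ((s (query 1))                ; sum of the coefficients
         (b (+ s 1)))                 ; base larger than any single coefficient
    (let loop ((v (query b)) (coeffs '()))
      (if (zero? v)
          (reverse coeffs)            ; constant term first
          (loop (quotient v b) (cons (remainder v b) coeffs))))))

; (recover-coefficients (lambda (x) (+ (* 2 x x) (* 3 x) 1))) => (1 3 2)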

by j2kun at November 18, 2014 03:00 PM

Ben Simon

Cramming Darwin into My Cell Phone

My father, a Professor of Biology, is fond of telling his students (and children): Darwin is always in the room. The idea, as he's explained to me, is that the basic elements of evolution (competition, mutation, adaptation, selection, etc.) are present in all biological processes, no matter how big or small. So if Darwin can be in the room, why not in my computer?

I give you the latest exercise from Programming Praxis: Dawkins’ Weasel. This exercise requires that you implement the basic constructs of evolution in code:

I don’t know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence ‘Methinks it is like a weasel’, and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?
... We again use our computer monkey, but with a crucial difference in its program. It again begins by choosing a random sequence of 28 letters, just as before … it duplicates it repeatedly, but with a certain chance of random error – ‘mutation’ – in the copying. The computer examines the mutant nonsense phrases, the ‘progeny’ of the original phrase, and chooses the one which, however slightly, most resembles the target phrase, METHINKS IT IS LIKE A WEASEL.

Writing the prescribed algorithm was easy enough. You can find the complete source code here and the highlights below. Writing the code efficiently, on the other hand, turned out to be more elusive. My version creeps along, taking about 25 minutes to produce an answer. Why the sluggishness? I could blame it on the fact that I'm running it on an interpreter on my phone, but really, it's slow because I haven't taken the time to make it fast. Still, after exactly 333 successive generations the system went from pure randomness to the target phrase (on run #2, it only took 211 generations to produce the right match). Slow and steady, that seems appropriate for the topic anyway.

As a programmer, I can't help but marvel at how simple and effective this algorithm is. By providing two key elements: (a) a way to introduce change and (b) a way to score the results, I'm able to write code that finds a solution even though I have no idea what the ideal path to producing that solution is. It's no surprise that there's a whole area of programming dedicated to this approach.

While I think this exercise nicely demonstrates some key aspects of evolution, my favorite experiment on the topic is still this one. In this case, the power of mutation (a key ingredient for evolution) is shown using the act of tracing a single vertical line. Spoiler alert: all those tiny errors introduced when a person tries and fails to copy the line exactly turn into significant changes. The results are pretty staggering. If you want to skip all the chit chat, and just see the effect in action, check out this video.

And here's the code for our computer monkeys:

; http://programmingpraxis.com/2014/11/14/dawkins-weasel/
;
; requires sort from:
;  https://raw.githubusercontent.com/benjisimon/code/master/programming-praxis/dawkins-weasle.scm

(define goal (string->list "METHINKS IT IS LIKE A WEASEL"))

(define (range low high)
 (if (> low high) '()
     (cons low (range (+ 1 low) high))))

(define (rand-char)
 (let ((offset (random-integer 27)))
  (if (= offset 26) #\space
      (integer->char (+ 65 offset)))))
      
(define (mutate input)
 (map (lambda (x)
       (let ((rand (random-integer 100)))
        (if (< rand 5) (rand-char) x)))
       input))
  
(define (score input)
 ; count the positions at which input matches the goal phrase
 (let loop ((value 0) (input input) (goal goal))
  (cond ((null? input) value)
        (else
         (loop (if (equal? (car input) (car goal))
                   (+ 1 value)
                   value)
               (cdr input) (cdr goal))))))
               
(define (bang)
 (map (lambda (c) (rand-char)) goal))
 

(define (tick input)
 (let ((attempts
         (sort (map (lambda (i) (mutate input)) (range 1 100))
               (lambda (x y)
                 (> (score x) (score y))))))
   (car attempts)))
 
(define (solve)
 (let loop ((generation 0) (input (bang)))
  (cond ((equal? goal input) generation)
        (else
         (display (list generation (score input))) (newline)
         (loop (+ 1 generation)
               (tick input))))))

by Ben Simon (noreply@blogger.com) at November 18, 2014 11:34 AM

Programming Praxis

Damm’s Algorithm

We studied Hans Peter Luhn’s algorithm for generating check digits in a previous exercise. Today, we look at an alternate check digit algorithm developed by H. Michael Damm. Both algorithms are useful for creating checked identification numbers, suitable for credit cards or other identity numbers.

Damm’s algorithm is based on table lookup. It is initialized with a current check digit of 0. Then, each digit of the number, from left to right (most-significant to least-significant) is looked up in a two-dimensional table with the current check digit pointing to the row and the digit of the number pointing to the column, using this table:

  0 1 2 3 4 5 6 7 8 9
0 0 3 1 7 5 9 8 6 4 2
1 7 0 9 2 1 5 4 8 6 3
2 4 2 0 6 8 7 1 3 5 9
3 1 7 5 0 9 8 3 4 2 6
4 6 1 2 3 0 4 5 9 7 8
5 3 6 7 4 2 0 9 5 8 1
6 5 8 6 9 7 2 0 1 3 4
7 8 9 4 5 3 6 2 0 1 7
8 9 4 3 8 6 1 7 2 0 5
9 2 5 8 1 4 3 6 7 9 0

For instance, given the input 572, the check digit is 4 and the output is 5724. The check digit is computed starting with 0, then row 0 and column 5 gives a new check digit of 9, row 9 and column 7 gives a new check digit of 7, and row 7 and column 2 gives a final check digit of 4. Notice that leading zeroes do not affect the check digit.

Checking goes the same way, and is successful if the final check digit is 0. Given the input 5724, the initial check digit is 0, then row 0 and column 5 gives a new check digit of 9, row 9 and column 7 gives a new check digit of 7, row 7 and column 2 gives a new check digit of 4, and row 4 and column 4 gives a final check digit of 0.
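
Here is a sketch in Scheme of both operations (the naming is mine; the table above is stored row-major in a vector):

(define damm-table
  #(0 3 1 7 5 9 8 6 4 2
    7 0 9 2 1 5 4 8 6 3
    4 2 0 6 8 7 1 3 5 9
    1 7 5 0 9 8 3 4 2 6
    6 1 2 3 0 4 5 9 7 8
    3 6 7 4 2 0 9 5 8 1
    5 8 6 9 7 2 0 1 3 4
    8 9 4 5 3 6 2 0 1 7
    9 4 3 8 6 1 7 2 0 5
    2 5 8 1 4 3 6 7 9 0))

(define (digits n)
  ; list of decimal digits, most significant first
  (let loop ((n n) (ds '()))
    (if (< n 10)
        (cons n ds)
        (loop (quotient n 10) (cons (remainder n 10) ds)))))

(define (damm-digit n)
  ; thread the check digit through the table, row = check, column = digit
  (let loop ((check 0) (ds (digits n)))
    (if (null? ds)
        check
        (loop (vector-ref damm-table (+ (* 10 check) (car ds))) (cdr ds)))))

(define (add-check-digit n) (+ (* 10 n) (damm-digit n)))
(define (valid? n) (zero? (damm-digit n)))

; (add-check-digit 572) => 5724,  (valid? 5724) => #t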

Your task is to write functions that add a check digit to a number and validate a number to which a check digit has been added. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at November 18, 2014 09:00 AM

GNU Guix

GNU Guix 0.8 released

We are pleased to announce the next alpha release of GNU Guix, version 0.8.

The release comes both with a source tarball, which allows you to install it on top of a running GNU/Linux system, and a USB installation image to install the standalone operating system.

See the original announcement for the full list of highlights for this release.

About GNU Guix

GNU Guix is the functional package manager for the GNU system, and a distribution thereof.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. It also offers a declarative approach to operating system configuration management. Guix uses low-level mechanisms from the Nix package manager, with Guile Scheme programming interfaces.

At this stage the distribution can be used on an i686 or x86_64 machine. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el.

by Ludovic Courtès at November 18, 2014 08:14 AM

November 17, 2014

Greg Hendershott

GitHub dropped Pygments

My first-ever open source contribution, a couple years ago, was to a project called Pygments. My motivation? GitHub was displaying Racket source code poorly. Pygments didn’t have a Racket lexer. GitHub was using a Scheme lexer for Racket code. The Scheme lexer was highlighting square brackets in red as an “error”. This was really distracting and ugly.

I contributed a new Racket lexer to Pygments, and waited for that to roll into a Pygments release and in turn be deployed on GitHub. Finally Racket code looked good! Later Dave Corbett substantially improved the Racket lexer beyond my small start.

A few days ago, I was confused to see that Racket code was displaying poorly again on GitHub. The square brackets were highlighted in red as errors — again??

Cartoon-me’s thought balloons: WAT, OMFG, FML, &c. Why are we going in circles?


Someone submitted an issue against pygments.rb, which is GitHub’s Ruby library to use the Python Pygments library. I commented on that issue, as it began to dawn on me that GitHub wasn’t using Pygments anymore. Which led to me submitting this issue against linguist.

The GitHub folks responded quickly, especially during the weekend. They proposed a reasonable way to at least improve the status quo — use an existing TextMate lexer.


In issue 1717’s comments, it didn’t seem appropriate for me to editorialize beyond the issue itself. But I can editorialize here on my blog.

I’m sure GitHub had what they felt were compelling reasons for abandoning Pygments. My guesses:

  • pygments.rb had an interesting history of trying to use a Python library in Ruby on a high-traffic web site. They had tried various approaches. The final approach — piping to a long-running Python process — seemed to work well. But maybe not. Or maybe the history was too much. Sometimes code gets a bad reputation, initially deserved, later not, but it can never shake the rep.

  • There’s probably some rationale related to “synergy” with their Atom editor. Something like, “GitHub needs lexers, Atom needs lexers, why have two systems?” (Hopefully this does not include the idea that Atom will benefit from more/improved lexers in the course of people fixing this GitHub “regression”. That, in my opinion, would be a little too clever-evil.)

  • TextMate lexers are available for the top 20 languages. Maybe the thinking was, sure there’s a long tail, but it will just have to sort itself out somehow.

Again, these are guesses — speculation based on my experience with software projects, but speculation nonetheless. I don’t have any insider information. Hopefully at some point GitHub will share their thinking. So far they haven’t, which leads to the next point.


Regardless of the reasons, the change was deployed without advance notice, as far as I can tell. The approach seemed to be, if things break, people will report it and we’ll fix it. In other words, we’ll fix on-demand what people care about, rather than proactively worry about things maybe no one cares about.

That is actually a very valid and reasonable approach in some situations — particularly older, “legacy” applications with a huge surface area and user attrition. Although that’s not what GitHub seems like to me on the outside, maybe that’s how it feels to them internally, I don’t know.

But in any case, here’s the problem. Although people use GitHub to collaborate1, many also use it as a portfolio. It is part of how people hope to get a job, or persuade developers to try a new language, or whatever. To suddenly doink the appearance of people’s portfolios is unfortunate.


In dropping Pygments, I’m sure GitHub realized they’d lose dozens of lexers that many people have worked to contribute over the years. That negative must have been outweighed, in their opinion, by positives.

I’m not sure they realized that they’re also losing a well-documented, relatively easy process for contributing lexers. The Pygments project makes it clear and simple for people who want to add or improve a lexer. Documentation and help are available. Lexers are written in simple Python, which is arguably easier than wrangling plist XML or even JSON.2 There are probably a hundred example lexers to learn from. And finally, there is a process to review and vet the lexers, before they reach GitHub production. Well, past-tense: “reached”.

Effectively GitHub has opted to take all this upon themselves, now. If they do it as well as Pygments, great, but it’s a lot of work for them and a lot of redundancy. On the other hand if they don’t do it as well, it’s bad for all the not-top-twenty, not-hip-this-year languages.

Anyway, that’s my first reaction. Maybe this decision makes sense in a way that I just don’t understand yet.

  1. In which case you could argue that, outside of pull requests, people mostly view the project code locally, so the code formatting doesn’t matter quite so much. 

  2. The Racket lexer eventually grew to nearly 1,000 lines of Python. Granted, the bulk is lists of keywords and built-ins. Even so, it seems that converting it to a plist or cson representation is going to multiply the line count by 3, 5, maybe 10. If such a conversion were straightforward, perhaps GitHub would have implemented that? 

by Greg Hendershott at November 17, 2014 05:00 PM

November 14, 2014

Greg Hendershott

Racket workflow

If you’re coming to Racket from another REPL language (such as another Lisp), this post might be real Captain Obvious material.

But if you’re coming to Racket from an edit/compile/debug language like C or C++, it might be unclear what a typical workflow is. You might have questions like:

  • How do I compile?
  • How do I debug?

After seeing a question like this recently1, I figured I’d write about my typical workflow in DrRacket or Emacs. It is:

  1. Write a small function, in a source file.
  2. Try calling the function, in the REPL.
  3. Write some more examples of calling it, in the source file.
  4. If any problems, fix them and go to 2.
  5. Change the examples into rackunit tests.

Let’s walk through these steps with a simple example.

Write a small function

Here’s file.rkt:

#lang racket

(define (twice n)
  (* n 2))

Try calling it in the REPL

What is “the REPL”? REPL stands for Read Eval Print Loop. You have a prompt, and can enter Racket expressions to be evaluated.

  • DrRacket calls this the “interactions” pane, on the right or bottom.

  • Emacs racket-mode calls this the *Racket REPL* buffer.

In both cases:

  • The REPL appears the first time you choose Run, for example by pressing F5.

  • The REPL will be “inside” the file (module) that you’re editing — meaning that things defined in the file are visible and usable from the REPL.

In the REPL, try calling twice with various values and see what happens:

file.rkt> (twice 0)
0
file.rkt> (twice 2)
4

Looks good.

Put the examples in the source file

Some of these examples you try may be interesting. Copy them from the REPL to the source file:

#lang racket

(define (twice n)
  (* n 2))

(twice 0)
(twice 2)

When you run the file, you’ll see this output in the REPL:

0
4

Expressions at the top level in a file — such as these function applications — print their values when the file is run.

Fix things

There’s nothing to fix in this example. twice is behaving as we expect. But if you make fixes, it’s easy to quickly try the revised version, as we saw above.

Change the examples into unit tests

Next, simply nest your examples inside:

(module+ test
  (require rackunit)
  (check-equal? _example1_ _result1_)
  (check-equal? _example2_ _result2_)
  ;; and so on...
  )

So our example becomes:

#lang racket

(define (twice n)
  (* n 2))

(module+ test
  (require rackunit)
  (check-equal? (twice 0) 0)
  (check-equal? (twice 2) 4))

If I’m sure the function result value is correct, I’ll often just copy the output from the REPL into the right hand side of the check-equal? form.

But only after checking carefully that it’s correct. Nothing sucks like a unit test whose expected value came from a buggy actual output. (Not that I would make that mistake, but I have a friend who knows someone who did this once. *coughs*)


And that’s it. This code example is extremely simple, but even with more realistic examples, this is my usual workflow:

  1. Make simple functions.
  2. Try them “live” in the REPL.
  3. Accumulate some interesting examples.
  4. When things look good, “bake” the examples as unit tests.

What about debugging?

In C/C++, I was from the school that, whenever you write new code, the very first thing you do is step through it in the debugger. You don’t just hit “Go” and look at the results. Instead, you step through, line by line, and see that it’s behaving the way you want. I believe that’s a worthwhile discipline in a language where every line of your program is mutating a variable.

When I first learned Racket, I figured it would be similar. But as I showed above, that’s not really the case. Although DrRacket has a GUI debugger, and it works, I rarely use it. Instead, the interactive REPL serves much of the same purpose — especially when you’re writing small functions that don’t mutate state. You can interact with those functions, and see how they actually behave with various inputs.

Mostly that’s all I need. But when that’s not enough, a good old-fashioned printf usually suffices. It may sound too simple, but when I ask very experienced Racket developers, that’s what they do.

Now, if you find yourself sticking printfs at the start of more than a couple functions to see the call path — there’s an easier way: (require racket/trace), then put (trace function-name) in your source or in the REPL to include a function in the trace, or (untrace function-name) to remove it.
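
For example, with the twice function from earlier (the trace output shown is approximately what current Racket versions print):

file.rkt> (require racket/trace)
file.rkt> (trace twice)
file.rkt> (twice 2)
>(twice 2)
<4
4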

Finally, when it comes to debugging macros, DrRacket’s macro stepper is quite brilliant. Not only does it show the expansion steps (which you can also get using expand-once and expand at the REPL), it draws arrows to show the origin of identifiers and lets you inspect the full syntax objects. You may not need it for a simple macro, but when you need it, it’s a lifesaver.

  1. I lie. I wrote most of this post 8 months ago. It’s been sitting around as a draft. I finally decided to review and publish it. 

by Greg Hendershott at November 14, 2014 05:00 PM

Andy Wingo

on yakshave, on color, on cosines, on glitchen

Hold on to your butts, kids, because this is epic.

on yaks

As in all great epics, our prideful, stubborn hero starts in a perfectly acceptable state of things, decides on a lark to make a small excursion, and comes back much much later to inflict upon you pictures from his journey.

So. I have a web photo gallery but I don't take many pictures these days. Dealing with photos is a bit of a drag, and the ways that are easier like Instagram or what-not give me the (peer, corporate, government: choose 3) surveillance hives. So, I had vague thoughts that I should update my web gallery. Yakpoint 1.

At the same time, my web gallery was written for mod_python on the server, and I don't like hacking in Python any more and kinda wanted to switch away from Apache. Yakpoint 2.

So I rewrote the server-side part in Scheme. (Yakpoint 3.) It worked fine but I found I needed the ability to get the dimensions of files on the server, so I wrote a quick-and-dirty JPEG parser. Yakpoint 4.

I needed EXIF data as well, as the original version displayed EXIF data, and for that I used a binding to libexif that I had written a few years ago when I thought about starting this project (Yakpoint -1). However I found some crashers in the library, because it had never really been tested in production, and instead of fixing them I said "what the hell, I'll just write an EXIF parser". (Yakpoint 5.) So I did and adapted the web gallery to use it (Yakpoint 6, for the adaptation.)

At this point, I looked back, and looked forward, and looked all around, and all was good, but what was with this uneasiness I was feeling? And indeed, I hadn't actually made anything better, and I wasn't taking more photos, and the workflow was the same.

I was also concerned about the client side of things, which was still in Python and using some breakage-prone legacy libraries to do the photo scaling and transformations and what-not, and relied on a desktop application (f-spot) of dubious future. So I started to look at what it would take to port that script to Scheme (Yakpoint 7). Well it used some legacy libraries to copy files over SSH (gnome-vfs; switching away from that would be Yakpoint 8) and I didn't want to make a Scheme GIO binding (Yakpoint 9, narrowly avoided), and I then -- and then, dear reader -- so then I said "well WTF my caching story on the server is crap anyway, I never know when the sqlite database has changed or not so I never know what responses I can cache, what I really want is a functional datastore" (Yakpoint 10), which is what I have with Git and Tekuti (Yakpoint of yore), and so why not just store my photos in Git like I do in Tekuti for blog posts and serve them from there, indexing as needed? Of course I'd need some other server software (Yakpoint of fore, by which I meantersay the future), but then I could just git push to update my photo gallery, and I wouldn't have to endure the horror that is GVFS shelling out to ssh in a FUSE daemon (Yakpoint of ne'er).

So. After mulling over these thoughts for a while I decided, during an autumnal walk on the Salève in which we had the greatest views of Mont Blanc everrrrr and yet where are the photos?, that really what I needed was new photo management software, not just a web gallery. I should be able to share photos from my phone or from my desktop, fix them up either place, tag and such, and OK woo hoo! Such is the future! And the present for many people? Thing is, I also needed good permissions management (Yakpoint what, 10 I guess?), because you know a dude just out of college is not the same as that dude many years later. Which means serving things over HTTPS (Yakpoints 11-47) in such a way that the app has some good control over who gets what.

Well. Anyway. My mind ran ahead, and runs ahead, and yet we haven't actually tasted the awesome sauce yet. So! The photo management software, wherever it lives, needs to rotate photos at least, and scale them down to a few resolutions. I smell a yak! I looked at jpegtran, which can do some lossless rotations but it's not available as a library, which is odd; and really I don't like shelling out for core program functionality, because every time I deal with the file system it's the wild west of concurrent mutation. If naming things is one of the two hardest problems in computer science, the file system is the worst because you have to give a global name to every intermediate value.

At the same time to scale images, what was I to do? Make a binding to libjpeg? Well I started (Yakpoint 48) but for reals kids, libjpeg is not fun. It works great and is really clever but

  1. it's approximately impossible to use from a dynamic ffi; you want a compiler to verify that you are using the right structure definitions

  2. there has been an inane ABI and format break imposed by the official IJG libjpeg but which other implementations have not followed, but how could you know which one you are using?

  3. the error handling facility encourages longjmp in C programs; somewhat terrifying

  4. off-heap image manipulation libraries always interact poorly with GC, because the GC only sees the small pointer to the off-heap image, and so doesn't GC often enough

  5. I have zero guarantee that libjpeg won't change ABI in weird ways, and I don't want to touch this software for the next 10 years

  6. I want to do jpegtran-like lossless transformations, but that's not available as a library, and it's totes ridics that binding libjpeg does not help you out here

  7. it's still an unsafe C library, battle-tested yes, but terrifyingly unsafe, and I'd be putting it on my server and who knows?

Friends, I arrived at the pasture, and I, I chose the yak less shaven. I took my lame JPEG parser and turned it into a full decoder (Yakpoint 49), realized it wasn't much more work to do an encoder (Yakpoint 50), and implemented the lossless transformations (Yakpoint 51).

on haters

Before we go on, I know some people would think "what is this kid about". I mean, custom gallery software, a custom JPEG library of all things, all bespoke, why don't you just use off-the-shelf solutions? Why aren't you normal and use a normal language and what about the best practices and where's your business case and I can't go on about this because there's a technical term for people that say this kind of thing and it's "hater".

Thing is, when did a hater ever make anything cool? Come to think of it, when did a hater make anything at all? In my experience the most vocal haters have nothing behind their names except a long series of pseudonymous rants in other people's comment boxes. So friends, in the joyful spirit of earning-anew, let's talk about JPEG!

on color

JPEG is a funny thing. Photos are our lives and our memories, our first steps and our friends, and yet I for one didn't know very much about them. My mental model that "a JPEG is a rectangle of pixels" doesn't turn out to be quite right.

If you actually look in a normal JPEG, you see three planes of information. If I take this image, for example:

If I decode it, actually I get three images. Here's the first one:

This is just the greyscale version of the image. So, storytime! Remember black and white television? We had an old one that got moved around the house sometimes, like if Mom was working at something in the kitchen. We also had a color one in the living room, and you could watch one or the other and they showed the same stuff. Strange when you think about it though -- one being in color and the other not. Well it turns out that color was literally just added on, both historically and technically. The main broadcast was still in black and white, and then in one part of the frequency band there were separate color signals, which color TVs would pick up, mix with the black and white signal, and come out with color. Wikipedia notes that "color TV" was really just "colored TV", which is a phrase whose cleverness I respect. Big ups to the W P.

In the context of JPEG, this black-and-white signal is sometimes called "luma", but is more precisely called Y', where the "prime" (the apostrophe) indicates that the signal has gamma correction applied.

In the image above, I replaced the color planes (sometimes collectively called the "chroma") with zeroes, while losslessly keeping the luma. Below is the first color plane, with the Y' plane replaced with a uniform 50% luma, and the other color plane replaced with zeros.

This color signal is technically known as CB, which may be very imperfectly understood as the bluish component of the color. Well the original image wasn't very blue, so we don't see very much here.

Indeed, our eyes have a harder time seeing differences in color than differences in intensity. Apparently this goes all the way down to biology -- we have more receptors in our eyes for "black and white" and fewer for color.

Early broadcasters took advantage of this difference in perception by actually devoting more bandwidth in their broadcasts to luma than to chroma; if you check the Wikipedia page you will see that the area in the spectrum allocation devoted to color is much smaller than the area devoted to intensity. So it is in JPEG: the above image being half-width indicates that actually we're just encoding one CB sample for every two Y' samples.

Finally, here we have the CR color plane, which can loosely be thought of as the "redness" of the image.

These test images and crops preserve the actual encoding of this photo as it came from my camera, without re-encoding. That's partly why there's not much interesting going on; with the megapixels these days, it's hard to fit much of anything in a few hundred pixels square. This particular camera is sub-sampling in the horizontal direction, but it's also common to subsample vertically as well, producing color planes that are half-width and half-height. In my limited investigations I have found that cameras tend to sub-sample just in the X direction, producing what they call 4:2:2 images, and that standard software encoders subsample in both, producing 4:2:0.

Incidentally, properly scaling up the color planes is quite an irritating endeavor -- the standard indicates that the color is sampled between the locations of the Y' samples ("centered" chroma), but these images originally have EXIF data that indicates that the color samples are taken at the position of the first Y' sample ("co-sited" chroma). I'm pretty sure libjpeg doesn't delve into the EXIF to check this though, so it would seem that all renderings I have seen of these photos are subtly off.

But how do you get proper color out of these strange luma and chroma things? Well, the Y'CBCR colorspace is really just the same color cube as RGB, except rotated: the Y' axis traverses the diagonal from (0, 0, 0) (black) to (255, 255, 255) (white). CB and CR are perpendicular to that diagonal, pointing towards blue or red respectively. So to go back to RGB, you multiply by a matrix to rotate the cube.
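
For concreteness, here is that conversion written out as a sketch in Scheme, using the 8-bit JFIF constants (the constants come from the JFIF spec, not from this post; chroma samples are centered on 128):

; Y'CbCr -> R'G'B' for 8-bit JFIF-style samples (a sketch)
(define (ycbcr->rgb y cb cr)
  (define (clamp x) (max 0 (min 255 (inexact->exact (round x)))))
  (values (clamp (+ y (* 1.402 (- cr 128))))                             ; R'
          (clamp (- y (* 0.344136 (- cb 128)) (* 0.714136 (- cr 128))))  ; G'
          (clamp (+ y (* 1.772 (- cb 128))))))                           ; B'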

It's not a very intuitive color system, as you can see from the images above. For one thing, at zero or full luma, the chroma axes have no meaning; black and white can have no hue. Indeed if you imagine trying to fit a cube corner-down into a similar-sized box, you end up either having empty space in the box, or you have to cut off corners from the cube, or both. Cut corners means that bits of the Y'CBCR signal are wasted; empty space means there are RGB colors that are not representable in Y'CBCR. I'm not sure, but I think both are true for the particular formulation of Y'CBCR used in JPEG.

There's more to say about color here but frankly I don't know enough to do so, even though I worked in digital video for many years. If this is something you are mildly interested in, I highly, highly recommend watching Wim Taymans' presentation at this year's GStreamer conference. He takes a look at color in video that is constructive, building up from biology through math to engineering. His is a principled approach rather than a list of rules. It really clarified a number of things for me (and opened doors to unknown unknowns beyond).

on cosines

Where were we? Right, JPEG. So the proper way to understand what JPEG is is to understand the encoding process. We've covered colorspace conversion from RGB to Y'CBCR and sub-sampling. Next, the image canvas is divided into equal-sized "macroblocks". (These are called "minimum coded units" (MCUs) in the JPEG context, but in video they are usually called macroblocks, and it's a better name.) Without sub-sampling, each macro-block will contain one 8-sample-by-8-sample block for each component (Y', CB, CR) of the image. In my images above, the canvas space corresponding to one chroma block is the space of two luma blocks, so the macroblocks will be 16 samples wide and 8 samples tall, and contain two Y' blocks and one each of CB and CR. If the image canvas can't be evenly divided into macroblocks, it is padded to fit, usually by duplicating the last column or row of samples.

Then to make a JPEG, each block is encoded separately, then the whole thing is just written out to a file, and you're done!

This description glosses over a couple of important points, but it's a good big-picture view to have in mind. The pipeline goes from RGB pixels, to a padded RGB canvas, to separate Y'CBCR planes, to a possibly subsampled set of those planes, to macroblocks, to encoded macroblocks, to the file. Decoding is the reverse. It's a totally doable, comprehensible thing, and that was one of the big takeaways for me from this project. I took photography classes in high school and it was really cool to see how to shoot, develop, and print film, and this is similar in many ways. The real "film" is raw-format data, which some cameras produce, but understanding JPEG is like understanding enlargers and prints and fixer baths and such things. It's smelly and dark but pretty cool stuff.

So, how do you encode a block? Well peoples, this is a kinda cool thing. Maybe you remember from some math class that, given n uniformly spaced samples, you can always represent that series as a sum of n cosine functions of equally spaced frequencies. In each litle 8-by-8 block, that's what we do: a "forward discrete cosine transformation" (FDCT), which is just multiplying together some matrices for every point in the block. The FDCT is completely separable in the X and Y directions, so the space of 8 horizontal coefficients multiplies by the space of 8 vertical coefficients at each column to yield 64 total coefficients, which is not coincidentally the number of samples in a block.
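
Written out naively, the FDCT looks like the sketch below (a direct transcription of the textbook definition, not this project's code; real codecs use a fast factored form rather than this quadruple loop):

; naive 8x8 forward DCT: block is a vector of 64 samples, row-major,
; level-shifted by subtracting 128 as the JPEG standard specifies
(define (fdct block)
  (define pi 3.141592653589793)
  (define (alpha k) (if (zero? k) (/ 1 (sqrt 2)) 1))
  (define out (make-vector 64 0))
  (do ((v 0 (+ v 1))) ((= v 8) out)
    (do ((u 0 (+ u 1))) ((= u 8))
      (let loop ((y 0) (x 0) (sum 0.0))
        (cond ((= y 8)
               (vector-set! out (+ (* 8 v) u)
                            (* 0.25 (alpha u) (alpha v) sum)))
              ((= x 8) (loop (+ y 1) 0 sum))
              (else
               (loop y (+ x 1)
                     (+ sum (* (- (vector-ref block (+ (* 8 y) x)) 128)
                               (cos (/ (* (+ (* 2 x) 1) u pi) 16))
                               (cos (/ (* (+ (* 2 y) 1) v pi) 16)))))))))))

Each output coefficient (u, v) is just the correlation of the block with the corresponding basis function pictured above.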

Funny thing about those coefficients: each one corresponds to a particular horizontal and vertical frequency. We can map these out as a space of functions; for example giving a non-zero coefficient to (0, 0) in the upper-left block of a 8-block-by-8-block grid, and so on, yielding a 64-by-64 pixel representation of the meanings of the individual coefficients. That's what I did in the test strip above. Here is the luma example, scaled up without smoothing:

The upper-left corner corresponds to a frequency of 0 in both X and Y. The lower-right is a frequency of 4 "hertz", oscillating from highest to lowest value in both directions four times over the 8-by-8 block. I'm actually not sure why there are some greyish pixels around the right and bottom borders; it's not a compression artifact, as I constructed these DCT arrays programmatically. Anyway. Point is, your lover's smile, your sunny days, your raw urban graffiti, your child's first steps, all of these are reified in your photos as a sum of cosine coefficients.

The odd thing is that what is reified into your pictures isn't actually all of the coefficients there are! Firstly, because the coefficients are rounded to integers. Mathematically, the FDCT is a lossless operation, but in the context of JPEG it is not because the resulting coefficients are rounded. And they're not just rounded to the nearest integer; they are probably quantized further, for example to the nearest multiple of 17 or even 50. (These numbers seem exaggerated, but keep in mind that the range of coefficients is about 8 times the range of the original samples.)

The choice of what quantization factors to use is a key part of JPEG, and it's subjective: low quantization results in near-indistinguishable images, but in middle compression levels you want to choose factors that trade off subjective perception with file size. A higher quantization factor leads to coefficients with fewer bits of information that can be encoded into less space, but results in a worse image in general.

JPEG proposes a standard quantization matrix, with one number for each frequency (coefficient). Here it is for luma:

(define *standard-luma-q-table*
  #(16 11 10 16 24 40 51 61
    12 12 14 19 26 58 60 55
    14 13 16 24 40 57 69 56
    14 17 22 29 51 87 80 62
    18 22 37 56 68 109 103 77
    24 35 55 64 81 104 113 92
    49 64 78 87 103 121 120 101
    72 92 95 98 112 100 103 99))

This matrix is used for "quality 50" when you encode an 8-bit-per-sample JPEG. You can see that lower frequencies (the upper-left part) are quantized less harshly, and vice versa for higher frequencies (the bottom right).

(define *standard-chroma-q-table*
  #(17 18 24 47 99 99 99 99
    18 21 26 66 99 99 99 99
    24 26 56 99 99 99 99 99
    47 66 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99
    99 99 99 99 99 99 99 99))

For chroma (CB and CR) we see that quantization is much more harsh in general. So not only will we sub-sample color, we will also throw away more high-frequency color variation. It's interesting to think about, but also makes sense in some way; again in photography class we did an exercise where we shaded our prints with colored pencils, and the results were remarkable. My poor, lazy coloring skills somehow rendered leaves lifelike in different hues of green; really though, they were shades of grey, colored in imprecisely. "Colored TV" indeed.

With this knowledge under our chapeaux, we can now say what the "JPEG quality" setting actually is: it's simply that pair of standard quantization matrices scaled up or down. Towards "quality 100", the matrix approaches all-ones, for no quantization, and thus minimal loss (though you still have some rounding, often subsampling as well, and RGB-to-Y'CBCR gamut loss). Towards "quality 0" they scale to a matrix full of large values, for harsh quantization.
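As a sketch of what that scaling usually looks like (this is the libjpeg-style convention; I'm assuming, not asserting, that other encoders do exactly the same):

;; Scale a standard table for a given "quality" between 1 and 100:
;; 50 uses the table as-is, 100 collapses it towards all-ones, and
;; low qualities blow the factors up. Clamping keeps each factor in
;; the 1-255 range that a baseline 8-bit JPEG can represent.
(define (scale-q-table table quality)    ; table: vector of 64
  (let ((scale (if (< quality 50)
                   (quotient 5000 quality)
                   (- 200 (* 2 quality))))
        (out (make-vector 64)))
    (do ((i 0 (+ i 1))) ((= i 64) out)
      (vector-set! out i
                   (max 1 (min 255
                               (quotient (+ (* (vector-ref table i) scale) 50)
                                         100)))))))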

This understanding also explains those wavy JPEG artifacts you get on low-quality images. Those artifacts look like waves because they are waves. They usually occur at sharp intensity transitions, which, like a cymbal crash, produce lots of high frequencies that then get harshly quantized. Incidentally, I suspect (but don't know) that this is the same reason cymbals often sound bad in poorly encoded MP3s: harsh quantization in the frequency domain.

Finally, the coefficients are written out to a file as a stream of bits. Each image gets a Huffman code allocated to it, ideally built from the distribution of quantized coefficient sizes seen in all of the blocks of the image. There are usually different codes for luma and chroma, to reflect their different quantizations. Reading and writing this bitstream is a bit of a headache, but the algorithm is specified in the JPEG standard, and all you have to do is implement it. Notably, though, there is special support for encoding a run of zero-valued coefficients, which happens often after quantization. There are rarely wavy bits in a blue blue sky.
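The run encoding works because the coefficients of a block are visited in a zig-zag order from low to high frequency, so the zeros produced by quantization cluster into long runs. Here's a simplified sketch of the grouping (real JPEG also caps runs at 15 with a special ZRL code and skips the end-of-block marker when the last coefficient is non-zero; this ignores both):

;; Group the AC coefficients of a block into (zero-run . value)
;; pairs, ending with an end-of-block marker. For example,
;; (run-length-pairs '(5 0 0 -3 0 0)) => ((0 . 5) (2 . -3) EOB).
(define (run-length-pairs ac-coeffs)     ; ac-coeffs: list of integers
  (let loop ((coeffs ac-coeffs) (run 0) (out '()))
    (cond
     ((null? coeffs) (reverse (cons 'EOB out)))
     ((zero? (car coeffs)) (loop (cdr coeffs) (+ run 1) out))
     (else (loop (cdr coeffs) 0 (cons (cons run (car coeffs)) out))))))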

on transforms

It's terribly common for photos to be wrongly oriented. Unfortunately, the way that many editors fix photo rotation is by setting a bit in the EXIF information of the JPEG. This is ineffectual, as web browsers don't look in the EXIF information, and silly, because it turns out you can losslessly rotate most JPEG images anyway.

Consider that the body of a JPEG is an array of macroblocks. To rotate an image, you just have to rearrange those macroblocks, then rearrange the blocks inside the macroblocks (e.g. swap the two Y' blocks in my above example), then transform the blocks themselves.

The lossless transformations that you can do on a block are transposition, vertical flipping, and horizontal flipping.

Transposition flips a block along its downward-sloping diagonal. To do so, you just swap the coefficients at (u, v) with the coefficients at (v, u). Easy peasy.

Flipping is trickier. Consider the enlarged DCT image from above. What would it take to horizontally flip the function at (0, 1)? Instead of going from light to dark, you want it to go from dark to light. Simple: you just negate the coefficients! But you only want to negate those coefficients that are "odd" in the X direction, which are those coefficients whose column is odd. And actually that's all there is to it. Flipping vertically is the same, but for coefficients whose row is odd.
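In a sketch (hypothetical helpers, operating on the same 64-element coefficient vectors as above), all three lossless transforms are a few lines each:

;; Swap the coefficients at (u, v) and (v, u).
(define (transpose-block block)          ; block: vector of 64
  (let ((out (make-vector 64)))
    (do ((v 0 (+ v 1))) ((= v 8) out)
      (do ((u 0 (+ u 1))) ((= u 8))
        (vector-set! out (+ (* u 8) v)
                     (vector-ref block (+ (* v 8) u)))))))

;; Horizontal flip: negate the coefficients in odd columns.
(define (hflip-block block)
  (let ((out (make-vector 64)))
    (do ((i 0 (+ i 1))) ((= i 64) out)
      (let ((coeff (vector-ref block i)))
        (vector-set! out i
                     (if (odd? (modulo i 8)) (- coeff) coeff))))))

;; Vertical flip: negate the coefficients in odd rows.
(define (vflip-block block)
  (let ((out (make-vector 64)))
    (do ((i 0 (+ i 1))) ((= i 64) out)
      (let ((coeff (vector-ref block i)))
        (vector-set! out i
                     (if (odd? (quotient i 8)) (- coeff) coeff))))))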

I said "most images" above because those whose size is not evenly divided by the macroblock size can't be losslessly rotated -- you will end up seeing some of the hidden data that falls off the edge of the canvas. Oh well. Most raw images are properly dimensioned, and if you're downscaling, you already have to re-encode anyway.

But that's just flipping and transposition, you say! What about rotation? Well it turns out that you can express rotation in terms of these operations: rotating 90 degrees clockwise is just a transpose and a horizontal flip (in that order). Together, flipping horizontally, flipping vertically, and transposing form a group, in the same way that flipping and flopping form a group for mattresses. Yeah!
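With the helpers sketched above, that composition is a one-liner:

;; Rotate one coefficient block 90 degrees clockwise.
(define (rotate-block-90-cw block)
  (hflip-block (transpose-block block)))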

on scheme

I wrote this library in Scheme because that's my language of choice these days. I didn't run into any serious impedance mismatches; Guile has a generic multi-dimensional array facility that made it possible to express many of these operations as generic folds, unfolds, or maps over arrays. The Huffman coding part was a bit irritating, but all in all things were pretty good. The speed is pretty bad, but I haven't optimized it at all, and it gives me a nice test case for the compiler. Anyway, it's been fun and it suits my needs. Check out the project page if you're interested. Yes, to shave a yak you have to get a bit bovine and smelly, but yaks live in awesome places!

Finally I will leave you with a glitch, one of many that I have produced over the last couple weeks. Comments and corrections welcome below. Happy hacking!

by Andy Wingo at November 14, 2014 04:49 PM

Programming Praxis

Dawkins’ Weasel

In his book The Blind Watchmaker, Richard Dawkins says:

I don’t know who it was first pointed out that, given enough time, a monkey bashing away at random on a typewriter could produce all the works of Shakespeare. The operative phrase is, of course, given enough time. Let us limit the task facing our monkey somewhat. Suppose that he has to produce, not the complete works of Shakespeare but just the short sentence ‘Methinks it is like a weasel’, and we shall make it relatively easy by giving him a typewriter with a restricted keyboard, one with just the 26 (capital) letters, and a space bar. How long will he take to write this one little sentence?

(All of our quotes are shamelessly stolen from Wikipedia.) Then Dawkins suggests that this is a bad analogy for evolution, and proposes this alternative:

We again use our computer monkey, but with a crucial difference in its program. It again begins by choosing a random sequence of 28 letters, just as before … it duplicates it repeatedly, but with a certain chance of random error – ‘mutation’ – in the copying. The computer examines the mutant nonsense phrases, the ‘progeny’ of the original phrase, and chooses the one which, however slightly, most resembles the target phrase, METHINKS IT IS LIKE A WEASEL.

Then he discusses a computer program that simulates his monkey, which he says took half an hour to run because it was written in BASIC; when he rewrote the program in Pascal, the runtime dropped to eleven seconds. His program used this algorithm (a rough Scheme sketch follows the list):

  1. Start with a random string of 28 characters.
  2. Make 100 copies of the string (reproduce).
  3. For each character in each of the 100 copies, with a probability of 5%, replace (mutate) the character with a new random character.
  4. Compare each new string with the target string “METHINKS IT IS LIKE A WEASEL”, and give each a score (the number of letters in the string that are correct and in the correct position).
  5. If any of the new strings has a perfect score (28), halt. Otherwise, take the highest scoring string, and go to step 2.
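Here is one possible sketch in Guile Scheme (not the suggested solution; the 100-copy population and 5% mutation rate come straight from the steps above):

(define target "METHINKS IT IS LIKE A WEASEL")
(define charset "ABCDEFGHIJKLMNOPQRSTUVWXYZ ")

(define (random-char)
  (string-ref charset (random (string-length charset))))

(define (random-string n)
  (list->string (map (lambda (i) (random-char)) (iota n))))

;; Each character of a copy has a 5% chance of being replaced.
(define (mutate str)
  (string-map (lambda (c) (if (< (random 100) 5) (random-char) c)) str))

;; Score: how many characters are correct and in the correct position.
(define (score str)
  (let loop ((i 0) (n 0))
    (if (= i (string-length str))
        n
        (loop (+ i 1)
              (if (char=? (string-ref str i) (string-ref target i))
                  (+ n 1)
                  n)))))

;; Breed until the target appears; returns the generation count.
(define (weasel)
  (let loop ((parent (random-string (string-length target))) (gen 0))
    (if (string=? parent target)
        gen
        (let* ((children (map (lambda (i) (mutate parent)) (iota 100)))
               (best (car (sort children
                                (lambda (a b) (> (score a) (score b)))))))
          (loop best (+ gen 1))))))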

Your task is to write a program to simulate Dawkins’ monkey. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.


by programmingpraxis at November 14, 2014 09:00 AM

November 11, 2014

Programming Praxis

Favorite Color

A text file consists of records with fields. Each field is a name/value pair of the form name: value, with the field name followed by a colon, a space, and the value. Records consist of one or more fields, and records are separated from each other by a blank line. In a particular database, one field is named favoritecolor.

Your task is to write a program that determines the most common favorite color; in other words, which color is named the most times as the favorite color. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.
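One possible sketch (not the suggested solution), in Guile Scheme: tally the values of favoritecolor fields in an association list and return the most frequent one as a (color . count) pair. The field-name test is simplified to a literal prefix match, and at least one matching record is assumed.

(use-modules (ice-9 rdelim))             ; for read-line

(define (favorite-color port)
  (let loop ((counts '()))
    (let ((line (read-line port)))
      (if (eof-object? line)
          ;; Sort highest count first; return the winning pair.
          (car (sort counts (lambda (a b) (> (cdr a) (cdr b)))))
          (loop
           (if (string-prefix? "favoritecolor: " line)
               (let* ((color (substring line (string-length "favoritecolor: ")))
                      (entry (assoc color counts)))
                 (if entry
                     (begin (set-cdr! entry (+ (cdr entry) 1)) counts)
                     (cons (cons color 1) counts)))
               counts))))))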


by programmingpraxis at November 11, 2014 09:00 AM

November 10, 2014

Jeremy Kun

The Complexity of Communication

One of the most interesting questions posed in the last thirty years of computer science is to ask how much “information” must be communicated between two parties in order for them to jointly compute something. One can imagine these two parties living on distant planets, so that the cost of communicating any amount of information is very expensive, […]

by j2kun at November 10, 2014 03:00 PM