Planet Scheme

March 16, 2018

Programming Praxis

Mid-Term Exam

At many colleges and universities, Spring Break is approaching soon, and that means mid-term exams are also imminent. Here are two questions suitable for a mid-term exam for not-too-advanced students:

First: You are given two strings, say “aet6ukm” and “123678”; neither is necessarily sorted. You are to find the first character in the first string that also appears in the second string, and return the index of the character in the second string. For the two strings above, the character “6” appears in the first string and also in the second string, at index position 3 (counting from zero), so your program should return 3.
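
For concreteness, here is one way the first question might be answered in Scheme. This is only a sketch, not the official suggested solution, and the procedure names first-common-index and index-in are invented for the example:

    (define (index-in str ch)                ; position of ch in str, or #f
      (let loop ((j 0))
        (cond ((= j (string-length str)) #f)
              ((char=? (string-ref str j) ch) j)
              (else (loop (+ j 1))))))

    (define (first-common-index s1 s2)       ; index in s2 of the first char of s1 found in s2
      (let loop ((i 0))
        (cond ((= i (string-length s1)) #f)  ; no common character
              ((index-in s2 (string-ref s1 i)))
              (else (loop (+ i 1))))))

    (first-common-index "aet6ukm" "123678")  ; => 3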

Second: You are given a list of recipes, where each recipe is a list with a name in the first position of the list and a list of ingredients in the remaining positions of the list; for instance, (“Pasta” “Spaghetti” “Tomato sauce” “Basil”) is a simple recipe for pasta. Your program should return a list of all ingredients that are used in more than one recipe, with a list of recipe names attached to each ingredient.
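
The second question can be approached by building a table from ingredients to the recipes that use them, then keeping the entries with more than one recipe. The sketch below is one possibility, not the official solution; shared-ingredients is an invented name, and the second recipe in the usage example is made up purely for illustration:

    (define (shared-ingredients recipes)
      (let ((table '()))                     ; alist: ingredient -> list of recipe names
        (for-each
          (lambda (recipe)
            (let ((name (car recipe)))
              (for-each
                (lambda (ingredient)
                  (let ((entry (assoc ingredient table)))
                    (if entry
                        (set-cdr! entry (cons name (cdr entry)))
                        (set! table (cons (list ingredient name) table)))))
                (cdr recipe))))
          recipes)
        ;; keep only ingredients that appear in more than one recipe
        (let loop ((entries table) (out '()))
          (cond ((null? entries) out)
                ((> (length (cdar entries)) 1)
                 (loop (cdr entries) (cons (car entries) out)))
                (else (loop (cdr entries) out))))))

    (shared-ingredients
      '(("Pasta" "Spaghetti" "Tomato sauce" "Basil")
        ("Pizza" "Dough" "Tomato sauce" "Basil" "Mozzarella")))
    ;; => (("Tomato sauce" "Pizza" "Pasta") ("Basil" "Pizza" "Pasta"))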

Your task is to answer the two mid-term exam questions given above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at March 16, 2018 09:00 AM

March 13, 2018

Programming Praxis

Nearest Pair

Today’s exercise is an interview question:

You are given a list of integers and must find the nearest pair that sum to a given target. For instance, given the list (1 5 3 6 4 2), if the target is 7, there are three pairs with the required sum, (1 6), (5 2) and (3 4), but pair (1 6) has two intervening numbers, pair (5 2) has three intervening numbers, and pair (3 4) has only one intervening number, so (3 4) is the nearest pair.
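
One straightforward way to attack this, sketched below in Scheme with invented names, is simply to examine every pair of positions and keep the pair with the fewest intervening numbers among those that hit the target; it is quadratic, but fine for an interview-sized list:

    (define (nearest-pair xs target)
      (let ((v (list->vector xs)))
        (let loop ((i 0) (best #f) (best-gap #f))
          (if (= i (vector-length v))
              best
              (let inner ((j (+ i 1)) (best best) (best-gap best-gap))
                (if (= j (vector-length v))
                    (loop (+ i 1) best best-gap)
                    (let ((gap (- j i 1)))           ; intervening numbers
                      (if (and (= target (+ (vector-ref v i) (vector-ref v j)))
                               (or (not best-gap) (< gap best-gap)))
                          (inner (+ j 1)
                                 (list (vector-ref v i) (vector-ref v j))
                                 gap)
                          (inner (+ j 1) best best-gap)))))))))

    (nearest-pair '(1 5 3 6 4 2) 7)  ; => (3 4)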

Your task is to write a program to find the nearest pair. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at March 13, 2018 09:00 AM

March 09, 2018

Programming Praxis

Array Rotation, Timing Tests

We have been looking at Section 2.3 of Jon Bentley’s book Programming Pearls in the last two exercises, and have implemented his “juggling” and “block swap” algorithms. Bentley also discusses a third algorithm, which he calls the “reversal” algorithm, and which we implemented several years ago. Bentley goes on to give timing comparisons between the three algorithms.
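
As a rough idea of what such a test harness might look like in Scheme: the sketch below assumes some rotation procedure rotate! from the earlier exercises (juggling, block swap, or reversal) that rotates a vector in place by i positions, and it uses the R7RS (scheme time) clock; the names and details are placeholders that will vary with your system and implementation.

    (define (time-rotation rotate! n i reps)
      (let ((v (make-vector n 0)))
        (let ((start (current-jiffy)))
          (do ((k 0 (+ k 1))) ((= k reps))
            (rotate! v i))                    ; rotate a vector of length n by i
          (/ (- (current-jiffy) start)        ; elapsed time in seconds
             (jiffies-per-second)))))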

Your task is to generate timing comparisons similar to Bentley’s, to see what happens with your system, your language and your compiler. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at March 09, 2018 10:00 AM

March 06, 2018

Programming Praxis

Array Rotation, Again

In Section 2.3 of his book Programming Pearls, Jon Bentley gives three O(n) algorithms for rotating an array. We looked at the first algorithm in the previous exercise; today we look at the second algorithm:

A different algorithm results from a different view of the problem: rotating the vector x is really just swapping the two segments of the vector ab to be the vector ba, where a represents the first i elements of x. Suppose a is shorter than b. Divide b into bl and br, so that br is the same length as a. Swap a and br to transform ablbr into brbla. The sequence a is in its final place, so we can focus on swapping the two parts of b. Since this new problem has the same form as the original, we can solve it recursively. This algorithm can lead to an elegant program, but it requires delicate code and some thought to see that it is efficient enough.

As before, Bentley challenges us to implement the rotation algorithm and he gives a cryptic hint: “How does the greatest common divisor of i and n appear in the program?”
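
Here is one way the block-swap idea might look in Scheme; it is a sketch with invented names (rotate-swap! and swap-blocks!), not the official solution. Note how the pattern of recursion, repeatedly reducing the larger block by the size of the smaller, mirrors the subtractive form of Euclid's gcd algorithm on i and n - i, which is one way to read Bentley's hint:

    (define (swap-blocks! v a b len)          ; swap v[a..a+len) with v[b..b+len)
      (do ((k 0 (+ k 1))) ((= k len))
        (let ((t (vector-ref v (+ a k))))
          (vector-set! v (+ a k) (vector-ref v (+ b k)))
          (vector-set! v (+ b k) t))))

    (define (rotate-swap! v i)                ; rotate v left by i positions, in place
      (let loop ((lo 0) (hi (vector-length v)) (d i))
        (let ((n (- hi lo)))
          (cond ((or (= d 0) (= d n)) v)
                ((< d (- n d))                ; a shorter than b: swap a with br;
                 (swap-blocks! v lo (- hi d) d)       ; a is now in its final place,
                 (loop lo (- hi d) d))                ; so rotate the remaining br bl
                (else                         ; a at least as long as b: swap al with b;
                 (swap-blocks! v lo (+ lo d) (- n d)) ; b is now in its final place,
                 (loop (+ lo (- n d)) hi (- d (- n d))))))))  ; rotate the remaining ar al

    (rotate-swap! (vector 'a 'b 'c 'd 'e 'f 'g 'h) 3)  ; => #(d e f g h a b c)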

Your task is to implement the array rotation algorithm described above. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at March 06, 2018 10:00 AM

March 02, 2018

Programming Praxis

Array Rotation

I’ve been re-reading Jon Bentley’s book Programming Pearls. In Chapter 2, Section 2.3, Bentley discusses the problem of rotating the elements of an array (for instance, rotate the array abcdefgh three positions left to defghabc) in time proportional to the length of the array using only a small, constant amount of extra space, and he gives three algorithms for doing so. Today’s exercise discusses the first. Here’s Bentley’s description:

One successful approach is a delicate juggling act; move x[0] to the temporary t, then move x[i] to x[0], x[2i] to x[i], and so on (taking all indices into x modulo n), until we come back to taking an element from x[0], at which point we instead take the element from t and stop the process. If that process didn’t move all the elements, then we start over at x[1], and continue until we move all the elements.

Then Bentley challenges us to implement the rotation algorithm, and he gives a cryptic hint: “How does the greatest common divisor of i and n appear in the program?”
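
A sketch of the juggling algorithm as described, with invented procedure names; the number of cycles needed is gcd(i, n), which answers Bentley's hint:

    (define (rotate-juggle! v i)              ; rotate v left by i positions, in place
      (let* ((n (vector-length v))
             (cycles (gcd i n)))              ; one juggling pass per cycle
        (do ((start 0 (+ start 1))) ((= start cycles) v)
          (let ((t (vector-ref v start)))     ; hold the first element of the cycle aside
            (let loop ((j start))
              (let ((k (modulo (+ j i) n)))
                (if (= k start)
                    (vector-set! v j t)       ; back at the start: drop in the saved element
                    (begin (vector-set! v j (vector-ref v k))
                           (loop k)))))))))

    (rotate-juggle! (vector 'a 'b 'c 'd 'e 'f 'g 'h) 3)  ; => #(d e f g h a b c)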

Your task is to implement Bentley’s array rotation algorithm. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at March 02, 2018 10:00 AM

February 27, 2018

Programming Praxis

Seven-Segment Devices

We have today a simple exercise from Jon Bentley’s book Programming Pearls, Chapter 3, Problem 8:

[S. C. Johnson] Seven-segment devices provide an inexpensive display of the ten decimal digits:

 -----           -----   -----           -----   -----   -----   -----   -----
|     |       |       |       | |     | |       |             | |     | |     |
|     |       |       |       | |     | |       |             | |     | |     |
|     |       |       |       | |     | |       |             | |     | |     |
                 -----   -----   -----   -----   -----           -----   -----
|     |       | |             |       |       | |     |       | |     |       |
|     |       | |             |       |       | |     |       | |     |       |
|     |       | |             |       |       | |     |       | |     |       |
 -----           -----   -----           -----   -----           -----   -----

The seven segments are usually numbered as:

|     |
3     4
|     |
|     |
5     6
|     |

Write a program that displays a 16-bit positive integer in five seven-segment digits. The output is an array of five bytes; bit i of byte j is one if and only if the ith segment of digit j should be on.

It was harder to type those digits than it looks.
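
A sketch of one possible encoding in Scheme follows. The figure above labels the vertical segments 3 and 4 (upper) and 5 and 6 (lower) but does not show numbers for the three horizontal segments, so this sketch assumes 0 = top, 1 = middle, 2 = bottom; the procedure names are invented, and the output is a list of five masks, most significant digit first, with bit i set when segment i is lit:

    (define segment-masks
      ;;          0         1         2         3         4
      (vector #b1111101 #b1010000 #b0110111 #b1010111 #b1011010
      ;;          5         6         7         8         9
              #b1001111 #b1101111 #b1010001 #b1111111 #b1011111))

    (define (seven-segment n)                ; n is a 16-bit non-negative integer
      (let loop ((n n) (k 5) (bytes '()))
        (if (zero? k)
            bytes
            (loop (quotient n 10) (- k 1)
                  (cons (vector-ref segment-masks (remainder n 10)) bytes)))))

    (seven-segment 1234)   ; => (125 80 55 87 90), the masks for 0 1 2 3 4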

Your task is to write a program to display numbers using seven-segment digits, as Bentley directs. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 27, 2018 10:00 AM

February 23, 2018

Programming Praxis

I Think I’m Crazy!

In a recent exercise I wrote about a shell script I am writing at work. Development of the shell script continues, as I work my way through the various requirements of the script. At one point, I decided to use Awk to process a piece of the task, and I had to look something up in the Awk manual, and while I was there I noticed that Gawk has a -M option to use gmp for the arithmetic, giving Awk a big-integer capability. So it occurred to me: awk big integers … prime numbers … awk big integers … prime numbers … and so a project was born.

I decided to write a prime number generator in Awk, based on this previous exercise. Never mind that Awk doesn’t provide iterators, I ought to be able to figure this out. And I did; the result is on the next page. For the record, it works. But it’s horribly slow; generating the 78498 primes less than a million takes about four-and-a-half seconds, which is at least four seconds too long. Am I crazy?

Your task is to tell us about something crazy you have done, writing a program in a language that doesn’t provide what you need, either because you were forced to do so by circumstances or because you wanted to be crazy, as I did. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 23, 2018 10:00 AM

February 20, 2018

Programming Praxis

N-Gram Frequency

Given a string of characters and an n-gram size n, prepare a frequency table of all blocks of n consecutive characters. For instance, with the string “Programming Praxis” and an n-gram size of 2, the n-grams are “Pr”, “ro”, “og”, “gr”, “ra”, “am”, “mm”, “mi”, “in”, “ng”, “g “, ” P”, “Pr”, “ra”, “ax”, “xi”, and “is”; the two n-grams “Pr” and “ra” appear twice, and all the others appear once.
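
A simple sketch in Scheme (the procedure name is invented): slide a window of width n over the string and count each substring in an association list.

    (define (ngram-frequencies str n)
      (let loop ((i 0) (freqs '()))
        (if (> (+ i n) (string-length str))
            freqs                              ; alist of (n-gram . count)
            (let* ((gram (substring str i (+ i n)))
                   (entry (assoc gram freqs)))
              (if entry
                  (begin (set-cdr! entry (+ (cdr entry) 1))
                         (loop (+ i 1) freqs))
                  (loop (+ i 1) (cons (cons gram 1) freqs)))))))

    (ngram-frequencies "Programming Praxis" 2)
    ;; "Pr" and "ra" each have count 2, every other bigram count 1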

Your task is to write a program that computes a frequency table of n-grams. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 20, 2018 10:00 AM

February 19, 2018

GNU Guix

Join GNU Guix through Outreachy or GSoC

We are happy to announce that for the first time this year, GNU Guix offers a three-month internship through Outreachy, the inclusion program for groups traditionally underrepresented in free software and tech. We currently propose two subjects to work on:

  1. improving the user experience for the guix package command-line tool;
  2. enhancing Guile tools for the Guix package manager.

Eligible persons should apply by March 22nd.

Guix also participates in the Google Summer of Code (GSoC), under the aegis of the GNU Project. We have collected project ideas for Guix, GuixSD, and the GNU Shepherd, covering a range of topics. The list is far from exhaustive, so feel free to bring your own!

If you are an eligible student, make sure to apply by March 27th.

If you’d like to contribute to computing freedom, Scheme, functional programming, or operating system development, now is a good time to join us. Let’s get in touch on the mailing lists and on the #guix channel on the Freenode IRC network!

About GNU Guix

GNU Guix is a transactional package manager for the GNU system. The Guix System Distribution or GuixSD is an advanced distribution of the GNU system that relies on GNU Guix and respects the user's freedom.

In addition to standard package management features, Guix supports transactional upgrades and roll-backs, unprivileged package management, per-user profiles, and garbage collection. Guix uses low-level mechanisms from the Nix package manager, except that packages are defined as native Guile modules, using extensions to the Scheme language. GuixSD offers a declarative approach to operating system configuration management, and is highly customizable and hackable.

GuixSD can be used on i686, x86_64, and armv7 machines. It is also possible to use Guix on top of an already installed GNU/Linux system, including on mips64el and aarch64.

by Ludovic Courtès at February 19, 2018 04:00 PM

February 16, 2018

Programming Praxis

Perfect Power Sequence

I discovered the Numberphile channel on YouTube a few months ago, and have greatly enjoyed its videos at the intersection of mathematics and programming. Their recent video discusses the Catalan Conjecture:

In the infinite sequence of perfect powers 1, 4, 8, 9, 16, 25, 27, 32, 36, 49, 64, 81, 100, …, the only two consecutive numbers are 2³ = 8 and 3² = 9.

The conjecture has recently been proved, as discussed at Numberphile.
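
One simple way to list the perfect powers up to a limit, sketched below with invented names (and not necessarily the suggested solution), is to collect every m^k with k >= 2 that stays within the limit, inserting each into a sorted list without duplicates:

    (define (insert x xs)                    ; insert x into sorted xs, skipping duplicates
      (cond ((null? xs) (list x))
            ((< x (car xs)) (cons x xs))
            ((= x (car xs)) xs)
            (else (cons (car xs) (insert x (cdr xs))))))

    (define (perfect-powers limit)           ; all m^k <= limit with k >= 2, in order
      (let loop ((m 2) (acc '(1)))
        (if (> (* m m) limit)
            acc
            (loop (+ m 1)
                  (let powers ((p (* m m)) (acc acc))
                    (if (> p limit)
                        acc
                        (powers (* p m) (insert p acc))))))))

    (perfect-powers 100)  ; => (1 4 8 9 16 25 27 32 36 49 64 81 100)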

Your task is to write a program that generates the perfect power sequence. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 16, 2018 10:00 AM

February 13, 2018

Programming Praxis

The Early Bird Catches The Worm

Today we have another homework problem:

What is the earliest time of day (using a 24-hour clock) that can be written with four given digits? For instance, given the digits 1, 2, 3, and 4, times like 14:32 and 23:14 can be written, but the earliest time is 12:34. Your program should report that no time can be formed with digits 6, 7, 8 and 9.
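
A brute-force sketch in Scheme, with invented procedure names: generate every ordering of the four digits, keep those that read as a valid HH:MM time, and take the smallest. The result is encoded as the integer 100*hours + minutes, with #f meaning no time can be formed:

    (define (remove-one x xs)                ; drop the first occurrence of x
      (cond ((null? xs) '())
            ((equal? x (car xs)) (cdr xs))
            (else (cons (car xs) (remove-one x (cdr xs))))))

    (define (permutations xs)
      (if (null? xs)
          '(())
          (apply append
                 (map (lambda (x)
                        (map (lambda (p) (cons x p))
                             (permutations (remove-one x xs))))
                      xs))))

    (define (earliest-time digits)
      (let loop ((ps (permutations digits)) (best #f))
        (if (null? ps)
            best
            (let* ((p (car ps))
                   (h (+ (* 10 (list-ref p 0)) (list-ref p 1)))
                   (m (+ (* 10 (list-ref p 2)) (list-ref p 3)))
                   (t (and (< h 24) (< m 60) (+ (* 100 h) m))))
              (loop (cdr ps)
                    (if (and t (or (not best) (< t best))) t best))))))

    (earliest-time '(1 2 3 4))   ; => 1234, that is, 12:34
    (earliest-time '(6 7 8 9))   ; => #f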

Your task is to find the earliest time of day that can be written by four digits. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 13, 2018 10:00 AM

February 09, 2018

Programming Praxis

Partial List Reversal

We’re a few weeks into the beginning of a new school term, and the beginning-programmer message boards are swelling with questions from new students. This question caught my eye; I imagine it’s from a second-semester programming student who learned C in the first semester and is now taking a class in data structures:

Write a program that, given a linked list and two integer indices, reverses the portion of the list between the two indices. For instance, reversing the list [0,1,2,3,4,5,6,7,8,9] between indices 3 (inclusive) and 6 (exclusive) yields the list [0,1,2,5,4,3,6,7,8,9].
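
One straightforward sketch in Scheme (the student's class presumably expects pointer surgery on a mutable linked list, which this side-steps): split the list into a prefix, the portion to reverse, and a suffix, then reassemble. The procedure name is invented.

    (define (reverse-range lst from to)      ; from inclusive, to exclusive
      (let loop ((i 0) (lst lst) (prefix '()) (middle '()))
        (cond ((< i from) (loop (+ i 1) (cdr lst) (cons (car lst) prefix) middle))
              ((< i to)   (loop (+ i 1) (cdr lst) prefix (cons (car lst) middle)))
              (else (append (reverse prefix) middle lst)))))

    (reverse-range '(0 1 2 3 4 5 6 7 8 9) 3 6)  ; => (0 1 2 5 4 3 6 7 8 9)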

Your task is to help the student by writing a program to reverse a portion of a list. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 09, 2018 10:00 AM

February 07, 2018

Andy Wingo

design notes on inline caches in guile

Ahoy, programming-language tinkerfolk! Today's rambling missive chews the gnarly bones of "inline caches", in general but also with particular respect to the Guile implementation of Scheme. First, a little intro.

inline what?

Inline caches are a language implementation technique used to accelerate polymorphic dispatch. Let's dive in to that.

By implementation technique, I mean that the technique applies to the language compiler and runtime, rather than to the semantics of the language itself. The effects on the language do exist though in an indirect way, in the sense that inline caches can make some operations faster and therefore more common. Eventually inline caches can affect what users expect out of a language and what kinds of programs they write.

But I'm getting ahead of myself. Polymorphic dispatch literally means "choosing based on multiple forms". Let's say your language has immutable strings -- like Java, Python, or Javascript. Let's say your language also has operator overloading, and that it uses + to concatenate strings. Well at that point you have a problem -- while you can specify a terse semantics of some core set of operations on strings (win!), you can't choose one representation of strings that will work well for all cases (lose!). If the user has a workload where they regularly build up strings by concatenating them, you will want to store strings as trees of substrings. On the other hand if they want to access codepoints by index, then you want an array. But if the codepoints are all below 256, maybe you should represent them as bytes to save space, whereas maybe instead as 4-byte codepoints otherwise? Or maybe even UTF-8 with a codepoint index side table.

The right representation (form) of a string depends on the myriad ways that the string might be used. The string-append operation is polymorphic, in the sense that the precise code for the operator depends on the representation of the operands -- despite the fact that the meaning of string-append is monomorphic!

Anyway, that's the problem. Before inline caches came along, there were two solutions: callouts and open-coding. Both were bad in similar ways. A callout is where the compiler generates a call to a generic runtime routine. The runtime routine will be able to handle all the myriad forms and combination of forms of the operands. This works fine but can be a bit slow, as all callouts for a given operator (e.g. string-append) dispatch to a single routine for the whole program, so they don't get to optimize for any particular call site.

One tempting thing for compiler writers to do is to effectively inline the string-append operation into each of its call sites. This is "open-coding" (in the terminology of the early Lisp implementations like MACLISP). The advantage here is that maybe the compiler knows something about one or more of the operands, so it can eliminate some cases, effectively performing some compile-time specialization. But this is a limited technique; one could argue that the whole point of polymorphism is to allow for generic operations on generic data, so you rarely have compile-time invariants that can allow you to specialize. Open-coding of polymorphic operations instead leads to code bloat, as the string-append operation is just so many copies of the same thing.

Inline caches emerged to solve this problem. They trace their lineage back to Smalltalk 80, gained in complexity and power with Self and finally reached mass consciousness through Javascript. These languages all share the characteristic of being dynamically typed and object-oriented. When a user evaluates a statement like x = y.z, the language implementation needs to figure out where y.z is actually located. This location depends on the representation of y, which is rarely known at compile-time.

However for any given reference y.z in the source code, there is a finite set of concrete representations of y that will actually flow to that call site at run-time. Inline caches allow the language implementation to specialize the y.z access for its particular call site. For example, at some point in the evaluation of a program, y may be seen to have representation R1 or R2. For R1, the z property may be stored at offset 3 within the object's storage, and for R2 it might be at offset 4. The inline cache is a bit of specialized code that compares the type of the object being accessed against R1, in that case returning the value at offset 3, otherwise against R2 and offset 4, and otherwise falling back to a generic routine. If this isn't clear to you, Vyacheslav Egorov wrote a fine article describing and implementing the object representation optimizations enabled by inline caches.
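
If it helps to see the shape of the thing, here is a toy Scheme-level analogue of a monomorphic inline cache for a property-access site. It is purely illustrative, not how Guile or V8 actually implement ICs (those live at the machine-code or bytecode level), and the record types and procedure names are invented for the example:

    (define-record-type <r1> (make-r1 z) r1? (z r1-z))   ; z stored one way
    (define-record-type <r2> (make-r2 z) r2? (z r2-z))   ; z stored another way

    (define (generic-z obj)                  ; the slow, fully generic lookup
      (cond ((r1? obj) r1-z)
            ((r2? obj) r2-z)
            (else (error "no z property" obj))))

    (define (make-z-ic)                      ; one cache per "call site"
      (let ((pred #f) (get #f))              ; last representation seen, and its accessor
        (lambda (obj)
          (if (and pred (pred obj))
              (get obj)                      ; hit: a single type test
              (let ((g (generic-z obj)))     ; miss: dispatch generically,
                (set! pred (if (r1? obj) r1? r2?))
                (set! get g)                 ; then specialize this site
                (g obj))))))

    (define get-z (make-z-ic))
    (get-z (make-r1 42))   ; miss; the site now specializes to <r1>
    (get-z (make-r1 17))   ; hit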

Inline caches also serve as input data to later stages of an adaptive compiler, allowing the compiler to selectively inline (open-code) only those cases that are appropriate to values actually seen at any given call site.

but how?

The classic formulation of inline caches from Self and early V8 actually patched the code being executed. An inline cache might be allocated at address 0xcabba9e5 and the code emitted for its call-site would be jmp 0xcabba9e5. If the inline cache ended up bottoming out to the generic routine, a new inline cache would be generated that added an implementation appropriate to the newly seen "form" of the operands and the call-site. Let's say that new IC (inline cache) would have the address 0x900db334. Early versions of V8 would actually patch the machine code at the call-site to be jmp 0x900db334 instead of jmp 0xcabba9e5.

Patching machine code has a number of disadvantages, though. It is inherently target-specific: you will need different strategies to patch x86-64 and armv7 machine code. It's also expensive: you have to flush the instruction cache after the patch, which slows you down. That is, of course, if you are allowed to patch executable code; on many systems that's impossible. Writable machine code is also a potential vulnerability if the system is exposed to remote code execution.

Perhaps worst of all, though, patching machine code is not thread-safe. In the case of early Javascript, this perhaps wasn't so important; but as JS implementations gained parallel garbage collectors and JS-level parallelism via "service workers", this becomes less acceptable.

For all of these reasons, the modern take on inline caches is to implement them as a memory location that can be atomically modified. The call site is just jmp *loc, as if it were a virtual method call. Modern CPUs have "branch target buffers" that predict the target of these indirect branches with very high accuracy so that the indirect jump does not become a pipeline stall. (What does this mean in the face of the Spectre v2 vulnerabilities? Sadly, God only knows at this point. Saddest panda.)

cry, the beloved country

I am interested in ICs in the context of the Guile implementation of Scheme, but first I will make a digression. Scheme is a very monomorphic language. Yet, this monomorphism is entirely cultural. It is in no way essential. Lack of ICs in implementations has actually fed back and encouraged this monomorphism.

Let us take as an example the case of property access. If you have a pair in Scheme and you want its first field, you do (car x). But if you have a vector, you do (vector-ref x 0).

What's the reason for this nonuniformity? You could have a generic ref procedure, which when invoked as (ref x 0) would return the field in x associated with 0. Or (ref x 'foo) to return the foo property of x. It would be more orthogonal in some ways, and it's completely valid Scheme.
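
As a concrete, entirely hypothetical illustration of what such a ref might look like, dispatching on the representation of x at run time:

    (define (ref x k)
      (cond ((vector? x) (vector-ref x k))
            ((string? x) (string-ref x k))
            ((pair? x)   (if (symbol? k)
                             (cdr (assq k x))   ; (ref x 'foo) on an association list,
                             (list-ref x k)))   ; assuming the key is present
            (else (error "ref: no idea how to index" x k))))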

We don't write Scheme programs this way, though. From what I can tell, it's for two reasons: one good, and one bad.

The good reason is that saying vector-ref means more to the reader. You know more about the complexity of the operation and what side effects it might have. When you call ref, who knows? Using concrete primitives allows for better program analysis and understanding.

The bad reason is that Scheme implementations, Guile included, tend to compile (car x) to much better code than (ref x 0). Scheme implementations in practice aren't well-equipped for polymorphic data access. In fact it is standard Scheme practice to abuse the "macro" facility to manually inline code so that certain performance-sensitive operations get inlined into a closed graph of monomorphic operators with no callouts. To the extent that this is true, Scheme programmers, Scheme programs, and the Scheme language as a whole are all victims of their implementations. JavaScript, for example, does not have this problem -- to a small extent, maybe, yes, performance tweaks and tuning are always a thing but JavaScript implementations' ability to burn away polymorphism and abstraction results in an entirely different character in JS programs versus Scheme programs.

it gets worse

On the most basic level, Scheme is the call-by-value lambda calculus. It's well-studied, well-understood, and eminently flexible. However the way that the syntax maps to the semantics hides a constrictive monomorphism: that the "callee" of a call refer to a lambda expression.

Concretely, in an expression like (a b), in which a is not a macro, a must evaluate to the result of a lambda expression. Perhaps by reference (e.g. (define a (lambda (x) x))), perhaps directly; but a lambda nonetheless. But what if a is actually a vector? At that point the Scheme language standard would declare that to be an error.

The semantics of Clojure, though, would allow for ((vector 'a 'b 'c) 1) to evaluate to b. Why not in Scheme? There are the same good and bad reasons as with ref. Usually, the concerns of the language implementation dominate, regardless of those of the users who generally want to write terse code. Of course in some cases the implementation concerns should dominate, but not always. Here, Scheme could be more flexible if it wanted to.

what have you done for me lately

Although inline caches are not a miracle cure for performance overheads of polymorphic dispatch, they are a tool in the box. But what, precisely, can they do, both in general and for Scheme?

To my mind, they have five uses. If you can think of more, please let me know in the comments.

Firstly, they have the classic named property access optimizations as in JavaScript. These apply less to Scheme, as we don't have generic property access. Perhaps this is a deficiency of Scheme, but it's not exactly low-hanging fruit. Perhaps this would be more interesting if Guile had more generic protocols such as Racket's iteration.

Next, there are the arithmetic operators: addition, multiplication, and so on. Scheme's arithmetic is indeed polymorphic; the addition operator + can add any number of complex numbers, with a distinction between exact and inexact values. On a representation level, Guile has fixnums (small exact integers, no heap allocation), bignums (arbitrary-precision heap-allocated exact integers), fractions (exact ratios between integers), flonums (heap-allocated double-precision floating point numbers), and compnums (inexact complex numbers, internally a pair of doubles). Also in Guile, arithmetic operators are "primitive generics", meaning that they can be extended to operate on new types at runtime via GOOPS.

The usual situation though is that any particular instance of an addition operator only sees fixnums. In that case, it makes sense to only emit code for fixnums, instead of the product of all possible numeric representations. This is a clear application where inline caches can be interesting to Guile.

Third, there is a very specific case related to dynamic linking. Did you know that most programs compiled for GNU/Linux and related systems have inline caches in them? It's a bit weird but the "Procedure Linkage Table" (PLT) segment in ELF binaries on Linux systems is set up in a way that when a shared library is loaded, the dynamic linker usually doesn't eagerly resolve all of the external routines that the library uses. The first time that the library calls frobulate, it ends up calling a procedure that looks up the location of the frobulate procedure, then patches the binary code in the PLT so that the next time frobulate is called, it dispatches directly. To dynamic language people it's the weirdest thing in the world that the C/C++/everything-static universe has at its cold, cold heart a hash table and a dynamic dispatch system that it doesn't expose to any kind of user for instrumenting or introspection -- any user that's not a malware author, of course.

But I digress! Guile can use ICs to lazily resolve runtime routines used by compiled Scheme code. But perhaps this isn't optimal, as the set of primitive runtime calls that Guile will embed in its output is finite, and so resolving these routines eagerly would probably be sufficient. Guile could use ICs for inter-module references as well, and these should indeed be resolved lazily; but I don't know, perhaps the current strategy of using a call-site cache for inter-module references is sufficient.

Fourthly (are you counting?), there is a general case of the former: when you see a call (a b) and you don't know what a is. If you put an inline cache in the call, instead of having to emit checks that a is a heap object and a procedure and then emit an indirect call to the procedure's code, you might be able to emit simply a check that a is the same as x, the only callee you ever saw at that site, and in that case you can emit a direct branch to the function's code instead of an indirect branch.

Here I think the argument is less strong. Modern CPUs are already very good at indirect jumps and well-predicted branches. The value of a devirtualization pass in compilers is that it makes the side effects of a virtual method call concrete, allowing for more optimizations; avoiding indirect branches is good but not necessary. On the other hand, Guile does have polymorphic callees (generic functions), and call ICs could help there. Ideally though we would need to extend the language to allow generic functions to feed back to their inline cache handlers.

Finally, ICs could allow for cheap tracepoints and breakpoints. If at every breakable location you included a jmp *loc, and the initial value of *loc was the next instruction, then you could patch individual locations with code to run there. The patched code would be responsible for saving and restoring machine state around the instrumentation.

Honestly I struggle a lot with the idea of debugging native code. GDB does the least-overhead, most-generic thing, which is patching code directly; but it runs from a separate process, and in Guile we need in-process portable debugging. The debugging use case is a clear area where you want adaptive optimization, so that you can omit debugging ceremony from the hottest code, knowing that you can fall back on some earlier tier. Perhaps Guile should bite the bullet and go this way too.

implementation plan

In Guile, monomorphic as it is in most things, probably only arithmetic is worth the trouble of inline caches, at least in the short term.

Another question is how much to specialize the inline caches to their call site. On the extreme side, each call site could have a custom calling convention: if the first operand is in register A and the second is in register B and they are expected to be fixnums, and the result goes in register C, and the continuation is the code at L, well then you generate an inline cache that specializes to all of that. No need to shuffle operands or results, no need to save the continuation (return location) on the stack.

The opposite would be to call ICs as if they were normal procedures: shuffle arguments into fixed operand registers, push a stack frame, and when the IC returns, shuffle the result into place.

Honestly I am looking mostly to the simple solution. I am concerned about code and heap bloat if I specialize to every last detail of a call site. Also, maximum speed comes with an adaptive optimizer, and in that case simple lower tiers are best.

sanity check

To compare these impressions, I took a look at V8's current source code to see where they use ICs in practice. When I worked on V8, the compiler was entirely different -- there were two tiers, and both of them generated native code. Inline caches were everywhere, and they were gnarly; every architecture had its own implementation. Now in V8 there are two tiers, not the same as the old ones, and the lowest one is a bytecode interpreter.

As an adaptive optimizer, V8 doesn't need breakpoint ICs. It can always deoptimize back to the interpreter. In actual practice, to debug at a source location, V8 will patch the bytecode to insert a "DebugBreak" instruction, which has its own support in the interpreter. V8 also supports optimized compilation of this operation. So, no ICs needed here.

Likewise for generic type feedback, V8 records types as data rather than in the classic formulation of inline caches as in Self. I think WebKit's JavaScriptCore uses a similar strategy.

V8 does use inline caches for property access (loads and stores). Besides that there is an inline cache used in calls which is just used to record callee counts, and not used for direct call optimization.

Surprisingly, V8 doesn't even seem to use inline caches for arithmetic (any more?). Fair enough, I guess, given that JavaScript's numbers aren't very polymorphic, and even with a system with fixnums and heap floats like V8, floating-point numbers are rare in cold code.

The dynamic linking and relocation points don't apply to V8 either, as it doesn't receive binary code from the internet; it always starts from source.

twilight of the inline cache

There was a time when inline caches were recommended to solve all your VM problems, but it would seem now that their heyday is past.

ICs are still a win if you have named property access on objects whose shape you don't know at compile-time. But improvements in CPU branch target buffers mean that it's no longer imperative to use ICs to avoid indirect branches (modulo Spectre v2), and creating direct branches via code-patching has gotten more expensive and tricky on today's targets with concurrency and deep cache hierarchies.

Besides that, the type feedback component of inline caches seems to be taken over by explicit data-driven call-site caches, rather than executable inline caches, and the highest-throughput tiers of an adaptive optimizer burn away inline caches anyway. The pressure on an inline cache infrastructure now is towards simplicity and ease of type and call-count profiling, leaving the speed component to those higher tiers.

In Guile the bounded polymorphism on arithmetic combined with the need for ahead-of-time compilation means that ICs are probably a code size and execution time win, but it will take some engineering to prevent the calling convention overhead from dominating cost.

Time to experiment, then -- I'll let y'all know how it goes. Thoughts and feedback welcome from the compilerati. Until then, happy hacking :)

by Andy Wingo at February 07, 2018 03:14 PM

February 06, 2018

Programming Praxis

Unix Command-Line Utilities And Shell

I have mentioned previously that I work for a community college, part of a team maintaining an enterprise-wide database system on Oracle and Unix. My current assignment involves a file-transfer program between us and the Department of Education (they don’t use sftp like the rest of the world), and I am writing in the Posix shell language because that is the only language that can both be called from our user interface and run the program that the Department of Education requires. It’s not a big program, under five hundred lines, but there’s lots going on. Here are some examples:

  • Text file database: The Department of Education sends us a file of fixed-length records terminated by carriage-returns and newlines containing ascii text that acts as a database for holding configuration metadata. I read the database by selecting the desired record with grep and extracting the needed field with cut -c.
  • Strip file headers/trailers: Files received from the Department of Education have header and trailer lines added to the data. I strip those lines with an ed script:
    ed $FILENAME <
  • Arithmetic: At one point during the development of the program I needed to do some arithmetic, nothing fancy. That requirement has now gone away, but at the time I used bc to do the arithmetic, passing input using shell variables and returning output to a shell variable. And I couldn’t resist; the solutions page has a factoring program written in bc.
  • Oracle database: I use SQL*Plus to insert and query records in the Oracle database.
  • Shell built-ins: I use many of the built-in shell commands. If and test allow me to execute commands conditionally. While and do let me do loops. Cp and mv let me get things where they belong. Chown and chmod let me control the security of my data. Read lets me index through a file line-by-line. Shell variables let me parameterize the code. Shell functions let me modularize the code.

I’m not the first person to remark that having a single unifying datatype — ascii text in delimited files — and an assortment of programs to operate on that datatype makes a wonderfully useful system environment.

Your task is to tell us about your use of unix command-line utilities and shell scripts; hopefully other readers will be inspired by something interesting that you have done. When you are finished, you are welcome to read a suggested solution, or to discuss the exercise in the comments below.

by programmingpraxis at February 06, 2018 10:00 AM

February 02, 2018

Programming Praxis


Folds, in the parlance of functional programming, are a way to convert lists to a value of some other type; a fold applies a function pair-wise to each element of a list and an accumulator, then returns the accumulator when the list is exhausted. The fundamental fold is foldr:

foldr f a [x1, x2, ..., xn] = f x1 (f x2 (... (f xn a)...))

Here, f is a function with two arguments, a is an initial value, and [x1, x2, ..., xn] is the input list. The name foldr stands for “fold right”, because the parentheses stack on the right side of the expansion, the items in the list are processed right-to-left, and the accumulator is on the right side of the binary function. Foldl is similar:

foldl f a [x1, x2, ..., xn] = f (... (f (f a x1) x2) ...) xn

The arguments have the same meaning, with “fold left” referring to the fact that the parentheses stack on the left, the items in the list are processed left-to-right, and the accumulator is on the left side of the binary function. Note that foldl and foldr have different types, because the binary function takes its arguments in opposite order. In some cases, that makes a difference; for instance, when f is cons, you must use foldr. But when the function is associative, such as +, you can use either foldl or foldr. Here are some examples:

foldr + 0 [1,2,3,4] → 10
foldl + 0 [1,2,3,4] → 10
foldr cons [] [1,2,3,4] → [1,2,3,4]
foldl cons [] [1,2,3,4] → [[[[[],1],2],3],4]
foldr plusone 0 [1,2,3,4] → 4
foldl snoc [] [1,2,3,4] → [4,3,2,1]
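
Re-implemented from first principles, foldr and foldl follow directly from the expansions above. This sketch also defines snoc and plusone, whose definitions are inferred from the results the examples claim:

    (define (foldr f a xs)
      (if (null? xs)
          a
          (f (car xs) (foldr f a (cdr xs)))))

    (define (foldl f a xs)
      (if (null? xs)
          a
          (foldl f (f a (car xs)) (cdr xs))))

    (define (snoc a x) (cons x a))           ; cons with its arguments swapped
    (define (plusone x a) (+ a 1))           ; ignore the item, just count it

    (foldr + 0 '(1 2 3 4))         ; => 10
    (foldr cons '() '(1 2 3 4))    ; => (1 2 3 4)
    (foldl snoc '() '(1 2 3 4))    ; => (4 3 2 1)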

Sometimes there is no obvious starting value. For instance, if you want to find the maximum item in a list, there is no “guaranteed to be less than anything else” value to use for a. In that case you can use the foldl1 and foldr1 variants that take the first item in the list as the initial value. Here, max is a binary function that takes two numbers and returns the larger; it is applied pair-wise at each item in the list (we ignore the fact that the built-in max can take more than two arguments):

foldr1 max [1,2,3,4] → 4
foldl1 min [1,2,3,4] → 1
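
Minimal sketches of the two variants, assuming a non-empty list and the foldl defined earlier:

    (define (foldr1 f xs)
      (if (null? (cdr xs))
          (car xs)
          (f (car xs) (foldr1 f (cdr xs)))))

    (define (foldl1 f xs)
      (foldl f (car xs) (cdr xs)))

    (foldr1 max '(1 2 3 4))   ; => 4
    (foldl1 min '(1 2 3 4))   ; => 1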

Related to foldl is scan, which applies foldl to every initial segment of a list:

scan f a [x1, x2, ..., xn] = [a, f(a, x1), f(f(a, x1), x2), ..., f(... f(f(a, x1), x2) ..., xn)]

For instance:

scan + 0 [1,2,3,4] → [0,1,3,6,10]
scan snoc [] [1,2,3,4] → [[], [1], [2,1], [3,2,1], [4,3,2,1]]
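
And a sketch of scan, which keeps every intermediate accumulator of a left fold:

    (define (scan f a xs)
      (if (null? xs)
          (list a)
          (cons a (scan f (f a (car xs)) (cdr xs)))))

    (scan + 0 '(1 2 3 4))       ; => (0 1 3 6 10)
    (scan snoc '() '(1 2 3 4))  ; => (() (1) (2 1) (3 2 1) (4 3 2 1))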

Your task is to implement all the various folds shown above; if your language provides them natively, you should re-implement them from first principles. When you are finished, you are welcome to read or run a suggested solution, or to post your own solution or discuss the exercise in the comments below.

by programmingpraxis at February 02, 2018 10:00 AM