Last update: January 4, 2010

XKCD cartoon: The Lisp Cycles

Welcome to the haskeem web site!

What's haskeem? haskeem is pretty much Scheme in Haskell... however, if you're looking for an industrial-strength Scheme interpreter, this might not be it! It doesn't implement the full Scheme number tower, it's not properly tail-recursive (but see below), it doesn't have full R6RS hygienic macros (but defmacro is present; see below), it doesn't have call-with-current-continuation (but, yet again, see below; I'm working on delimited continuations, reset/shift et al, and they are coming along tolerably well). If you're a real schemer, you're probably snickering by now... that's ok! haskeem is a toy that's letting me learn more about both Scheme and Haskell, it's a lot of fun to work on, it's actually turning into a useful tool, and all of that's plenty enough for me! I arbitrarily called the first version I put onto the web "0.5"; I'm now at 0.7.16, which is available here, almost all under GPL — I put some of the scheme code I've been writing under BSD3. But see also my comments below on doing it yourself: you'll learn far more if you go through Jonathan Tang's tutorial Write Yourself a Scheme in 48 Hours and write all of this yourself, rather than just grabbing what I've got here and looking at it.

What does it have?

What bugs do I know about? At this point I know of one bug remaining in stuff that's at least partially implemented: floating-point exponentiation expt doesn't deal quite correctly with infinities and NaN. This is tricky to get right in all cases. I think GHC is just following the C library function pow, which I think doesn't quite get it right all the time — for example, pow says that NaN0 should be 1, whereas I think it would be better to return NaN. There are other similar cases. I've tried to make haskeem be more conservative than pow, but I'm not sure I've got all possibilities covered (or, indeed, correct).

There is also a limitation present in the reset/shift continuations: they redo too much of the captured context. This is an issue which I've not addressed at all yet.

If you have questions or comments about the stuff here, please contact me at korg@korgwal.com or at uhollerbach@gmail.com.

Enjoy! -- Uwe Hollerbach


January 4, 2010

Ugh, got too busy with other stuff, and also got stuck on continuations... I'm still stuck on the continuations, but in the meantime I've cleaned up a huge number of ghc -Wall warnings and hlint suggestions. There's still a little bit to be done, but it's starting to get fairly clean...


August 13, 2009

The switch to EEnv proved very useful, as hoped: today I hacked in the new stuff required to make shift in a separate function work, and it was quite simple and straightforward. It is still not working completely, there are still naming issues to be dealt with, but it's a significant step forward: the following now works as desired

    (define fn (lambda () (shift k (cons 3 (k (k (cons 4 '())))))))
    (display (cons 1 (reset (cons 2 (fn)))))
    (newline)

should produce (1 3 2 2 4), and indeed it does. What doesn't yet work is the following:

    (define f1 (lambda (v1 v2) (shift k1 (k1 (display v1)) (k1 (display v2)))))
    (define f2 (lambda (v1 v2) (shift k2 (k2 (display v1)) (k2 (display v2)))))
    (define f3 (lambda (v1 v2) (shift k3 (k3 (display v1)) (k3 (display v2)))))
    (reset (f1 1 2) (f2 3 4) (f3 5 6))
    (newline)

    (define fn (lambda (v1 v2) (shift k (k (display v1)) (k (display v2)))))
    (reset (fn 1 2) (fn 3 4) (fn 5 6))
    (newline)

Both of these should produce 13564562356456 according to mzscheme, but only the first does; the second one prints just 12. I think the issue is that the continuation k is visible too far, so that when the second call to fn becomes active, it sees the continuation of the first call, and haskeem gets a little confused. I'll have to do a little digging to figure out what's going on there.

Those previous examples which don't have this naming confusion now work fine: the two from the 10th, as well as an example using the call/cc from the 8th, are all running quite spiffingly, at least for what I've tried so far.

Version bump to 0.7.15.


August 11, 2009

No changes at all, other than the transformation of evalLisp etc to take a single EEnv instead of Env, [[LispVal]], and Integer. That's enough of a change for today... it even warrants a version bump, to 0.7.14.


August 10, 2009

Good news, everyone! I have many more failing test cases for reset/shift! Following the initial translation into haskell, the next thing I tried was to make some tests with the shift in another function, not just a macro as in the factorial list comprehension test; for example:

    (define fn (lambda () (shift k 3)))
    (display (+ 1 (reset (+ 2 (fn)))))
    (newline)

    (define fn (lambda () (shift k (+ 3 (k 4)))))
    (display (+ 1 (reset (+ 2 (fn)))))
    (newline)

The first of these works very well; the second, not so much: it crashes haskeem quite quickly with a stack space overflow. After adding debugging stuff to print out some intermediate variable bindings, it became clear what's going on. Essentially, I was attempting to abuse the scoping of the interpreter. What I need here is basically dynamic scope, but because scheme is statically scoped, what I wanted to do won't work. So I have to, more or less, provide a second environment in which the arguments of the continuation functions are appropriately bound, and specially mark the continuation functions so that they, and only they, update this second environment during processing. Then the shift form can do what I wanted, which is to check whether its unique symbol is defined in that environment and use this as the test for what to do next.

This is very tedious, because I basically have to thread (yet) another argument through 100+ instances of calls to the evalLisp function. So I'm not going to do that. Instead, I'm going to merge the existing environment, context-stack, and quote-level arguments into one new datatype EEnv (for extended environment) and rewrite all instances of evalLisp to use this. This will be even more painful than threading the new argument, but I'll have to do it only once: after that, I can just stick new stuff into the EEnv, and it'll be available where I need it. Most instances will just pass it through unchanged, so that I can just concentrate on the few cases where something actually needs to be done. This is very much akin to the State monad, except I'm being a little more explicit.

So: no updates to haskeem for a little while! I've got a bunch of trivial stuff already done, but changing the evalLisp function touches many many lines in evaluator.hs, so it'll take a day or two to get it all right, and then I will still have to add the new dynamic stuff.

Oh... and this answers the question from Saturday of whether that call/cc works: the answer is no, because it's implemented as a separate function.


August 8, 2009

I have changed the names of the operators inside delcont.scm: I've appended "/m" to all of them, so as not to clash with the native operators, which I'm now renaming to their proper lowercase names. For the time being, I've also withdrawn the reset0/shift0, prompt/control, and prompt0/control0 operators: I thought some more about how I had implemented the reset/shift stuff, and it seems to me that there are some fundamental issues there which make it trickier than I had thought to re-use that for the other operators. In particular, the other operators can capture more of their surrounding context, and I think my code can't do that correctly yet. It may be that the same issues are present in the macro versions, and that this is causing some of those tests to fail. I'll have to look into that.

This is another version bump, to 0.7.13, since I did the name changes.

One thing to investigate with this is a slightly-limited version of call/cc: Queinnec lists the following as an implementation of call/cc, for programs P which do not use reset/shift, and with the continuations that are produced limited to the scope of the reset. If my implementation of reset/shift can do this properly, that would already be quite fine.

    (let ()
      (define (call/cc f)
        (shift pc (pc (f (lambda (v)
                           (shift ignore-pc (pc v)))))))
      (reset P))

August 7, 2009

W00t! Native continuations, here we come! It's still not working quite correctly for all four versions of the operators, but the reset/shift version is working correctly for all examples, except for the known issue that 1 3 2 3 comes out as 1 1 3 2 1 3 (plus the other instances of this bug). Since I've still not looked into this at all, I don't feel too bad about it. In principle the other pairs of operators are expressible in terms of reset/shift, so it might be that all four sets are "done"; but I still want to see if I can get them working properly "natively".

This definitely warrants an update to the tarball, and a version bump, too: 0.7.12 now.

later... I forgot to check one thing: is it still necessary to have the reset/shift all in one s-expr? The way I wrote the native haskell code, it should not be, and so the factorial example which I stole from the famous Oleg should now work cleanly:

    (defmacro (range from step to)
      (let ((f (gensym))
            (i (gensym)))
        `(SHIFT ,f
                (do ((,i ,from (+ ,i ,step)))
                    ((> ,i ,to) #f)
                  (,f ,i)))))

    (RESET (begin (display (factorial (range 1 3 16))) (newline)))

Does it? Save the above into a file test.scm, then run it:

    % haskeem test.scm
    1
    24
    5040
    3628800
    6227020800
    20922789888000

Looks pretty good to me! I will look at some of the other examples he posted; but not tonight.


August 5, 2009

The action is beginning... the native versions of reset et al are beginning to do not-quite-trivial things. It's not quite to the point where I'm going to bother updating the tarball, but the first few tests are starting to work:

    should be 6             6
    should be (1 2)         (1 2)
    should be 4             4
    should be (1 3)         (1 3)
    ...

I think the current changes largely take care of what's needed for the reset etc operators; the shift etc operators only have the trivial part done, ie, I'm not yet binding the code between reset and shift to a callable function. Still, even those first couple of examples demonstrate skipping over that enclosed part when (shift ...) returns.


August 3, 2009

And so it begins... I'm starting on transferring the stuff in delcont.scm into haskell, mostly inside evaluator.hs. This is a version bump, to 0.7.10, but without much actual action: so far I've just added code to recognize the continuation operators and transform them into an internal form, which I'll need in order to be able to track them. For the moment, the stuff I'm putting into haskeem is using the UPPERCASE versions of all the continuations operators; I don't want to totally kill all of the existing continuations stuff in delcont.scm and deltest.scm. So for example

    '(RESET (+ 2 (SHIFT k (+ 3 (k 5) (k 1)))))

gets turned into

    (ext cont  cont.1 (+ 2 (int cont  cont.0 #t #t k (+ 3 (k 5) (k 1)))))

where 'ext cont', 'int cont', ' cont.0', and ' cont.1' each are single symbols with an embedded space; so the user cannot enter these directly, but must go through the external operators. But if you try to evaluate the expression, haskeem will barf, saying this is not implemented yet.

In terms of the table I wrote two weeks ago, the first part of which, discussing reset/prompt, is reposted here:

I've implemented the first item, plus a bit more: that's what those cont.N are for. The other two items should be very straightforward, and that should take care of the first part of this.

As usual, stay tuned...


August 1, 2009

I continue to think there's some deep magic working here... I made a couple of fairly trivial tweaks to the reset and int-s/c (the internal translation of shift etc) macros in delcont.scm. One of these had the effect of making a failing test more nearly correct (I'll describe in a second), which I was not expecting; the others, which should have been straightforwardly correct according to my understanding, caused problems. So I continue to be in the situation of having written a shortish piece of code which I don't understand, but it's correct for almost all of my test cases (and all tweaks to it that I can think of make it clearly less correct: more tests fail).

At this point there are three classes of bugs in these macros: first, and in some sense basically trivial, is that they don't capture the point at which the code between reset and int-s/c dives into the int-s/c: that code is captured into a function and made available as a continuation inside the code inside int-s/c, but it gets started from the beginning each time it gets called. The effect of this bug is that the expression

    (reset (write-string "1 ")
           (shift c (c 'ignore) (write-string "2 ") (c 'ignore))
           (write-string "3 "))

which should print 1 3 2 3, instead prints 1 1 3 2 1 3. This is entirely unsurprising, as I've made no attempt to capture this information, and I'm not quite sure I can with this macro-based approach. I hope to fix this when I translate this into haskell and make these operators "native"... we'll see.

Second is a class of bug where information escapes out of the outermost reset: for example, in the expression

     (display (reset (* 5 (
                           (reset (* 2
                                     (shift f2 (lambda ()
                                                 (* 3 (shift f3 7))))))))))

the result should be 7, but haskeem prints something like ( symbol.2517 7) (the number part of the symbol.XXX will vary). This is an exception being raised, specifically the one inside int-s/c, to return the result to the appropriate reset; but that reset is no longer there. I believe this is due to the (lambda () ...): it's a thunk which carries the correct result along, and eventually it's processed and the result is returned; but that processing happens outside the last reset, and the exception is not caught. The value 7 is correct, though, so I regard this as a minor infelicity, and I think I can fix it.

The third type of bug... is a total mystery to me. This example is taken from a paper by Shan; haskeem passes all the other tests he provides, but not this one. This is supposed to produce (); instead, it produces (a). I have no idea why.

    (display
     (prompt0
      (prompt0 (cons 'a (prompt0
                         (let ((y (control0 f (control0 g (cons 'b (f '()))))))
                           (control0 h y)))))))

Other changes to the macros in delcont.scm include allowing them to have multiple expressions to be evaluated. This is visible in the first bug-case above. This is rather trivial syntactic sugar; it would in all cases be equivalent to just wrap multiple expressions inside a (begin ...). Again, no version bump, this is all just tweaks to scheme code.


July 29, 2009

I'm starting to think I've caught a piece of the serious wild magic here... I wrote this delimited-continuations stuff less than two weeks ago, it's only about a single (ok, longish) page of code, of which I think I understand pretty much every piece, and yet... I have no clue how it's doing some of the stuff it's doing! Here's an example that I took from a posting by the famous Oleg to the comp.lang.scheme newsgroup on 17 December 2003. This example I actually pretty much understand. I massaged it lightly due to the current restriction that the shift be directly visible from the reset (I think I can actually remove this restriction, but I haven't worked out all the details yet).

    (define (my-fact n)
      (if (<= n 0)
          1
          (* n (my-fact (- n 1)))))
    
    (define from 1)
    (define to 21)
    (define step 3)
    
    (reset (begin (display (my-fact 
                            (shift f
                                   (do ((i from (+ i step)))
                                       ((> i to) #f)
                                     (write-string (number->string i) "! =\t")
                                     (f i)))))
                  (newline)))

The definition of (my-fact) is just to show that I've got nothing up my sleeve, this is a perfectly cromulent function that takes as its argument a single numeric value, and yet in the last part of this it appears that I'm passing some sort of weird (shift ...) argument to it, that's causing it to get run multiple times. Not so... this is the inversion of control that's so often mentioned when discussing continuations. This prints out the following table:

    1! =    1
    4! =    24
    7! =    5040
    10! =   3628800
    13! =   6227020800
    16! =   20922789888000
    19! =   121645100408832000

which is indeed a table of the factorials of the given values...

This particular middle-aged plains ape thinks this is seriously cool... you may feel free to disagree, of course! :-)

The latest version of delcont.scm has renamed the outer operator back to reset; I've also added aliases to define the prompt etc operators. Most of the tests in the test suite are passing, with the exception of the known issue with re-running the captured stuff from its beginning rather than from the point where the shift got started. This is incidentally the reason why mzscheme didn't care when I changed the value of the flag in the tests I described earlier: it truly captures the stuff between reset and shift as an already-partially-evaluated expression, so it doesn't re-check what the present value of flag is (unless that gets used after the shift expression; I'll try to make up an example showing that).


July 26, 2009

Yes: the allegro ma non troppo was indeed more or less the right approach. I rewrote the reset macro (it's actually named reset2 in the current version of delcont.scm, just so that I can compare the old and new versions) the way I outlined below, and now it works rather better. I had to do a couple of additional things to make it work, and there is one small area where the old macro worked right and the new macro doesn't; but I understand why.

The additional stuff I had to do, besides what I described a couple of days ago, was that, in processing a shift operator, several pieces get re-wrapped in a new instance of reset, so it's necessary to re-process those pieces to refer to the new reset mark rather than the old one. Without that, it gets itself into an infinite loop... not so nice

The stuff that's newly-broken has to do with the more-dynamic nature of the new reset2 macro: it marks the subsidiary shift operators, then starts evaluating the thunk that was handed to it. If or when evaluation hits a shift operator, that operator takes care of doing the delcont transformation, turning the code between reset2 and the appropriate shift into a function and making it available to the code inside that shift. OK... but then when the inner code calls that function, it starts over from the beginning, whereas properly it needs to start at the point where it entered the shift. Thus, for example, the following:

    (reset (begin (write-string "1 ")
                  (shift c (begin (c 'ignore)
                                  (write-string "2 ")))
                  (write-string "3 ")))
    (newline)

should print 1 3 2 according to mzscheme; but, with the reset2 macro, it prints 1 1 3 2. That's because it starts to evaluate the stuff inside the reset2, prints the "1", then enters the shift: there, it calls the continuation, which says to do the outer code; but it doesn't know to re-start that outer code in the middle. So it prints the "1 " again. I don't know how to fix that in this prototype version, but I think I can live with it; if I can translate this into haskell and fix it there, that'll be fine. (However, although I'm sure I can translate what I've got into haskell, I'm not quite sure I know how to fix this particular issue there... have to see.)

I've added the new version of delcont.scm to the tarball and separately also to the web page; since I added a small new primitive, string->symbol, I bumped the version to 0.7.9. There are some more tests in the file deltest.scm, which is in the tarball but not separately on the website. At the moment, I believe reset2 is passing all of the tests, with the exception of the issue above. That includes a couple of tests using control rather than shift; I haven't yet figured out what the difference is between those and the 0 versions.


July 24, 2009

Going through the previous example in painstaking slow motion, I think it's starting to get clear in my head what is going on, and what I need to do. Let's go through it from the top, allegro ma non troppo. First, the basic transform we're trying to get: from CC Shan, Shift to control, we want

    M[(reset V)] -> M[V]

    M[(reset C [(shift f E)])] -> M[(reset E')]
      where E' = E{f -> (lambda (x) (reset C[x]))}

or, in more scheme-y notation, which I've taken from schemewiki.org

    (reset E) without (shift)
    -> E

    (reset (...A... (shift K E) ...B...))
    -> (let ((K (lambda (x) (reset (...A... x ...B...)))))
         (reset E))

The basic expression we're evaluating, with flag defined as #f, since that's the more interesting case, is

    (display (cons 1 (reset (cons 2 (if flag
                                        (shift k1 (cons 3 '()))
                                        (shift k2 (cons 4 (k2 (cons 5 '())))))))))
    (newline)

which, according to mzscheme, ought to evaluate to (1 4 2 5). So we apply the above transformation to the second shift, because that's the one selected by the flag setting:

    (display
      (cons 1 (let ((k2 (lambda (v2)
                          (reset (cons 2 (if flag (shift k1 (cons 3 '())) v2))))))
                (reset (cons 4 (k2 (cons 5 '())))))))
    (newline)

We also need to process the newly-introduced reset expression inside the (lambda (v2) ...): the flag is false, so just remove the true branch and replace it with #f: we get

    (display
      (cons 1 (let ((k2 (lambda (v2) (reset (cons 2 (if flag #f v2))))))
                (reset (cons 4 (k2 (cons 5 '())))))))
    (newline)

In haskeem, the three expressions evaluate to (1 3), (1 4 3), and (1 4 2 5), respectively. So it looks like maybe there's hope...

So how to achieve this automatically? I think it can actually be done, it's just that the present reset macro is not doing quite the right thing. Here's what I think I need:

I think I have all the pieces needed to do all of that, it's just a matter of putting them together. I can more or less see how to do it either at the haskell level or the scheme level... have to think about which one to use.

I'm not sure that will fix all the issues, and it's not a complete implementation of delimited continuations. One major piece that would be missing would be that so far I don't see a way to do this with the reset and the shift forms spread out over multiple routines: as far as I can see at the present time, they would have to be in one block of code. That's a significant limitation, but even so, this should already be a good step forward.

One question that's still very much unanswered in my mind is what is supposed to happen if, in the above example, the code in the branch that's taken changes the value of flag. In the above example, this would cause a second instance of reset/shift to get activated after the first transformation, thereby introducing a 3 somewhere into the final answer (I haven't figured out yet where I think that ought to go). But according to mzscheme, it doesn't make any difference: the result stays (1 4 2 5). I still need to think about that.


July 23, 2009

The lack of updates in the past two weeks is an accurate reflection of the lack of progress in implementing continuations, either delimited or not. I've improved the reset macro a little bit, it can handle some more cases than before, but I'm still missing some detail of how to do the expansion. I've compared results for a bunch of tests against mzscheme, and there are some annoying mismatches. For example:

    (define (doit flag)
      (display
        (cons 1 (reset (cons 2 (if flag
                                   (shift k1 (cons 3 '()))
                                   (shift k2 (cons 4 (k2 (cons 5 '())))))))))
      (newline))

    (doit #t) should be (1 3)
    (doit #f) should be (1 4 2 5)

The #t case is kind of trivial, since the expression inside the shift doesn't involve the continuation k1. But the #f case is less trivial, and the current reset macro gets that wrong: it prints (1 3) for both cases. I think I sort of understand why, but I'm not quite sure what to do about it... the issue is that the macro is doing a purely syntactic expansion of the expression, first-come-first-serve as it were, and so it always expands the textually-first shift instead of whichever one gets reached at runtime.

I think in order to handle this (more nearly) properly I need to generate some kind of cond statement, with a selector saying which sub-expression got hit and thus which one should get evaluated. I'm not sure that'll catch all cases, but it ought to get a few more than the present version... stay tuned.


July 11, 2009

In reading more about continuations, I came across something called delimited or composable continuations — continuations which don't represent all of the future of a computation, only of a part of it. I found this site which gave some very simple examples and which described the required transformation in a fairly simple way. Part of the trouble I'm having with regular continuations is that it is as close as I think I've come in anything computational to irreducible complexity: because a continuation represents the entire future of a computation (and for an interpreter that's pretty damn wide-open), it's really not so simple to just do a little bit of work in one section of a program. The CPS transformation really has to be applied to an entire program, top to bottom, and the point at which one is calling call/cc is in some sense at the bottom of that tree; so it's necessary to transform a tree while sitting on one leaf of it... (tr)icky.

With (at least the reset/shift flavor of) delimited continuations, one is in the very much happier situation of having access to the top of the tree which needs to be transformed: only when (reset expr) shows up in the code is it necessary to do any work, and that work only involves expr. That's a top-down code-transforming situation, very easy by comparison.

I've implemented a small routine which does that; it's available here (also in the tarball). Here are some fairly trivial examples, taken from the schemewiki site above; the last two are mine, and show multiple reset/shift in the same expression... I think those last results are correct.

    ; from wiki
    lisp> (+ 1 (reset (+ 2 3)))
    6
    lisp> (cons 1 (reset (cons 2 '())))
    (1 2)
    lisp> (+ 1 (reset (+ 2 (shift k 3))))
    4
    lisp> (cons 1 (reset (cons 2 (shift k (cons 3 '())))))
    (1 3)
    lisp> (+ 1 (reset (+ 2 (shift k (+ 3 (k 4))))))
    10
    lisp> (+ 1 (reset (+ 2 (shift k (+ 3 (k 5) (k 1))))))
    14
    lisp> (cons 1 (reset (cons 2 (shift k (cons 3 (k (cons 4 '())))))))
    (1 3 2 4)
    lisp> (cons 1 (reset (cons 2 (shift k (cons 3 (k (k (cons 4 '()))))))))
    (1 3 2 2 4)

    ; not from wiki
    lisp> (+ 1 (reset (+ 2 (shift k1 (+ 3 (k1 5) (k1 1) (reset (* 10 (shift k2 (+ 3 (k2 4))))))))))
    57
    lisp> (+ 1 (reset (+ 2 (shift k1 (+ 3 (k1 5) (k1 1) (reset (* 10 (shift k2 (+ (k1 3) (k2 4))))))))))
    59

I'm going to look into emulating call/cc with these. Since call/cc is unlimited continuations, that's not strictly possible; but I think it's a reasonable start to do call/cc that is limited to one expression typed at the interpreter prompt, and that ought to be doable with reset/shift. Stay tuned...

Again no version bump, this is merely scheme code, available in the file delcont.scm which is now in the tarball.


July 10, 2009

So, a couple of days ago, I wrote a (call/cc1) function which, I thought, should be adequate to do a non-deterministic (amb) operator, or something like it. That supposition turns out to have been largely correct. Here's a fairly trivial program, very slightly re-written from Paul Graham's On Lisp:

    (define (parlor-trick n)
      (let-choice j '(1 2 3 4 5)
        (let-choice i '(1 2 3 4 5)
          (require (= (+ i j) n))
          (list i j))))

and a couple of results from running it:

    lisp> (parlor-trick 4)
    (3 1)
    lisp> (parlor-trick 6)
    (5 1)
    lisp> (parlor-trick 10)
    (5 5)
    lisp> (parlor-trick 11)
    user exception "All out of choices!"

So how does that work? Here's the code that implements the plumbing in the above:

    (define *choice-stack* (make-stack))

    (define (bad-choice)
      (if (*choice-stack* 'empty?)
      (raise "All out of choices!")
      (let ((cur (*choice-stack* 'pop)))
        ((car cur) (cdr cur)))))

    (defmacro (let-choice sym choices . body)
      (let ((ret-fail (gensym))
            (ret (gensym))
            (cont (gensym))
            (chs (gensym))
            (loop (gensym)))
        `(letrec* ((,ret-fail (gensym))
                   (,loop (lambda (,chs)
                            (if (null? ,chs)
                                (bad-choice)
                                (let* ((,sym (car ,chs))
                                       (,ret (let/cc1 ,cont
                                               (*choice-stack* 'push
                                                 (cons ,cont ,ret-fail))
                                               ,@body)))
                                  (if (eqv? ,ret ,ret-fail)
                                      (,loop (cdr ,chs))
                                      (begin (*choice-stack* 'pop)
                                             ,ret)))))))
                  (,loop ,choices))))

    (define (require pred?) (if (not pred?) (bad-choice)))

All of the heavy lifting is in the (let-choice) macro. This is designed to obey the restrictions due to the (call/cc1) and (let/cc1) stuff which I wrote a couple of days ago. As far as I can tell, it should work fine if that's replaced by regular continuations (but I haven't tried that).

I decided to make a distinct failure symbol (the (ret-fail (gensym)) stuff) for each (let-choice). I'm not quite sure that I need that generality, it would have worked just as well if I had just defined one global *choice-fail* symbol and tested against that; and that would have made the (bad-choice) function slightly simpler, too. Maybe I'll change it.

One thing that's a little trickier with this version is getting not just one but all solutions to the given problem: because these continuations expire after the scope of the call/cc1 or let/cc1 is left, it's a little harder to tell the code "ok, this solution is nice, now fail again and get me to the next one". There are two ways to approach that: the simpler one is to just explicitly write the body of what gets stuck into the innermost (let-choice) to handle this possibility. The second possibility, much trickier, would be to write a hypothetical (all-solutions) macro that reaches into the innards of the nested (let-choice) macros and rewrites (actually just wraps) the body of the innermost one to do the additional processing needed. With high-level syntax-rules macros, that might be feasible; with the low-level (defmacro) stuff I've got, I'm not going to try...

No version bump, this is merely additional scheme code, available in the file choice.scm which is now in the tarball.


July 9, 2009

A small change of pace: I hacked in a couple of new primitives, to deal with running sub-processes: (run-command), (run-read-command), and (run-write-command). I'm starting to use haskeem more and more for simple shell-scripting stuff, and the lack of these functions was starting to bother me a little. (run-command foo) is the equivalent of the C system function, it just runs the given command without trying to interact with it. (run-read-command foo-cmd [bar-dir]) and (run-write-command foo-cmd [bar-dir]) are the equivalent of C popen: they return a port object which can be read from or written to, respectively. Here's a small & silly sample session:

    lisp> (define port (run-write-command "sed 's/yow/YOW!/g' > foo.dat"))
    {IO port}				<- actually prints with angle brackets, but HTML validator barfs on that...
    lisp> (write-string port "now is the yow time\n")
    #t
    lisp> (write-string port "for all yow men and true\n")
    #t
    lisp> (write-string port "to come to the yow aid\n")
    #t
    lisp> (write-string port "of the yowyowyow party\n")
    #t
    lisp> (close-port port)
    #t
    lisp> (run-command "cat foo.dat")
    now is the YOW! time
    for all YOW! men and true
    to come to the YOW! aid
    of the YOW!YOW!YOW! party
    0					<- return value, not in the file
    lisp>

I may well mess around with the interface some, but it's a start at what I want. Very likely I will add a second optional argument to (run-command), analogous to the other two, to specify the working directory.


July 8, 2009

Continuations... I'm still studying & thinking about how to fully implement these. But it occurred to me that I might already be able to do a slightly-restricted version of them, namely escapers. Studying these in the continuations chapter in PLAI, I remembered that I have the necessary ingredients, namely (raise) which can throw arbitrary scheme objects, (guard) which can catch them, and (gensym) to disambiguate things. With those capabilities, it's a trivial matter to write a (call/cc1) function and a (let/cc1) macro:

    (define (call/cc1 func)
      (let* ((sym (gensym))
             (cont (lambda (ret) (raise (cons sym ret)))))
        (guard
         (err ((and (pair? err) (eqv? sym (car err)))
               (cdr err)))
         (func cont))))

    (defmacro (let/cc1 k . body)
      `(call/cc1 (lambda (,k) ,@body)))

These implement what might be called "poor man's" continuations in that they have a couple of significant limitations: first, the continuations are one-shot, ie, they can't be invoked more than once; second, they have a limited lifetime, ie, we can't save the continuation inside the user function, return from the user function, and then later re-invoke the continuation — the re-invoking will work just fine, but it won't return to where we'd want it to go; and third, if we nest these and then invoke an outer one, any inner ones will go stale. (I haven't tested this last, but that's how (guard) is supposed to work.)

Still, this suffices for at least some of the examples in PLAI:

    (define (f1 n)
      (+ 10 (* 5 (/ 1 n))))

    (define (f2 n)
      (+ 10 (* 5 (call/cc1
                  (lambda (k)
                    (/ 1 (if (zero? n)
                             (k 1)
                             n)))))))

    (define (f3 n)
      (+ 10 (* 5 (let/cc1 k
                          (/ 1 (if (zero? n)
                                   (k 1)
                                   n))))))

The first function just divides by whatever we entered; the second and third functions are careful to check if we'd divide by zero, and if so just pass back 1 instead. Here they are in action:

    lisp> (f1 1)
    15
    lisp> (f2 1)
    15
    lisp> (f3 1)
    15

    lisp> (f1 2)
    25/2
    lisp> (f2 2)
    25/2
    lisp> (f3 2)
    25/2

    lisp> (f1 0)
    1/0
    lisp> (f2 0)
    15
    lisp> (f3 0)
    15

That already looks pretty good. I'll see if I can implement something like the (amb) operator... I'm not sure, but I think it ought to be possible. No version bump, but I'm adding callcc.scm which contains the above code to the tarball.


July 4, 2009

Well! Last night I decided to look a bit at memory usage. I had not previously considered this at all, since it was not really one of my goals; but since I'm still trying to bend my mind around continuations, and I'll need greater efficiency if I ever hope to do continuations natively, I decided to have a look. My test program is the following, which just computes Collatz sequences for successive integers.

    (define (collatz n org steps)
      (cond ((= n 1) (write-string (number->string org) #\space
                                   (number->string steps) #\linefeed))
            ((even? n) (collatz (/ n 2) org (+ steps 1)))
            (else (collatz (+ 1 (* 3 n)) org (+ steps 1)))))

    (define (coller n)
      (collatz n n 0)
      (if (< n 25000)
          (coller (+ n 1))))

    (coller 1)
    (display (cputime))
    (newline)

This is pure looping, there is no buildup of control information here, and with proper tail-call optimization this could run forever in constant space (well, leaving aside the fact that eventually the representation of n does grow; but that isn't a concern here). I tried this with mzscheme and verified that it does indeed run in constant memory, about 30 MB as it happens.

So how does haskeem do? Not so hot. I followed the instructions in Real World Haskell (also on the haskell wiki) to compile with profiling active, then did a run and generated the following plot:

Old memory usage

That's kind of a memory pig! But looking closely at the data, it is evident that most of it comes from just a couple of cost centers: in particular, the lastOrNil function inside the evaluator is strongly implicated in all of this. The reason for this function is to get the value of the last of a bunch of scheme forms: if my code is (foo) (bar) (baz), the interpreter is supposed to evaluate all three and return the value of the last one as the value of the whole thing. Since there is also the possibility of an empty list of these, lastOrNil has to also be able to deal with an empty list; that's why I didn't just use last. It's a tiny function:

    lastOrNil = liftM lON
      where lON [] = List []
            lON l = last l

and it gets used in a bunch of places like this: lastOrNil (mapM (evalLisp env ql) body). So I stared at this for a while, thinking that perhaps the list wasn't fully evaluated, and there were large thunks getting built up. I tried to make it stricter, but nothing made any difference. Eventually I figured out that it isn't laziness that's the issue, it's that the list is getting build up at all: all of the results generated by mapM are being kept, only to get tossed an instant later. So I replaced all of the calls to lastOrNil . mapM with calls to a new function mapML which does all of this directly in one go:

    mapML fn lst = mapMLA (List []) fn lst
      where mapMLA r _ [] = return r
            mapMLA ro fn (x:xs) =
               do rn <- fn x
                  mapMLA rn fn xs

The results are rather dramatic:

New memory usage

This is still not fully optimized, there is still growth in the memory usage, but a 20× to 50× reduction in memory is... rather nice. In addition, there turned out to be a 25% reduction in runtime, also nice.

This certainly deserves a version bump, to 0.7.7.


July 3, 2009

Only small tweaks in the past several days. I added a fluid-let macro to the library, plus I'm writing some small procedures to deal with stacks and heaps. In a backward-incompatible change, I renamed new-symbol to gensym, which is closer to standard usage.

I also got tired of the failures in the test suite which involved comparisons and min/max involving rational infinities, so I put in small wrapper functions to deal with these cases.

Another version bump, to 0.7.6.


June 28, 2009

I have continued to read PLAI and to mess with the continuation-passing macros; I think that's coming along ok. In order to make it work the way I think it ought to, I had to make a small change in the evaluator: previously, the expanded syntax of macros was getting evaluated still in the environment that was captured when the macro was defined, and that was messing up the last of the CPM examples below: I was getting "14" instead of "15" which I think is the right answer.

This should print "5"

    (define foo (=lambda (x) (=return (+ 1 x))))

    (display (foo *cont* 4))
    (newline)

Same (foo) as above; this should print "10"

    (let ((*cont* (lambda (x) (* 2 x))))
      (display (foo *cont* 4))
      (newline))

This should print "14": the function says to add 10 to the value passed in, and the current continuation, which is the default identity, is then applied to that value: 4 + 10 → identity → 14

    (=define (bar x) (=return (+ 10 x)))
    (display (bar 4))
    (newline)

This should print "15" ... I think? The function is the same as before, but here the current continuation has been re-jiggered to add 1 to what it receives: 4 + 10 → (+1) → 15.

    (let ((*cont* (lambda (x) (+ 1 x))))
      (display (bar 4))
      (newline))

All of the scheme code for this is in the file cpm.scm, which I've included in the tarball. It's not quite complete yet, there is one more macro that needs defining (PG's =bind macro), but what's there is, I think, correct.

I decided that this stuff warrants a version bump: it's 0.7.5 now.


June 27, 2009

A couple of small cleanups and improvements: I noticed the minor infelicity that if I entered whitespace in front of something evaluatable at the REPL, it got rejected; the parser was being a bit too strict. Oops! I guess I hadn't ever tried that before... fixed now. Also, I unified the treatment of functions, traceable functions, and macros inside the evaluator: now there is only a Func type, which can be traceable or not, and it can be a macro or not. That has the nice side-effect of making macros traceable; a traced macro will produce two lines of trace output when it's hit, one line showing what the arguments are and the other line showing what it expands into; thusly:

    lisp> (assert (= 0 1))
    trace: assert <- ((= 0 1))
       ->  (when (not (= 0 1)) (write-string stderr "assertion failure: ") (display (quote (= 0 1)) stderr) (newline stderr) (raise "assertion failure!"))
    assertion failure: (= 0 1)
    user exception "assertion failure!"
    lisp>

It's not quite macro-expand, but it's close enough that I bumped the version to 0.7.4.


June 21, 2009

Version bump to 0.7.3, I re-enabled control-C catching in the REPL. Other than that, I've mostly been studying PLAI and trying to wrap my brain around continuations (or, more precisely, turn my brain inside-out to match what continuations do).


June 20, 2009

The regexp code I mentioned is coming along ok: I've got stuff implemented to parse a regexp, to turn the parse tree into an NFA, and to make transitions according to the NFA given a start state and an input string. It all works, albeit slowly. Still, since I'm using the Thompson algorithm, there are no exponentially-bad cases: thus I can easily generate cases where this code finishes in seconds to minutes, whereas perl, python, and most of the other big guys would not finish before the heat death of the universe. Look at the routines make-slow-regexp and make-slow-string in the file regexp.scm (also in the tarball). If you use that with n in the range 30 to 36, you'll cause perl to run for minutes to a couple of hours; with n set to 128, it would take right about 1.5×1024 years. In contrast, this code can handle this n with no particular problems, about ten minutes or so. So... just gotta pick your benchmark right! :-)

I recently came across a book, Programming Languages: Application and Interpretation, that looks very interesting; if you, gentle reader, are here out of more than idle curiosity, then you may find that book of interest as well. It is relevant specifically to my little toy because Prof Krishnamurthi has a long chapter on continuations in the book, starting from a level where even a barefoot C programmer like me can follow. So I will be reading with great interest. I expect to follow this path: first, try to manually rewrite some trivial programs in continuation-passing style (CPS). It's always possible to do this, although the resulting program is essentially unreadable (which is probably why it's not done much). Second, I'll continue the continuation-macros approach I mentioned below. At that point, I think I should begin to have a good enough understanding of them to be able to either add continuations to haskeem at the haskell level, or to write a CPS-transformer at the scheme level. I haven't yet decided.


June 16, 2009

I added a bunch of bit-diddling primitives: bits-and, bits-or, bits-xor, bits-shift, bits-set, bits-clear, bits-flip, bits-set?, and bits-get. I have been playing a bit with simple regexp processing in haskeem, and I implemented sets as lists of integers. It works, but is a little painfully slow; I think these bit-wise operators will speed up that stuff considerably, and they are generally useful for bit-diddling. For positive numbers, they all do pretty much what you expect; for negative numbers, they kind of assume a 2s-complement which is one bit larger than the size of the number. I basically just sucked in the functionality in the haskell base library Data.Bits.

I also added a (load-library) function (actually, a macro). If you give it a bare filename, it will look for that file in a list of directories specified by the environment variable HASKEEM_LIBRARY_PATH and (load) it... useful as I'm starting to build up a larger library of scheme files.


June 13, 2009

I added the xkcd image at the top... it seems somehow appropriate.


June 11, 2009

I withdrew when and unless as special forms... now that macros are present, it's just as easy to do them that way, and it reduces the number of special forms as well as making it possible, if desired, to redefine them. I also tried a version of let as a macro. It works fine, including nested applications, but I'm not going to take that one out as a special form.

    (defmacro (my-let bindings . body)
      `((lambda ,(map car bindings) ,@body) ,@(map cadr bindings)))

I haven't added a macro-expand facility yet, but doing the same kind of stuff I played with earlier, to test out defmacro, works pretty well. I'm currently studying Paul Graham's On Lisp, specifically the chapter on continuations, to see if I can do continuation macros approximately the way he does them. That's not quite supporting them "natively" in the language, but... it gets close. The continuation macros that PG lists are, I think, quite similar to Haskell monads, although I haven't looked at them closely enough to completely prove that.


June 6, 2009

Macros... after a fair bit of reading, I've concluded that full R6RS hygienic macros are a little too complicated for me to tackle right now; there are too many subtleties that I don't understand yet. But I found a very lucid explanation of the (seems to me) simpler (defmacro) in Peter Siebel's book Practical Common Lisp. Since I'm not being doctrinaire about this project, I'll try to implement some form of that first, and see where it takes me.

So, what's a macro? Basically, it's a function that accepts and returns unevaluated forms, and that runs at macro expansion time instead of at runtime. Let's break that down a bit. A really simple example is an assert macro. This doesn't particularly show the full implications of the above, but it's a good small start. First, an assert function:

    (define (assert some-cond)
      (when (not some-cond)
             (write-string "assertion failure: ")
             (display some-cond)
             (newline)))

Does it work? Almost... it certainly catches assertion failures, but the error message that it prints is not quite what we'd like to see. If we have

    (define x "foo")
    (assert (eqv? x "bar"))

we do see an assertion failure, but the message that gets printed is:

    assertion failure: #f

instead of what we'd really like to see:

    assertion failure: (eqv? x "bar")

If we use regular functions, as above, there is no possibility of achieving that, because by the time some-cond has arrived inside the assert function, it has already been evaluated to a value, in this case #f. That's where the "unevaluated forms" from above comes in. There aren't macros in haskeem, but we can simulate them. First, a second version of assert that returns unevaluated forms:

    (define (assert-macrolike some-cond)
      `(when (not ,some-cond)
             (write-string "assertion failure: ")
             (display ',some-cond)
             (newline)))

Then we use it: we feed it an unevaluated form, note the quote, and we run it: that's the (define macro-expand ...) line, which is macro expansion time. Finally, we evaluate the result of that, that's the (eval macro-expand) line, and that's runtime:

    (define x "foo")
    (display x)
    (newline)
    (define macro-expand (assert-macrolike '(eqv? x "bar")))
    (display macro-expand)
    (newline)
    (eval macro-expand)

The result of this is

    "foo"
    (when (not (eqv? x "bar")) (write-string "assertion failure: ") (display (quote (eqv? x "bar"))) (newline))
    assertion failure: (eqv? x "bar")

and we're happy!

So... what does it take to turn the above into what we want?

    (defmacro (assert-macro some-cond)
      `(when (not ,some-cond)
             (write-string "assertion failure: ")
             (display ',some-cond)
             (newline)))
    (define x "foo")
    (display x)
    (newline)
    (assert-macro (eqv? x "bar"))

I think it won't take too much work at all. The (defmacro) stuff is completely analogous to (define). I will start by only implementing the variant with a fixed number of arguments, but I think the variable-number variant should be simple as well. The only issue is that the result associated with the name needs to be marked as a macro rather than a function; no big problem there. Then, in the generic function-evaluation code, I need to add a check for macro-ness, and if it comes out true, do the slightly different processing of passing unevaluated forms to the macro, running that (macro-expansion time), and finally running the result (runtime).

Here is a file containing the above code, plus another pseudo-macro: two versions of a 3-way numeric if.

All in all, I think this might be only one or two dozen lines of code. Check back in a couple of days!

<time passes...>

Holy Y combinator, Batman! Under an hour! The important change turned two lines of code into three, plus fiftyish additional lines of pretty trivial cut'n'paste additions to introduce the new datatype Macro, in all respects analogous to Func, and add the new special form defmacro... ¡Ay Caramba!

Here is another sample file containing actual defmacro versions of the above.

Version bump to 0.7.0; now I need to implement a macro-expand special form, to aid in figuring out what any given macro is doing.


May 29, 2009

Ouch! Long delay since the last update here... what with one thing and another, various important life events, no haskeem hacking in a long time. But I have returned to haskeem and have removed a couple of small spots of bit-rot, plus fixed a small bug I noticed just recently.

The bit-rot had to do with the change in ghc from GNU readline to haskeline: the latest version of ghc doesn't use readline, and that's a bit of functionality I am reluctant to live without. So I've updated haskeem to use haskeline. I did lose one small bit of functionality in that change: interrupting a long calculation once again kills the whole interpreter, rather than getting back to the lisp> prompt. Oops! Well, I'll hopefully get that worked out soon.

For people with older versions of ghc or who don't have haskeline installed, I've left the old version of the main file haskeem.hs, where this stuff was changed, in the tarball.

The bug had to do with scoping and variable bindings: for some reason I don't now remember, I had (define) simply update a variable (ie, like set!) if it was already defined in the environment... well, that kinda messes up scoping. Consider this program:

    (define str "I am top-level")

    (define (foo) (write-string "foo: " str #\newline))
    (define (bar)
      (define str "I am internal")
      (write-string "bar: " str #\newline)
      (foo)
      (write-string "bar: " str #\newline))

    (foo)
    (bar)
    (foo)

That ought to print out three "I am top-level" interspersed with two "I am internal"... instead, it was printing out one "I am top-level" followed by four "I am internal". Fixed now...

In real new stuff, I have been thinking about both continuations and macros. I think macros will be straightforward but a lot of work, and I expect I'll start doing those. Continuations, ie, call/cc, are not so straightforward, but I think I am starting to understand them. I do not now remember where I saw this, but I do know that it was a comment left by Audrey Tang in response to an article on continuations; and it's worth repeating. In quasi-perl, here's a capsule summary of what continuations are:

    $continuation = \&return;

Not so difficult, really. However, I don't think I can implement them in haskeem; I think the flow of control at the haskell level where each expression is evaluated is not accessible enough for me to be able to wrap it up and make it available at the scheme level inside haskeem; or rather, reworking the innards to make the flow of control explicit enough would basically come to a complete rewrite of the whole interpreter. Bummer!


March 5, 2008

Not so much haskell hacking in the last few days. I did finish the nested quasiquote etc stuff. As expected, it was straightforward but a little tedious. Now it returns all the proper results for the example cases from R6RS.

I have tweaked the powerseries examples a little bit. I verified the Bessel functions, added Bessel functions of the second kind In, and simplified the (ps-recip) function: it had previously needed three terms where it ought to have needed only two. After a bit of looking, I found the problem: it wasn't in (ps-recip) at all, but in (ps-mul): I wasn't being lazy enough in that routine.


February 24, 2008

Only a little bit of hacking today: I added a (new-symbol) special form; I'm starting to think about doing macros, and I think that'll be necessary. I avoid name clashes with user-defined symbols by simply generating the internal symbol with a leading space: the rest of the system is perfectly happy with such a symbol, but the parser won't pass the space; so the user can't accidentally generate such a symbol. Clashes between multiple internally-generated symbols are avoided by simply sticking a global counter into the symbol name: every time a new internal symbol is manufactured, the counter gets incremented.

I added a non-gnu (and non-editing) version of readline, for those machines which don't happen to have gnu readline installed. It's commented out inside haskeem.hs.

I think I added the necessary bits to the Bessel function power series in powerseries.scm; at any rate, it correctly computes the first (non-trivial) zero of the first five Bessel functions.


February 23, 2008

Again, not too much hacking of the interpreter; more scheme hacking the past few days... that is a problem with writing an interpreter, you then want to use it! :-) With a few hints from Bulat Ziganshin on the haskell-cafe mailing list, I did add the ability to catch keyboard interrupts; so it's possible to type control-C at a long-running calculation and get back to the lisp> prompt. Previously, that would have just killed the interpreter, which is kinda nasty. Thanks, Bulat!

The scheme hacking consisted mainly of some playing with power series. A few days ago, I was re-reading Higher-Order Perl by Mark Jason Dominus, and I decided to re-implement his section 6.7 on power series. The code is here, if you want to play with it; be aware that the Bessel function series is incomplete. The equivalent of his last example, the head of the power series for tan(x), is

    lisp> (stream-head ps-tan 10)
    (0 1 0 1/3 0 2/15 0 17/315 0 62/2835)

February 20, 2008

Not much hacking of the interpreter, just adding several functions to the library: random-number generation, plus networking: I can now write a TCP client and server in haskeem. Here's a trivial server:

    (define listener (listen-on 2500))
    (write-string stderr "listening on port 2500\n")
    (let ((remote #f)
          (host "")
          (port -1)
          (ret '()))
      (do ((i 1 (+ i 1)))
          ((> i 10) (close-port listener))
        (set! ret (accept listener))
        (set! remote (car ret))
        (set! host (cadr ret))
        (set! port (caddr ret))
        (write-string stderr "got connection from " host #\linefeed)
        (write-string remote "hello, " host ", how are you?\n")
        (write-string remote "The current local time is " (localtime) #\linefeed)
        (close-port remote)))

And a trivial client to go with it:

    (define host "localhost")
    (define port 2500)
    (when (positive? (length args))
          (set! host (car args))
          (set! args (cdr args)))
    (when (positive? (length args))
          (set! port (string->number (car args)))
          (set! args (cdr args)))
    (define remote (connect-to host port))
    (do ((spew (string-join-by " "
                           "Connecting to host" host
                           "port" (number->string port))))
        ((not spew) (write-string "all done!\n") (close-port remote))
      (write-string spew #\linefeed)
      (set! spew (read-line remote)))

A super-high-performance web server is probably not in the cards, but I think I could probably build a small web spider on top of this functionality... :-)


February 17, 2008

The quasiquoting unquoting operators are mostly working. I have implemented the unwrapping I had left yesterday, and haskeem is passing most of the test cases shown in R6RS. There are two areas where it is not yet working: neither nested quasiquotes nor the caboose of a dotted-list are properly handled yet. I think both of these are straightforward but a bit involved: doing nested quasiquotes is simply a matter of keeping track of the nesting, but that involves threading a nesting count through the whole machinery of the evaluator, and that touches a fair amount of code. Doing dotted-lists right is, I think, analogous to properly handling them in the parser, and I've already got that working. It's made a little tricky by JTang's original design decision, which I adopted as well, of using Haskell lists as the basic unit of list structure:

    (a . (b . (c . (d . ()))))

is exactly and explicitly equivalent, according to R6RS, to

    (a b c d)

as are all of these (plus a couple more):

    (a b . (c d))
    (a b . (c . (d . ())))
    (a b c . (d . ()))

and that equivalence is not entirely natural when using Haskell lists (indeed, the parser presented in JTang's tutorial doesn't handle this correctly). So there is a bit of work required to correctly handle dotted-lists.

time passes...

The dotted-list problem is solved: indeed, it turned out not to be analogous to the stuff in the parser, it turned out to be the parser. If you examine the examples above, you'll see there is a simple rule for how to turn dotted-lists into lists: if the thing in the caboose position is itself a proper list, then the overall dotted-list object is actually a proper list. That rule, applied repeatedly, is sufficient to turn all of the above examples into the "canonical" list (a b c d), and there is (was) code in the parser to do just that.

However... in the presence of unquote, that comes out wrong. The test case from R6RS which checks this is

    `((foo ,(- 10 3)) ,@(cdr '(c)) . ,(car '(cons)))

which turns into

    (quasiquote ((foo (unquote (- 10 3))) (unquote-splicing (cdr (quote (c)))) . (unquote (car (quote (cons))))))

Examine the caboose: it's (unquote (car (quote (cons))))... which is a two-element proper list; so the parser promotes the overall expression to a proper list, the unquote is in the middle of the list by itself, and then it all falls apart in the evaluator.

So I have added a bit of special-purpose pattern matching to the code in the parser that checks for lists in the caboose: if it's a list, but the first item in the list is a symbol, and that symbol is either unquote or unquote-splicing, then the promotion gets skipped. And then the test succeeds!


February 16, 2008

I've added support for vectors: #(...). At first I was going to use the Haskell Data.Array datatype, but documentation for that is somewhere between nonexistent and crappy, and what little I could make out from the sources seemed to suggest that this was more for native datatypes: the equivalent of C's int, double, etc. My LispVal doesn't seem to fit in there. Fortunately, the Data.IntMap datatype seemed very well suited to what I wanted, so that was the route I chose. The R6RS merely says that

Vectors are heterogeneous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time needed to access a randomly chosen element is typically less for the vector than for the list.

which fits very well with Data.IntMap: that is implemented using PATRICIA trees, and many of the operations take O(min(n,W)) time, where n is the length of the vector and W is the word size of the computer: 32 or 64. I don't know what the constant out front is, and I don't care too much about that; for large vectors, it's bound to be better than O(n) for a list.

I have also begun to implement more of the quoting constructs: (quote foo) was already present; now I'm adding (quasiquote foo), (unquote foo), and (unquote-splicing foo) -- so far, only the easy bits: the abbreviations

’<datum> for (quote <datum>)
‘<datum> for (quasiquote <datum>)
,<datum> for (unquote <datum>)
,@<datum> for (unquote-splicing <datum>)

At least with the browser I'm using right now, the quote characters are very hard to distinguish: ’, the character used in the abbreviation for (quote <datum>), is the right single quotation mark, located on the right of my keyboard, and ‘, the character used in the abbreviation for (quasiquote <datum>), is the left single quotation mark, located on the left of my keyboard. Actually, the keyboard characters should probably be called apostrophe and backtick, respectively.

time passes...

These are almost working: there is still one extra level of parentheses in the result, for example ‘(1 2 ,(+ 3 4)) should return (1 2 7), but actually returns (1 2 (7)). I think I understand why, and what to do about it... but that's for tomorrow.


February 11, 2008

I've fixed the glitch in writing numbers in scientific notation; that was pretty straightforward, although I don't know how efficient my solution is. But efficiency isn't the primary target here...

I've also added a rather specific new type of comment to the parser: it now understands and ignores "#!" lines at the top. So I can have actual directly-executable scheme scripts:

    % ls -lF try
    -rwxr-xr-x 1 uwe users 134 Feb 11 17:05 try*

    % cat try
    #!/home/uwe/tools/haskeem
    (set! args (cons "It's alive! It's ALIIIIIVVVE!" args))
    (write-string (string-join-by " " args) #\linefeed)

    % ./try
    It's alive! It's ALIIIIIVVVE!

    % ./try meow
    It's alive! It's ALIIIIIVVVE! meow

So now, if I need to, I can compute symbolic derivatives in a cron job :-)


February 10, 2008

I have just received an object lesson in just how lazy Haskell is... I was working on implementing printing numbers in scientific notation, n.nnEnn, and when I thought I had that working, I started doing a couple of tests in haskeem. I entered a couple of numbers which worked fine, and then I entered 145e-2 which promptly crashed the interpreter. I looked at the write-number routine and couldn't find anything, so I started sneaking up on the problem: I entered the number in such a way that I wasn't going to print its value at all (define returns the value it assigned to the variable, and the REPL prints that, so that wouldn't work; I had to write (begin (define var 145e-2) #t) to get the value into the interpreter without printing it), then I started querying it: is it an integer? no; is it a rational? no; is it a real? yes; is it infinite? and the interpreter crashed again. After a little while, I tracked the problem to the parser: I allow the parser to return numbers of the form aaaEbb as integers, if there is no decimal point in the "aaa" part and the "bb" part is non-negative: ie, 123e4 is the integer number 1230000. It turned out that I was insufficiently careful in how I had implemented that, and this particular number 145e-2 satisfied the first part of the test, no decimal point in the "aaa" portion, but not the second part, the "bb" portion is non-negative. So the parse wasn't fully evaluated until well after the interpreter had returned from the parser and had already processed several other commands. In fact, the "is it infinite?" check was the first one where the actual value mattered, as opposed to just the format. Wow! I knew Haskell is lazy, but only in the front of my mind, it seems; my reptilian brainstem was still thinking "oh, it's returned from the parse, so it's finished with that" (well, to the extent that my reptilian brainstem thinks about programming at all... :-) )

That's all fixed now, and scientific notation works, with still one small glitch: if a number has more nonzero digits in its tail than the specified precision, rounding occurs, and that rounding can of course propagate up toward the head: so for example 0.9999 gets printed as 1.00 with two digits after the decimal point. If that happens when formatting a number in scientific notation, 0.9999 gets formatted as 9.999e-1 when enough digits of precision are available, but as 10.00e-1 rather than as 1.00e0 when rounding occurs. The value is correct, just the format is slightly off. This is purely my bug, nothing to do with GHC's native number-printing routines at all, and I understand where and why it's happening, I just haven't fixed it yet.


February 9, 2008

I'm in the process of adding an optional precision specifier to (number->string): it does the equivalent of C's %.Nf now, for integers and rationals; I'm still working on the same for floating-point numbers, and I haven't started %.Ne at all. For integers and floating-point numbers, this isn't so hugely necessary (although it's still pretty useful); but if you're doing some big calculation involving rational numbers, say computing the sum of the reciprocals of the first hundred primes, it's very much nicer to know that that result is about 2.106342121472621094 than that it is the ratio of two several-hundred-digit numbers:

9924938317306578104794915348371074960828795860429929737290618944808506634743370796528436363734050330051393733303456260639596308477098909255081265596725817362940507726427713984550226766935610334519674156561744081975868623/
4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090

I dunno about you, gentle reader, but that makes my eyeballs bleed...

To compute that, I actually used the exception-handling stuff in kind of a neat fashion. Here's a bit of code: first define an initial table of the lowest couple of primes; we have to have 3 in there, because 2 is kind of an odd prime...

    (define primes '(2 3))

Then define a routine to check for primality. This is the part I think is cool: I could have a moderately complicated test to determine if I can exit the loop, and then another bit of code to determine what result the routine should return, but using a guard to catch some exceptions, and then within the loop just raise an exception with the return value of the test ends up being cleaner; and it doesn't interfere at all with the actual exception of running off the end of the table: that one isn't caught by the guard.

    (define (is-prime? n)
      (guard (err ((boolean? err) err))
             (do ((pl primes (cdr pl))
                  (cur 0))
                 ((null? pl) (raise "exhausted primes list!"))
               (set! cur (car pl))
               (if (= cur n) (raise #t))
               (if (zero? (remainder n cur)) (raise #f))
               (if (> (* cur cur) n) (raise #t)))))

Then this next routine computes the next prime past the end of the table and adds it to the table, until the size of the primes table is at least n.

    (define (grow-primes-table n)
      (do ((j (length primes) (+ j 1)))
          ((>= (length primes) n) primes)
        (do ((i (+ 2 (last primes)) (+ 2 i)))
            ((is-prime? i) (set! primes (append primes (list i)))))))

And finally this is the routine that computes the sum of the reciprocals of the first N primes. It makes use of the fact that the grow-primes-table routine above returns the list of primes after it has finished growing it.

    (define (sum-recprimes n)
      (apply + (map / (list-head (grow-primes-table n) n))))

If you invoke this with

    (number->string (sum-recprimes 100))

you get the eyeball-bleed-inducer above, whereas if you invoke it with

    (number->string (sum-recprimes 100) 10 18)

you get the much nicer 2.106...


February 2, 2008

Major reworking of error handling. I figured out a couple of problems with the error handling, in particular error messages generated in non-interactive mode are now printed. I had that very wrong before: I was using runIOThrows, a wrapper around catchError, inside Control.Exception.try. In hindsight, it's perfectly obvious that if the inner catcher is handling all the errors, then the outer catcher won't have much of anything to do... d'oh! <palm to forehead>

This does not yet address the exception-handling at the scheme level, but I think I understand Haskell's error handling much better now, so hopefully I'll get the scheme-level exception handling working now, too.

time passes...

Yes! new special form guard, with syntax as described in R6RS. Check this out: first define a small function...

    ; the first 'test', "(begin...)", is only to print out the value of
    ; the exception that was thrown: that 'test' always returns #f
    
    (define (tryit test)
      (guard
       (err ((begin (write-string "exception is '")
                    (display err)
                    (write-string "'\n")
                    #f) #t)
            ((and (string? err) (string=? err "meh"))
             (write-string "caught 'meh'\n")
             "I am the 'meh'-catcher")
            ((and (string? err) (string=? err "barf"))
             (write-string "caught 'barf'... yuk!\n")
             -42)
            ((and (string? err) (string=? err "yahoo!"))
             (write-string "a little excitable today, aren't we...\n")
             "US$44.6billion")
            ((eqv? err '(1 2))
             (write-string "does not compute\n")
             "daleks rule! (until their heads blow off)"))
       (raise test)))

and then run some tests with it:

    lisp> (load "guard.scm")
    (lambda (test) ...)
    lisp> (tryit "meh")
    exception is '"meh"'
    caught 'meh'
    "I am the 'meh'-catcher"                        <-- return value
    lisp> (tryit "barf")
    exception is '"barf"'
    caught 'barf'... yuk!
    -42                                             <-- return value
    lisp> (tryit "yahoo!")
    exception is '"yahoo!"'
    a little excitable today, aren't we...
    "US$44.6billion"                                <-- return value
    lisp> (tryit '(1 2))
    exception is '(1 2)'
    does not compute
    "daleks rule! (until their heads blow off)"     <-- return value
    lisp> (tryit '(1 3))
    exception is '(1 3)'
    user exception (1 3)                            <-- NOT caught! this got printed by
                                                           the outermost level, the 'P' in 'REPL'

Notice that we can throw arbitrary lisp objects as exceptions. W00t! Izzat cool, or what? This mechanism will also catch system-level exceptions, ie, "file not found" kinds of errors; however, those will just show up as string values, like "nil.dat: openFile: does not exist (No such file or directory)". So processing those is a little messier, although still doable.


February 1, 2008

Not so much Haskell hacking, but a few small tweaks in stdlib.scm, plus one very nice usability feature: connect the REPL with gnu readline. That makes it very much nicer to use interactively! I'm hereby declaring that it's no longer (almost) Scheme in Haskell, it's just Scheme in Haskell.


January 28, 2008

I have finished the do syntactic form, although it's not yet in the self-tests (by now 800+); but a small separate script using it gives the right results. The exception-handling stuff, which I thought would be straightforward, is proving less so: I apparently don't understand enough Haskell yet, and I am getting various type-mismatch error messages which I haven't yet managed to untangle. Oh well...


January 26, 2008

I have added a bunch of functions for manipulating files and directories, reading the environment, and so on, plus miscellaneous cleanups; for example, it was giving a parser error message if the user just entered a blank line at the prompt. That was correct, as far as the parser goes -- it expects to see at least one scheme expression or object -- but kinda unfriendly. Also, a special form to display the currently-active bindings: dump-bindings.


January 22, 2008

I have added delay and force, which allow the definition of lazily-evaluated infinite datastructures such as streams. For example, define the natural numbers as follows:

    (define natural-numbers
      (letrec ((next
                (lambda (n)
                  (cons n (delay (next (+ n 1)))))))
        (next 1)))

ie, the integer 1 plus a promise to evaluate the rest of the natural numbers if it should become necessary; then it's possible to do

    (define (stream-head strm n)
      (if (<= n 0)
          '()
          (cons (car strm) (stream-head (force (cdr strm)) (- n 1)))))

    (stream-head natural-numbers 20)

and get

    (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)

Very cool!

On one level, this was pretty trivial: add a new Delay constructor to the LispVal stuff in lispdata.hs, then in evaluator.hs add code to just return a Delay object when a (delay foo) special form is seen, plus code to actually evaluate what was stored if a (force bar) special form is seen. However, that by itself fails one of the tests in R6RS: with just that, it's possible to make two-timing promises that evaluate to different results depending on when they are forced; that's not so nice!

The solution, adding a variable into which to store the computed value and just return that from then on, was straightforward, too, except for making the variable name. It doesn't really matter what that name is, as long as it doesn't clash with anything else, but that "doesn't clash with anything else" was tricky. I tried for a while to use the State monad, but I couldn't get that to work. Eventually, I decided to just stick stuff into the same environment that's passed along when the Delay object was created, using a global variable defined at startup in the top-level environment. That worked fine.


January 19, 2008

I've been interested in the Haskell programming language for a while, but was never really able to get into it. I wrote a couple of tiny toy programs, but that was about it, until I recently came across Jonathan Tang's most excellent Haskell tutorial Write Yourself a Scheme in 48 Hours. I decided that that looked like a lot of fun, so I started on it. By now, I've spent a fair bit more than 48 hours on it, but I've also gone a ways beyond what JTang described. I started during the christmas 2007 holiday, and have, as of late January 2008, spent the idle hours of a couple of weeks at it. It's starting to be a neat little toy, so I'm putting it onto the web here for others to play with or possibly learn from.

Before I go any farther, a couple of points:

OK. With that out of the way, there are a couple of points where I am actually and intentionally critical of what JTang wrote, and where I think he should have done better:

The first is an actual programming issue: the parser JTang described, and the one you will write, should support some kind of comments. R6RS (the Revised6 Report on Scheme) describes a couple of different kinds, of which I have so far implemented only the "semicolon to line-end" flavor, but it is essential, once you write anything longer than a line or two, to be able to comment it. Perhaps you'll only ever write a single comment that says Look and marvel, world, at the brilliance of my lisp koan!... that's ok; but maybe you'll also write things like OMG! WTF is this stupid interpreter doing?! Fix this!!! and then later on you'll be able to find it again and fix it. Whatever. Comments are essential, so when you go through JTang's tutorial, pretend that he talked about comments, and add the ability to comment stuff into the parser early on. It's not hard, and you'll be grateful for it soon enough.

The second issue is less of a programming issue and more being systematic. However much of a toy this is, it is an actual interpreter for an actual language, and you have to test it as you go, but you also have to check that your new hacks don't break anything old. So keep track of small tests, collect them into one large file, and once your interpreter is able to execute a file, set things up to run that self-test; then run it often. This is something that I think JTang should have emphasized more. By now I'm up to 400 tests in the self-test for haskeem. These aren't all totally distinct, a bunch of them are exploring corner cases of simple math functions, but not only does that give me some confidence that it's doing the arithmetic right, it also caused me to find a couple of small bugs in GHC, the Glasgow Haskell Compiler, which I have duly reported. (GHC doesn't handle NaNs quite correctly in all cases, for example 1.0 ** NaN should be NaN but is evaluated as 1.0.)

All right, here's where you should stop if you want to do it yourself. Go write yourself a scheme in 48 hours, and come back when you're done with it!


So what's haskeem? It's (mostly) a sub-set of R6RS Scheme; while writing it, I did retrieve the R6RS, and I went through it, trying to find and eliminate differences. To the extent possible, I did and do want it to be compatible, although I did do a couple of small things differently; I'll describe them more below.

JTang's tutorial implements an interpreter with integer-only arithmetic and only base-10 input and output of numbers, whereas scheme supports integers, rational numbers, and real and complex floating-point numbers; all of these can furthermore come in several different precisions, plus they can be exact or inexact, and the integer quantities can be written in base-2, base-8, base-10, and base-16. I have not implemented all of this! It's important for writing large robust programs; it's not essential for writing a small toy. But I have implemented parts of it: I began by adding support for real floating-point numbers, since I wanted to add a bunch of math functions, and having, for example, (sin x) always return only one of -1, 0, or +1 just doesn't quite cut it; the numerical analyst cringes!

This is the point where I made my one major excursion from R6RS Scheme syntax: that document describes and allows only base-10 floating-point numbers, but when I was adding floating-point numbers to the parser, I found that I was re-using enough pieces of the integer-parsing code, which supports multiple bases, that it would be pretty simple to add support for floating-point numbers in the other bases. So I did. So it's possible to specify the floating-point number #xdeadbeef.c0ffee in haskeem (it's 3.7359285597539053e9), or to see that pi in base-16 is about #x3.243f6a8885a3.

I have also mostly finished adding support for rational numbers. There are some small differences in the way I treat them versus how R6RS treats them. For finite rationals, of course, it's all the same, but if I understand the Report correctly, they are disallowing rational infinity and NaN: +1/0, -1/0, and 0/0, whereas I think these are useful quantities for the same reasons that floating-point infinities and NaNs are useful, so I allowed them and have attempted to deal properly with them in haskeem. That was actually a bit tricky, since Haskell's native rational numbers don't play well with these quantities. Haskell will recognize them and can read and write them, but doing arithmetic with them leads to runtime exceptions, so I had to add a fair bit of special-case code to deal with all the possibilities..

That's the next major area where I added stuff to what JTang showed. Try typing this at his interpreter: (/ 1 0). If it does the same thing on your computer as on mine, it'll give you a "division by zero" runtime exception and dump you back to your shell prompt. Not so good! Another area where I found the same thing was when implementing some of the file-IO stuff. It's pretty user-hostile to signal end-of-file with an exception that kills the interpreter, so adding exception-handling was pretty important. I'm not sure I have this entirely right: although I've wrapped the entire main REPL in a Haskell try, it is still not catching division by zero. (No, Haskell experts, it's not that I'm using try from the prelude, it's Control.Exception.try, and it's not catching divide by zero; I (sic) tried, although in a small test program CE.try does catch divide by zero.) However, haskeem as a whole is catching divide by zero, I just used a different method to do it.

I have also added most of the core R6RS syntactic forms: I've got and, apply, begin, case, cond, define, if, lambda, let, let*, letrec, letrec*, or, quote, and set!. (Incidentally, JTang implemented and and or as functions wrapped around && and ||; this is not quite right, because and and or are supposed to short-circuit as soon as the value of the expression can be determined.) Still missing are let-values, let*-values, do, when and until (although these last two are pretty trivial), and set-car! and set-cdr!; these last two seem to be regarded with some mixed feelings by the R6RS, and I'm not at all sure I'll ever put them into haskeem. In part this is due to a design decision by JTang; more on this in a bit. Also missing is everything related to quasiquote.

I have a mutant version of eval, which actually evaluates an expression in the current function context rather than the top-level context. It didn't occur to me to make it evaluate in the top-level context, that's why I made this a special syntactic form rather than a regular library function. In order to make this work with things like (map eval some-list), I added a function to the standard library: (define (eval x) (eval x)). (If you look, you'll see one of those comments I mentioned next to this definition. :-) )

More related to the actual running of the interpreter, and less related to the scheme language, is the load special form. I am not exactly sure now why I made it a special syntactic form rather than a library function (I should have added a comment there, too!), but it is. Perhaps it was for easy access to the environment; I haven't added any facility for getting the execution environment inside library functions.

Another special form that's related to the specifics of the interpreter is trace, a very modest facility for observing how the interpreter is doing stuff. If you examine the file fibo.scm, you'll see it in action; it produces output like this:

    lisp> (load "fibo.scm")
    trace bad-fibo on
    #t
    lisp> (fibo 6)
    trace fibo-acc on
    trace: fibo-acc <- (6 1 0)
    trace: fibo-acc <- (5 1 1)
    trace: fibo-acc <- (4 2 1)
    trace: fibo-acc <- (3 3 2)
    trace: fibo-acc <- (2 5 3)
    trace: fibo-acc <- (1 8 5)
    8
    lisp> (bad-fibo 6)
    trace: bad-fibo <- (6)
    trace: bad-fibo <- (5)
    trace: bad-fibo <- (4)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (4)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    8

The stuff in parentheses is the argument list being handed to each invocation of each traced function (there can be any number being traced simultaneously). This doesn't yet show indentation to represent who's calling whom, but it is nonetheless already very useful in seeing what's going on. This example very clearly shows the difference in recursive calling between the two versions of the fibonacci calculation. I intend to use haskeem to work my way through as much of Structure and Interpretation of Computer Programs as I can, and I expect that this will help a lot.

So what's missing, what's next? I plan to improve this in a couple more areas, as I have time and ability. I think these are more or less listed in the order in which I'll attack them, and also in order of increasing difficulty:

I'm not so sure that I will implement the rest of the Scheme number tower. Doing complex numbers right is a lot of work that's not really interesting, and I'm not so sure I'll do anything with haskeem that would require it; for those occasions where I would need complex numbers, I can probably do it explicitly as pairs of reals.

If you have questions or comments about the stuff here, please contact me at korg@korgwal.com.

Enjoy! -- Uwe Hollerbach

Page last modified at 2010-01-04 21:40 US/Pacific

Valid HTML 4.01 Transitional