Last update: January 4, 2010
haskeem web site!What's haskeem?
haskeem is pretty much Scheme in Haskell... however, if
you're looking for an industrial-strength Scheme interpreter,
this might not be it! It doesn't implement the full Scheme number
tower, it's not properly tail-recursive (but see below), it doesn't
have full R6RS hygienic macros (but defmacro
is present; see below), it doesn't have call-with-current-continuation
(but, yet again, see below; I'm working on delimited continuations,
reset/shift et al, and they are coming along tolerably well).
If you're a real schemer, you're probably snickering by now... that's
ok! haskeem is a toy that's letting me learn more about both
Scheme and Haskell, it's a lot of fun to work on, it's actually turning into
a useful tool, and all of that's plenty enough for me! I
arbitrarily called the first version I put onto the web "0.5"; I'm now at
0.7.16, which is available here, almost
all under GPL — I put some of the scheme code I've been writing under
BSD3. But see also my comments below on doing it yourself: you'll learn far
more if you go through Jonathan Tang's tutorial
Write
Yourself a Scheme in 48 Hours and write all of this yourself, rather than
just grabbing what I've got here and looking at it.
What does it have?
define, defmacrolambdaquotequasiquote, unquote,
unquote-splicingset!beginand, orif, cond, caselet, let*, letrec,
letrec*delay, forcedoguardreset/shift etc (alpha code, still some
limitations)apply, eval,
load
#(...) including
special forms vector-set!, vector-fill!,
and vector-resize!trace and dump-bindingsHASKEEM_INITreadline; this makes it a
very pleasant working environment.What bugs do I know about? At this point I know of one bug remaining in
stuff that's at least partially implemented: floating-point exponentiation
expt doesn't deal quite correctly with infinities and NaN.
This is tricky to get right in all cases. I think GHC is just following
the C library function pow, which I think doesn't quite get it
right all the time — for example, pow says that
NaN0 should be 1, whereas I think it would be
better to return NaN. There are other similar cases. I've
tried to make haskeem be more conservative than pow,
but I'm not sure I've got all possibilities covered (or, indeed, correct).
There is also a limitation present in the reset/shift
continuations: they redo too much of the captured context. This is an issue
which I've not addressed at all yet.
If you have questions or comments about the stuff here, please contact me at korg@korgwal.com or at uhollerbach@gmail.com.
Enjoy! -- Uwe Hollerbach
January 4, 2010
Ugh, got too busy with other stuff,
and also got stuck on continuations... I'm still stuck on the continuations,
but in the meantime I've cleaned up a huge number of ghc -Wall
warnings and hlint suggestions. There's still a little bit to be
done, but it's starting to get fairly clean...
August 13, 2009
The switch to EEnv
proved very useful, as hoped: today I hacked in the new stuff required
to make shift in a separate function work, and it was quite
simple and straightforward. It is still not working completely, there are
still naming issues to be dealt with, but it's a significant step forward:
the following now works as desired
(define fn (lambda () (shift k (cons 3 (k (k (cons 4 '())))))))
(display (cons 1 (reset (cons 2 (fn)))))
(newline)
should produce (1 3 2 2 4), and indeed it does. What doesn't
yet work is the following:
(define f1 (lambda (v1 v2) (shift k1 (k1 (display v1)) (k1 (display v2)))))
(define f2 (lambda (v1 v2) (shift k2 (k2 (display v1)) (k2 (display v2)))))
(define f3 (lambda (v1 v2) (shift k3 (k3 (display v1)) (k3 (display v2)))))
(reset (f1 1 2) (f2 3 4) (f3 5 6))
(newline)
(define fn (lambda (v1 v2) (shift k (k (display v1)) (k (display v2)))))
(reset (fn 1 2) (fn 3 4) (fn 5 6))
(newline)
Both of these should produce 13564562356456 according to
mzscheme, but only the first does; the second one prints just
12. I think the issue is that the continuation k
is visible too far, so that when the second call to fn becomes
active, it sees the continuation of the first call, and haskeem
gets a little confused. I'll have to do a little digging to figure out what's
going on there.
Those previous examples which don't have this naming confusion now work
fine: the two from the 10th, as well as an example using the
call/cc from the 8th, are all running quite spiffingly,
at least for what I've tried so far.
Version bump to 0.7.15.
August 11, 2009
No changes at all, other than
the transformation of evalLisp etc to take a single
EEnv instead of Env, [[LispVal]],
and Integer. That's enough of a change for today... it even
warrants a version bump, to 0.7.14.
August 10, 2009
Good news, everyone! I have many
more failing test cases for reset/shift! Following the initial
translation into haskell, the next thing I tried was to make some tests with
the shift in another function, not just a macro as in the
factorial list comprehension test; for example:
(define fn (lambda () (shift k 3)))
(display (+ 1 (reset (+ 2 (fn)))))
(newline)
(define fn (lambda () (shift k (+ 3 (k 4)))))
(display (+ 1 (reset (+ 2 (fn)))))
(newline)
The first of these works very well; the second, not so much: it crashes
haskeem quite quickly with a stack space overflow. After adding
debugging stuff to print out some intermediate variable bindings, it became
clear what's going on. Essentially, I was attempting to abuse the scoping
of the interpreter. What I need here is basically dynamic scope, but because
scheme is statically scoped, what I wanted to do won't work. So I have to,
more or less, provide a second environment in which the arguments of the
continuation functions are appropriately bound, and specially mark the
continuation functions so that they, and only they, update this second
environment during processing. Then the shift form can do what
I wanted, which is to check whether its unique symbol is defined in that
environment and use this as the test for what to do next.
This is very tedious, because I basically have to thread (yet) another
argument through 100+ instances of calls to the evalLisp
function. So I'm not going to do that. Instead, I'm going to merge the
existing environment, context-stack, and quote-level arguments into
one new datatype EEnv (for extended
environment) and rewrite all instances
of evalLisp to use this. This will be even more painful
than threading the new argument, but I'll have to do it only once:
after that, I can just stick new stuff into the EEnv, and
it'll be available where I need it. Most instances will just pass it
through unchanged, so that I can just concentrate on the few cases
where something actually needs to be done. This is very much akin to
the State monad, except I'm being a little more explicit.
So: no updates to haskeem for a little while! I've got
a bunch of trivial stuff already done, but changing
the evalLisp function touches many many lines in
evaluator.hs, so it'll take a day or two to get it all right, and then I
will still have to add the new dynamic stuff.
Oh... and this answers the question from Saturday of whether that
call/cc works: the answer is no, because it's implemented
as a separate function.
August 8, 2009
I have changed the names of the
operators inside delcont.scm: I've appended "/m" to all of them, so as
not to clash with the native operators, which I'm now renaming to their
proper lowercase names. For the time being, I've also withdrawn the
reset0/shift0, prompt/control, and
prompt0/control0 operators: I thought some more about how
I had implemented the reset/shift stuff, and it seems to
me that there are some fundamental issues there which make it trickier
than I had thought to re-use that for the other operators. In
particular, the other operators can capture more of their surrounding
context, and I think my code can't do that correctly yet. It may be that
the same issues are present in the macro versions, and that this is
causing some of those tests to fail. I'll have to look into that.
This is another version bump, to 0.7.13, since I did the name changes.
One thing to investigate with this is a slightly-limited version of
call/cc: Queinnec lists the following as an implementation
of call/cc, for programs P which do not use
reset/shift, and with the continuations that are produced
limited to the scope of the reset. If my implementation of
reset/shift can do this properly, that would already be quite
fine.
(let ()
(define (call/cc f)
(shift pc (pc (f (lambda (v)
(shift ignore-pc (pc v)))))))
(reset P))
August 7, 2009
W00t! Native continuations, here
we come! It's still not working quite correctly for all four versions of
the operators, but the reset/shift version is working
correctly for all examples, except for the known issue that
1 3 2 3 comes out as 1 1 3 2 1 3 (plus the other
instances of this bug). Since I've still not looked into this at all,
I don't feel too bad about it. In principle the other pairs of operators are
expressible in terms of reset/shift, so it might be that all
four sets are "done"; but I still want to see if I can get them working
properly "natively".
This definitely warrants an update to the tarball, and a version bump, too: 0.7.12 now.
later... I forgot to check one thing: is it still necessary
to have the reset/shift all in one s-expr? The way I wrote
the native haskell code, it should not be, and so the factorial example
which I stole from the famous Oleg should now work cleanly:
(defmacro (range from step to)
(let ((f (gensym))
(i (gensym)))
`(SHIFT ,f
(do ((,i ,from (+ ,i ,step)))
((> ,i ,to) #f)
(,f ,i)))))
(RESET (begin (display (factorial (range 1 3 16))) (newline)))
Does it? Save the above into a file test.scm, then run it:
% haskeem test.scm
1
24
5040
3628800
6227020800
20922789888000
Looks pretty good to me! I will look at some of the other examples he posted; but not tonight.
August 5, 2009
The action is beginning...
the native versions of reset et al are beginning to do
not-quite-trivial things. It's not quite to the point where I'm going
to bother updating the tarball, but the first few tests are starting
to work:
should be 6 6
should be (1 2) (1 2)
should be 4 4
should be (1 3) (1 3)
...
I think the current changes largely take care of what's needed for the
reset etc operators; the shift etc operators only
have the trivial part done, ie, I'm not yet binding the code between
reset and shift to a callable function. Still,
even those first couple of examples demonstrate skipping over that
enclosed part when (shift ...) returns.
August 3, 2009
And so it begins... I'm starting on
transferring the stuff in delcont.scm into haskell, mostly inside
evaluator.hs. This is a version bump, to 0.7.10, but without much actual
action: so far I've just added code to recognize the continuation operators
and transform them into an internal form, which I'll need in order to be able
to track them. For the moment, the stuff I'm putting into haskeem
is using the UPPERCASE versions of all the continuations operators; I don't
want to totally kill all of the existing continuations stuff in delcont.scm
and deltest.scm. So for example
'(RESET (+ 2 (SHIFT k (+ 3 (k 5) (k 1)))))
gets turned into
(ext cont cont.1 (+ 2 (int cont cont.0 #t #t k (+ 3 (k 5) (k 1)))))
where 'ext cont', 'int cont',
' cont.0', and ' cont.1' each are single symbols
with an embedded space; so the user cannot enter these directly, but must
go through the external operators. But if you try to evaluate the expression,
haskeem will barf, saying this is not implemented yet.
In terms of the table I wrote two weeks ago, the first part of which,
discussing reset/prompt, is reposted here:
shift with
some unique per-shift tag — top-level means only those
that are directly visible to the initiating reset, with no
intervening reset or shiftI've implemented the first item, plus a bit more: that's what those
cont.N are for. The other two items should be very
straightforward, and that should take care of the first part of this.
As usual, stay tuned...
August 1, 2009
I continue to think there's some
deep magic working here... I made a couple of fairly trivial tweaks to the
reset and int-s/c (the internal translation of
shift etc) macros in delcont.scm. One of these had the effect of
making a failing test more nearly correct (I'll describe in a second), which I
was not expecting; the others, which should have been straightforwardly
correct according to my understanding, caused problems. So I continue to
be in the situation of having written a shortish piece of code which I don't
understand, but it's correct for almost all of my test cases (and all tweaks
to it that I can think of make it clearly less correct: more tests fail).
At this point there are three classes of bugs in these macros: first,
and in some sense basically trivial, is that they don't capture the point
at which the code between reset and int-s/c
dives into the int-s/c: that code is captured into a function
and made available as a continuation inside the code inside
int-s/c, but it gets started from the beginning each time it
gets called. The effect of this bug is that the expression
(reset (write-string "1 ")
(shift c (c 'ignore) (write-string "2 ") (c 'ignore))
(write-string "3 "))
which should print 1 3 2 3, instead prints
1 1 3 2 1 3. This is entirely unsurprising, as I've made no
attempt to capture this information, and I'm not quite sure I can with this
macro-based approach. I hope to fix this when I translate this into haskell
and make these operators "native"... we'll see.
Second is a class of bug where information escapes out of the outermost
reset: for example, in the expression
(display (reset (* 5 (
(reset (* 2
(shift f2 (lambda ()
(* 3 (shift f3 7))))))))))
the result should be 7, but haskeem prints something like
( symbol.2517 7) (the number part of the symbol.XXX
will vary). This is an exception being raised, specifically the one
inside int-s/c, to return the result to the appropriate
reset; but that reset is no longer there. I believe
this is due to the (lambda () ...): it's a thunk which carries
the correct result along, and eventually it's processed and the result is
returned; but that processing happens outside the last reset,
and the exception is not caught. The value 7 is correct, though, so I regard
this as a minor infelicity, and I think I can fix it.
The third type of bug... is a total mystery to me. This example is taken
from a paper by Shan; haskeem passes all the other tests he
provides, but not this one. This is supposed to produce ();
instead, it produces (a). I have no idea why.
(display
(prompt0
(prompt0 (cons 'a (prompt0
(let ((y (control0 f (control0 g (cons 'b (f '()))))))
(control0 h y)))))))
Other changes to the macros in delcont.scm include allowing them to have
multiple expressions to be evaluated. This is visible in the first bug-case
above. This is rather trivial syntactic sugar; it would in all cases be
equivalent to just wrap multiple expressions inside a
(begin ...). Again, no version bump, this is all just tweaks
to scheme code.
July 29, 2009
I'm starting to think I've caught a
piece of the serious wild magic here... I wrote this delimited-continuations
stuff less than two weeks ago, it's only about a single (ok, longish) page of
code, of which I think I understand pretty much every piece, and yet... I have
no clue how it's doing some of the stuff it's doing! Here's an example that I
took from a posting by the famous Oleg to the comp.lang.scheme newsgroup on 17
December 2003. This example I actually pretty much understand. I massaged it
lightly due to the current restriction that the shift be directly
visible from the reset (I think I can actually remove this
restriction, but I haven't worked out all the details yet).
(define (my-fact n)
(if (<= n 0)
1
(* n (my-fact (- n 1)))))
(define from 1)
(define to 21)
(define step 3)
(reset (begin (display (my-fact
(shift f
(do ((i from (+ i step)))
((> i to) #f)
(write-string (number->string i) "! =\t")
(f i)))))
(newline)))
The definition of (my-fact) is just to show that I've got
nothing up my sleeve, this is a perfectly cromulent function that takes as
its argument a single numeric value, and yet in the last part of this it
appears that I'm passing some sort of weird (shift ...)
argument to it, that's causing it to get run multiple times. Not so...
this is the inversion of control that's so often mentioned when
discussing continuations. This prints out the following table:
1! = 1
4! = 24
7! = 5040
10! = 3628800
13! = 6227020800
16! = 20922789888000
19! = 121645100408832000
which is indeed a table of the factorials of the given values...
This particular middle-aged plains ape thinks this is seriously cool... you may feel free to disagree, of course! :-)
The latest version of delcont.scm has renamed the outer operator back
to reset; I've also added aliases to define the
prompt etc operators. Most of the tests in the test suite
are passing, with the exception of the known issue with re-running the
captured stuff from its beginning rather than from the point where the
shift got started. This is incidentally the reason why
mzscheme didn't care when I changed the value of the flag
in the tests I described earlier: it truly captures the stuff between
reset and shift as an already-partially-evaluated
expression, so it doesn't re-check what the present value of flag is
(unless that gets used after the shift expression;
I'll try to make up an example showing that).
July 26, 2009
Yes: the allegro ma non
troppo was indeed more or less the right approach. I rewrote the
reset macro (it's actually named reset2 in the
current version of delcont.scm, just so that I can compare the old and new
versions) the way I outlined below, and now it works rather better. I had
to do a couple of additional things to make it work, and there is one small
area where the old macro worked right and the new macro doesn't; but I
understand why.
The additional stuff I had to do, besides what I described a couple of
days ago, was that, in processing a shift operator, several
pieces get re-wrapped in a new instance of reset, so
it's necessary to re-process those pieces to refer to the new
reset mark rather than the old one. Without that, it gets
itself into an infinite loop... not so nice
The stuff that's newly-broken has to do with the more-dynamic nature of
the new reset2 macro: it marks the subsidiary shift
operators, then starts evaluating the thunk that was handed to it. If or when
evaluation hits a shift operator, that operator takes care of
doing the delcont transformation, turning the code between reset2
and the appropriate shift into a function and making it available
to the code inside that shift. OK... but then when the inner
code calls that function, it starts over from the beginning, whereas properly
it needs to start at the point where it entered the shift. Thus,
for example, the following:
(reset (begin (write-string "1 ")
(shift c (begin (c 'ignore)
(write-string "2 ")))
(write-string "3 ")))
(newline)
should print 1 3 2 according to mzscheme;
but, with the reset2 macro, it prints 1 1 3 2.
That's because it starts to evaluate the stuff inside the reset2,
prints the "1", then enters the shift: there, it calls the
continuation, which says to do the outer code; but it doesn't know to re-start
that outer code in the middle. So it prints the "1 " again. I don't know how
to fix that in this prototype version, but I think I can live with it; if I
can translate this into haskell and fix it there, that'll be fine. (However,
although I'm sure I can translate what I've got into haskell, I'm not quite
sure I know how to fix this particular issue there... have to see.)
I've added the new version of delcont.scm to the tarball and separately
also to the web page; since I added a small new primitive,
string->symbol, I bumped the version to 0.7.9. There are some
more tests in the file deltest.scm, which is in the tarball but not separately
on the website. At the moment, I believe reset2 is passing
all of the tests, with the exception of the issue above. That includes
a couple of tests using control rather than shift;
I haven't yet figured out what the difference is between those and the
0 versions.
July 24, 2009
Going through the previous example in painstaking slow motion, I think it's starting to get clear in my head what is going on, and what I need to do. Let's go through it from the top, allegro ma non troppo. First, the basic transform we're trying to get: from CC Shan, Shift to control, we want
M[(reset V)] -> M[V]
M[(reset C [(shift f E)])] -> M[(reset E')]
where E' = E{f -> (lambda (x) (reset C[x]))}
or, in more scheme-y notation, which I've taken from schemewiki.org
(reset E) without (shift)
-> E
(reset (...A... (shift K E) ...B...))
-> (let ((K (lambda (x) (reset (...A... x ...B...)))))
(reset E))
The basic expression we're evaluating, with flag
defined as #f, since that's the more interesting case, is
(display (cons 1 (reset (cons 2 (if flag
(shift k1 (cons 3 '()))
(shift k2 (cons 4 (k2 (cons 5 '())))))))))
(newline)
which, according to mzscheme, ought to evaluate to
(1 4 2 5). So we apply the above transformation to the
second shift, because that's the one selected
by the flag setting:
(display
(cons 1 (let ((k2 (lambda (v2)
(reset (cons 2 (if flag (shift k1 (cons 3 '())) v2))))))
(reset (cons 4 (k2 (cons 5 '())))))))
(newline)
We also need to process the newly-introduced reset expression
inside the (lambda (v2) ...): the flag is false, so just remove
the true branch and replace it with #f: we get
(display
(cons 1 (let ((k2 (lambda (v2) (reset (cons 2 (if flag #f v2))))))
(reset (cons 4 (k2 (cons 5 '())))))))
(newline)
In haskeem, the three expressions evaluate to
(1 3), (1 4 3), and (1 4 2 5),
respectively. So it looks like maybe there's hope...
So how to achieve this automatically? I think it can actually be done,
it's just that the present reset macro is not doing quite
the right thing. Here's what I think I need:
reset, the evaluator must lightly pre-process
the expression being handed to reset:
shift with
some unique per-shift tag — top-level means only those
that are directly visible to the initiating reset, with no
intervening reset or shiftshift is encountered, the value that is computed
is returned as the value of the reset expressionshift is encountered, it's a bit more interesting:
the shift knows the unique tag with which it was marked:
shift in the processed original expressionreset and
shift that was initially begun.shift is encountered and the specific value-returning
exception is raised, then the reset must catch that exception,
unwrap the value that's stored in it, and return that as the value of the
whole expressionI think I have all the pieces needed to do all of that, it's just a matter of putting them together. I can more or less see how to do it either at the haskell level or the scheme level... have to think about which one to use.
I'm not sure that will fix all the issues, and it's not a complete
implementation of delimited continuations. One major piece that would be
missing would be that so far I don't see a way to do this with
the reset and the shift forms spread out over
multiple routines: as far as I can see at the present time, they would
have to be in one block of code. That's a significant limitation, but even
so, this should already be a good step forward.
One question that's still very much unanswered in my mind is what is
supposed to happen if, in the above example, the code in the branch that's
taken changes the value of flag. In the above example, this would cause
a second instance of reset/shift to get activated after the
first transformation, thereby introducing a 3 somewhere into
the final answer (I haven't figured out yet where I think that ought to go).
But according to mzscheme, it doesn't make any difference:
the result stays (1 4 2 5). I still need to think about that.
July 23, 2009
The lack of updates in the past
two weeks is an accurate reflection of the lack of progress in implementing
continuations, either delimited or not. I've improved the reset
macro a little bit, it can handle some more cases than before, but I'm still
missing some detail of how to do the expansion. I've compared results for
a bunch of tests against mzscheme, and there are some annoying
mismatches. For example:
(define (doit flag)
(display
(cons 1 (reset (cons 2 (if flag
(shift k1 (cons 3 '()))
(shift k2 (cons 4 (k2 (cons 5 '())))))))))
(newline))
(doit #t) should be (1 3)
(doit #f) should be (1 4 2 5)
The #t case is kind of trivial, since the expression inside
the shift doesn't involve the continuation k1. But
the #f case is less trivial, and the current reset
macro gets that wrong: it prints (1 3) for both cases. I think
I sort of understand why, but I'm not quite sure what to do about it... the
issue is that the macro is doing a purely syntactic expansion of the
expression, first-come-first-serve as it were, and so it always expands the
textually-first shift instead of whichever one gets reached
at runtime.
I think in order to handle this (more nearly) properly I need to generate
some kind of cond statement, with a selector saying which
sub-expression got hit and thus which one should get evaluated. I'm not sure
that'll catch all cases, but it ought to get a few more than the present
version... stay tuned.
July 11, 2009
In reading more about continuations,
I came across something called delimited or composable
continuations — continuations which don't represent all of the future of
a computation, only of a part of it. I found
this
site which gave some very simple examples and which described the required
transformation in a fairly simple way. Part of the trouble I'm having with
regular continuations is that it is as close as I think I've come in anything
computational to irreducible complexity: because a continuation
represents the entire future of a computation (and for an interpreter that's
pretty damn wide-open), it's really not so simple to just do a little bit of
work in one section of a program. The CPS transformation really has to be
applied to an entire program, top to bottom, and the point at which one is
calling call/cc is in some sense at the bottom of that tree; so
it's necessary to transform a tree while sitting on one leaf of it...
(tr)icky.
With (at least the reset/shift flavor of) delimited
continuations, one is in the very much happier situation of having access to
the top of the tree which needs to be transformed: only when (reset
expr) shows up in the code is it necessary to do any work, and that
work only involves expr. That's a top-down code-transforming
situation, very easy by comparison.
I've implemented a small routine which does that; it's available
here (also in the tarball). Here are some fairly
trivial examples, taken from the schemewiki site above; the last two are
mine, and show multiple reset/shift in the same expression... I
think those last results are correct.
; from wiki
lisp> (+ 1 (reset (+ 2 3)))
6
lisp> (cons 1 (reset (cons 2 '())))
(1 2)
lisp> (+ 1 (reset (+ 2 (shift k 3))))
4
lisp> (cons 1 (reset (cons 2 (shift k (cons 3 '())))))
(1 3)
lisp> (+ 1 (reset (+ 2 (shift k (+ 3 (k 4))))))
10
lisp> (+ 1 (reset (+ 2 (shift k (+ 3 (k 5) (k 1))))))
14
lisp> (cons 1 (reset (cons 2 (shift k (cons 3 (k (cons 4 '())))))))
(1 3 2 4)
lisp> (cons 1 (reset (cons 2 (shift k (cons 3 (k (k (cons 4 '()))))))))
(1 3 2 2 4)
; not from wiki
lisp> (+ 1 (reset (+ 2 (shift k1 (+ 3 (k1 5) (k1 1) (reset (* 10 (shift k2 (+ 3 (k2 4))))))))))
57
lisp> (+ 1 (reset (+ 2 (shift k1 (+ 3 (k1 5) (k1 1) (reset (* 10 (shift k2 (+ (k1 3) (k2 4))))))))))
59
I'm going to look into emulating call/cc with these. Since
call/cc is unlimited continuations, that's not strictly
possible; but I think it's a reasonable start to do call/cc that
is limited to one expression typed at the interpreter prompt, and that ought
to be doable with reset/shift. Stay tuned...
Again no version bump, this is merely scheme code, available in the file delcont.scm which is now in the tarball.
July 10, 2009
So, a couple of days ago, I wrote
a (call/cc1) function which, I thought, should be adequate to
do a non-deterministic (amb) operator, or something like it.
That supposition turns out to have been largely correct. Here's a fairly
trivial program, very slightly re-written from Paul Graham's
On Lisp:
(define (parlor-trick n)
(let-choice j '(1 2 3 4 5)
(let-choice i '(1 2 3 4 5)
(require (= (+ i j) n))
(list i j))))
and a couple of results from running it:
lisp> (parlor-trick 4)
(3 1)
lisp> (parlor-trick 6)
(5 1)
lisp> (parlor-trick 10)
(5 5)
lisp> (parlor-trick 11)
user exception "All out of choices!"
So how does that work? Here's the code that implements the plumbing in the above:
(define *choice-stack* (make-stack))
(define (bad-choice)
(if (*choice-stack* 'empty?)
(raise "All out of choices!")
(let ((cur (*choice-stack* 'pop)))
((car cur) (cdr cur)))))
(defmacro (let-choice sym choices . body)
(let ((ret-fail (gensym))
(ret (gensym))
(cont (gensym))
(chs (gensym))
(loop (gensym)))
`(letrec* ((,ret-fail (gensym))
(,loop (lambda (,chs)
(if (null? ,chs)
(bad-choice)
(let* ((,sym (car ,chs))
(,ret (let/cc1 ,cont
(*choice-stack* 'push
(cons ,cont ,ret-fail))
,@body)))
(if (eqv? ,ret ,ret-fail)
(,loop (cdr ,chs))
(begin (*choice-stack* 'pop)
,ret)))))))
(,loop ,choices))))
(define (require pred?) (if (not pred?) (bad-choice)))
All of the heavy lifting is in the (let-choice) macro. This
is designed to obey the restrictions due to the (call/cc1) and
(let/cc1) stuff which I wrote a couple of days ago. As far as
I can tell, it should work fine if that's replaced by regular continuations
(but I haven't tried that).
I decided to make a distinct failure symbol (the (ret-fail
(gensym)) stuff) for each (let-choice). I'm not quite
sure that I need that generality, it would have worked just as well if I
had just defined one global *choice-fail* symbol and tested
against that; and that would have made the (bad-choice) function
slightly simpler, too. Maybe I'll change it.
One thing that's a little trickier with this version is getting not
just one but all solutions to the given problem: because these
continuations expire after the scope of the call/cc1 or
let/cc1 is left, it's a little harder to tell the code "ok,
this solution is nice, now fail again and get me to the next one". There
are two ways to approach that: the simpler one is to just explicitly write
the body of what gets stuck into the innermost (let-choice)
to handle this possibility. The second possibility, much trickier, would be
to write a hypothetical (all-solutions) macro that reaches into
the innards of the nested (let-choice) macros and rewrites
(actually just wraps) the body of the innermost one to do the
additional processing needed. With high-level syntax-rules
macros, that might be feasible; with the low-level (defmacro)
stuff I've got, I'm not going to try...
No version bump, this is merely additional scheme code, available in the file choice.scm which is now in the tarball.
July 9, 2009
A small change of pace: I hacked
in a couple of new primitives, to deal with running sub-processes:
(run-command), (run-read-command), and
(run-write-command). I'm starting to use haskeem
more and more for simple shell-scripting stuff, and the lack of these
functions was starting to bother me a little. (run-command foo)
is the equivalent of the C system function, it just runs the
given command without trying to interact with it.
(run-read-command foo-cmd [bar-dir]) and
(run-write-command foo-cmd [bar-dir]) are the equivalent of
C popen: they return a port object which can be
read from or written to, respectively. Here's a small & silly sample
session:
lisp> (define port (run-write-command "sed 's/yow/YOW!/g' > foo.dat"))
{IO port} <- actually prints with angle brackets, but HTML validator barfs on that...
lisp> (write-string port "now is the yow time\n")
#t
lisp> (write-string port "for all yow men and true\n")
#t
lisp> (write-string port "to come to the yow aid\n")
#t
lisp> (write-string port "of the yowyowyow party\n")
#t
lisp> (close-port port)
#t
lisp> (run-command "cat foo.dat")
now is the YOW! time
for all YOW! men and true
to come to the YOW! aid
of the YOW!YOW!YOW! party
0 <- return value, not in the file
lisp>
I may well mess around with the interface some, but it's a start at
what I want. Very likely I will add a second optional argument to
(run-command), analogous to the other two, to specify the
working directory.
July 8, 2009
Continuations... I'm still
studying & thinking about how to fully implement these. But it occurred
to me that I might already be able to do a slightly-restricted version of
them, namely escapers. Studying these in the continuations chapter in PLAI, I
remembered that I have the necessary ingredients, namely (raise)
which can throw arbitrary scheme objects, (guard) which can
catch them, and (gensym) to disambiguate things. With those
capabilities, it's a trivial matter to write a (call/cc1)
function and a (let/cc1) macro:
(define (call/cc1 func)
(let* ((sym (gensym))
(cont (lambda (ret) (raise (cons sym ret)))))
(guard
(err ((and (pair? err) (eqv? sym (car err)))
(cdr err)))
(func cont))))
(defmacro (let/cc1 k . body)
`(call/cc1 (lambda (,k) ,@body)))
These implement what might be called "poor man's" continuations in
that they have a couple of significant limitations: first, the
continuations are one-shot, ie, they can't be invoked more
than once; second, they have a limited lifetime, ie, we can't
save the continuation inside the user function, return from the user
function, and then later re-invoke the continuation — the
re-invoking will work just fine, but it won't return to where we'd
want it to go; and third, if we nest these and then invoke an outer
one, any inner ones will go stale. (I haven't tested this
last, but that's how (guard) is supposed to work.)
Still, this suffices for at least some of the examples in PLAI:
(define (f1 n)
(+ 10 (* 5 (/ 1 n))))
(define (f2 n)
(+ 10 (* 5 (call/cc1
(lambda (k)
(/ 1 (if (zero? n)
(k 1)
n)))))))
(define (f3 n)
(+ 10 (* 5 (let/cc1 k
(/ 1 (if (zero? n)
(k 1)
n))))))
The first function just divides by whatever we entered; the second and third functions are careful to check if we'd divide by zero, and if so just pass back 1 instead. Here they are in action:
lisp> (f1 1)
15
lisp> (f2 1)
15
lisp> (f3 1)
15
lisp> (f1 2)
25/2
lisp> (f2 2)
25/2
lisp> (f3 2)
25/2
lisp> (f1 0)
1/0
lisp> (f2 0)
15
lisp> (f3 0)
15
That already looks pretty good. I'll see if I can implement something like
the (amb) operator... I'm not sure, but I think it ought to be
possible. No version bump, but I'm adding callcc.scm which contains the above
code to the tarball.
July 4, 2009
Well! Last night I decided to look a bit at memory usage. I had not previously considered this at all, since it was not really one of my goals; but since I'm still trying to bend my mind around continuations, and I'll need greater efficiency if I ever hope to do continuations natively, I decided to have a look. My test program is the following, which just computes Collatz sequences for successive integers.
(define (collatz n org steps)
(cond ((= n 1) (write-string (number->string org) #\space
(number->string steps) #\linefeed))
((even? n) (collatz (/ n 2) org (+ steps 1)))
(else (collatz (+ 1 (* 3 n)) org (+ steps 1)))))
(define (coller n)
(collatz n n 0)
(if (< n 25000)
(coller (+ n 1))))
(coller 1)
(display (cputime))
(newline)
This is pure looping, there is no buildup of control information here, and
with proper tail-call optimization this could run forever in constant space
(well, leaving aside the fact that eventually the representation of
n does grow; but that isn't a concern here). I tried this with
mzscheme and verified that it does indeed run in constant memory,
about 30 MB as it happens.
So how does haskeem do? Not so hot. I followed the
instructions in Real
World Haskell (also on the
haskell wiki) to compile
with profiling active, then did a run and generated the following plot:

That's kind of a memory pig! But looking closely at the data, it is evident
that most of it comes from just a couple of cost centers: in particular, the
lastOrNil function inside the evaluator is strongly implicated
in all of this. The reason for this function is to get the value of the last
of a bunch of scheme forms: if my code is (foo) (bar) (baz),
the interpreter is supposed to evaluate all three and return the value of the
last one as the value of the whole thing. Since there is also the possibility
of an empty list of these, lastOrNil has to also be able to deal
with an empty list; that's why I didn't just use last. It's a
tiny function:
lastOrNil = liftM lON
where lON [] = List []
lON l = last l
and it gets used in a bunch of places like this: lastOrNil
(mapM (evalLisp env ql) body). So I stared at this for a while,
thinking that perhaps the list wasn't fully evaluated, and there were
large thunks getting built up. I tried to make it stricter, but nothing
made any difference. Eventually I figured out that it isn't laziness that's
the issue, it's that the list is getting build up at all: all of the
results generated by mapM are being kept, only to get tossed
an instant later. So I replaced all of the calls to
lastOrNil . mapM with calls to a new function mapML
which does all of this directly in one go:
mapML fn lst = mapMLA (List []) fn lst
where mapMLA r _ [] = return r
mapMLA ro fn (x:xs) =
do rn <- fn x
mapMLA rn fn xs
The results are rather dramatic:

This is still not fully optimized, there is still growth in the memory usage, but a 20× to 50× reduction in memory is... rather nice. In addition, there turned out to be a 25% reduction in runtime, also nice.
This certainly deserves a version bump, to 0.7.7.
July 3, 2009
Only small tweaks in the past
several days. I added a fluid-let macro to the library, plus
I'm writing some small procedures to deal with stacks and heaps. In a
backward-incompatible change, I renamed new-symbol to
gensym, which is closer to standard usage.
I also got tired of the failures in the test suite which involved comparisons and min/max involving rational infinities, so I put in small wrapper functions to deal with these cases.
Another version bump, to 0.7.6.
June 28, 2009
I have continued to read PLAI and to mess with the continuation-passing macros; I think that's coming along ok. In order to make it work the way I think it ought to, I had to make a small change in the evaluator: previously, the expanded syntax of macros was getting evaluated still in the environment that was captured when the macro was defined, and that was messing up the last of the CPM examples below: I was getting "14" instead of "15" which I think is the right answer.
This should print "5"
(define foo (=lambda (x) (=return (+ 1 x))))
(display (foo *cont* 4))
(newline)
Same (foo) as above; this should print "10"
(let ((*cont* (lambda (x) (* 2 x))))
(display (foo *cont* 4))
(newline))
This should print "14": the function says to add 10 to the value passed in, and the current continuation, which is the default identity, is then applied to that value: 4 + 10 → identity → 14
(=define (bar x) (=return (+ 10 x)))
(display (bar 4))
(newline)
This should print "15" ... I think? The function is the same as before,
but here the current continuation has been re-jiggered to add 1 to what it
receives: 4 + 10 → (+1) → 15.
(let ((*cont* (lambda (x) (+ 1 x))))
(display (bar 4))
(newline))
All of the scheme code for this is in the file cpm.scm,
which I've included in the tarball. It's not quite complete yet, there is
one more macro that needs defining (PG's =bind macro), but
what's there is, I think, correct.
I decided that this stuff warrants a version bump: it's 0.7.5 now.
June 27, 2009
A couple of small cleanups and improvements: I noticed the minor infelicity that if I entered whitespace in front of something evaluatable at the REPL, it got rejected; the parser was being a bit too strict. Oops! I guess I hadn't ever tried that before... fixed now. Also, I unified the treatment of functions, traceable functions, and macros inside the evaluator: now there is only a Func type, which can be traceable or not, and it can be a macro or not. That has the nice side-effect of making macros traceable; a traced macro will produce two lines of trace output when it's hit, one line showing what the arguments are and the other line showing what it expands into; thusly:
lisp> (assert (= 0 1))
trace: assert <- ((= 0 1))
-> (when (not (= 0 1)) (write-string stderr "assertion failure: ") (display (quote (= 0 1)) stderr) (newline stderr) (raise "assertion failure!"))
assertion failure: (= 0 1)
user exception "assertion failure!"
lisp>
It's not quite macro-expand, but it's close enough that
I bumped the version to 0.7.4.
June 21, 2009
Version bump to 0.7.3, I re-enabled control-C catching in the REPL. Other than that, I've mostly been studying PLAI and trying to wrap my brain around continuations (or, more precisely, turn my brain inside-out to match what continuations do).
June 20, 2009
The regexp code I mentioned is
coming along ok: I've got stuff implemented to parse a regexp, to turn the
parse tree into an NFA, and to make transitions according to the NFA given
a start state and an input string. It all works, albeit slowly. Still, since
I'm using the Thompson algorithm, there are no exponentially-bad cases:
thus I can easily generate cases where this code finishes in seconds to
minutes, whereas perl, python, and most of the other big guys would not
finish before the heat death of the universe. Look at the routines
make-slow-regexp and make-slow-string in the file
regexp.scm (also in the tarball). If
you use that with n in the range 30 to 36, you'll cause perl to
run for minutes to a couple of hours; with n set to 128, it would
take right about 1.5×1024 years. In contrast, this code can
handle this n with no particular problems, about ten
minutes or so. So... just gotta pick your benchmark right! :-)
I recently came across a book,
Programming
Languages: Application and Interpretation, that looks very interesting; if
you, gentle reader, are here out of more than idle curiosity, then you may
find that book of interest as well. It is relevant specifically to my little
toy because Prof Krishnamurthi has a long chapter on continuations in the
book, starting from a level where even a barefoot C programmer like me can
follow. So I will be reading with great interest. I expect to follow this
path: first, try to manually rewrite some trivial programs in
continuation-passing style (CPS). It's always possible to do this, although
the resulting program is essentially unreadable (which is probably why it's
not done much). Second, I'll continue the continuation-macros approach I
mentioned below. At that point, I think I should begin to have a good enough
understanding of them to be able to either add continuations to
haskeem at the haskell level, or to write a CPS-transformer
at the scheme level. I haven't yet decided.
June 16, 2009
I added a bunch of bit-diddling
primitives: bits-and, bits-or,
bits-xor, bits-shift,
bits-set, bits-clear, bits-flip,
bits-set?, and bits-get. I have been playing a
bit with simple regexp processing in haskeem, and I implemented
sets as lists of integers. It works, but is a little painfully slow; I
think these bit-wise operators will speed up that stuff considerably, and
they are generally useful for bit-diddling. For positive numbers, they all
do pretty much what you expect; for negative numbers, they kind of assume
a 2s-complement which is one bit larger than the size of the number. I
basically just sucked in the functionality in the haskell base library
Data.Bits.
I also added a (load-library) function (actually, a macro).
If you give it a bare filename, it will look for that file in a list of
directories specified by the environment variable
HASKEEM_LIBRARY_PATH and (load) it... useful
as I'm starting to build up a larger library of scheme files.
June 13, 2009
I added the xkcd image at the top... it seems somehow appropriate.
June 11, 2009
I withdrew when
and unless as special forms... now that macros are present,
it's just as easy to do them that way, and it reduces the number of
special forms as well as making it possible, if desired, to redefine them.
I also tried a version of let as a macro. It works fine,
including nested applications, but I'm not going to take that one out as
a special form.
(defmacro (my-let bindings . body)
`((lambda ,(map car bindings) ,@body) ,@(map cadr bindings)))
I haven't added a macro-expand facility yet, but doing
the same kind of stuff I played with earlier, to test out
defmacro, works pretty well. I'm currently studying Paul
Graham's On Lisp, specifically the chapter on continuations,
to see if I can do continuation macros approximately the way he does them.
That's not quite supporting them "natively" in the language, but... it
gets close. The continuation macros that PG lists are, I think, quite
similar to Haskell monads, although I haven't looked at them closely
enough to completely prove that.
June 6, 2009
Macros... after a fair bit of
reading, I've concluded that full R6RS hygienic macros are a
little too complicated for me to tackle right now; there are too many
subtleties that I don't understand yet. But I found a very lucid explanation
of the (seems to me) simpler (defmacro) in Peter Siebel's book
Practical Common Lisp.
Since I'm not being doctrinaire about this project, I'll try to implement
some form of that first, and see where it takes me.
So, what's a macro? Basically, it's a function that accepts and returns
unevaluated forms, and that runs at macro expansion time instead of at
runtime. Let's break that down a bit. A really simple example is
an assert macro. This doesn't particularly show the full
implications of the above, but it's a good small start. First,
an assert function:
(define (assert some-cond)
(when (not some-cond)
(write-string "assertion failure: ")
(display some-cond)
(newline)))
Does it work? Almost... it certainly catches assertion failures, but the error message that it prints is not quite what we'd like to see. If we have
(define x "foo")
(assert (eqv? x "bar"))
we do see an assertion failure, but the message that gets printed is:
assertion failure: #f
instead of what we'd really like to see:
assertion failure: (eqv? x "bar")
If we use regular functions, as above, there is no possibility of achieving
that, because by the time some-cond has arrived inside the
assert function, it has already been evaluated to a value, in
this case #f. That's where the "unevaluated forms" from above
comes in. There aren't macros in haskeem, but we can simulate
them. First, a second version of assert that returns unevaluated
forms:
(define (assert-macrolike some-cond)
`(when (not ,some-cond)
(write-string "assertion failure: ")
(display ',some-cond)
(newline)))
Then we use it: we feed it an unevaluated form, note the quote, and we
run it: that's the (define macro-expand ...) line, which is
macro expansion time. Finally, we evaluate the result of that, that's the
(eval macro-expand) line, and that's runtime:
(define x "foo")
(display x)
(newline)
(define macro-expand (assert-macrolike '(eqv? x "bar")))
(display macro-expand)
(newline)
(eval macro-expand)
The result of this is
"foo"
(when (not (eqv? x "bar")) (write-string "assertion failure: ") (display (quote (eqv? x "bar"))) (newline))
assertion failure: (eqv? x "bar")
and we're happy!
So... what does it take to turn the above into what we want?
(defmacro (assert-macro some-cond)
`(when (not ,some-cond)
(write-string "assertion failure: ")
(display ',some-cond)
(newline)))
(define x "foo")
(display x)
(newline)
(assert-macro (eqv? x "bar"))
I think it won't take too much work at all. The (defmacro)
stuff is completely analogous to (define). I will start by only
implementing the variant with a fixed number of arguments, but I think the
variable-number variant should be simple as well. The only issue is that the
result associated with the name needs to be marked as a macro rather than a
function; no big problem there. Then, in the generic function-evaluation code,
I need to add a check for macro-ness, and if it comes out true, do the
slightly different processing of passing unevaluated forms to the macro,
running that (macro-expansion time), and finally running the result
(runtime).
Here is a file containing the above code, plus another pseudo-macro: two versions of a 3-way numeric if.
All in all, I think this might be only one or two dozen lines of code. Check back in a couple of days!
<time passes...>
Holy Y combinator, Batman! Under an hour! The important change
turned two lines of code into three, plus fiftyish additional lines of
pretty trivial cut'n'paste additions to introduce the new datatype
Macro, in all respects analogous to Func, and
add the new special form defmacro... ¡Ay Caramba!
Here is another sample file containing actual
defmacro versions of the above.
Version bump to 0.7.0; now I need to implement a macro-expand
special form, to aid in figuring out what any given macro is doing.
May 29, 2009
Ouch! Long delay since the last
update here... what with one thing and another, various important life events,
no haskeem hacking in a long time. But I have returned
to haskeem and have removed a couple of small spots of bit-rot,
plus fixed a small bug I noticed just recently.
The bit-rot had to do with the change in ghc from GNU
readline to haskeline: the latest version of
ghc doesn't use readline, and that's a bit of functionality
I am reluctant to live without. So I've updated haskeem to
use haskeline. I did lose one small bit of functionality in
that change: interrupting a long calculation once again kills the whole
interpreter, rather than getting back to the lisp> prompt.
Oops! Well, I'll hopefully get that worked out soon.
For people with older versions of ghc or who don't have
haskeline installed, I've left the old version of the main
file haskeem.hs, where this stuff was changed, in the tarball.
The bug had to do with scoping and variable bindings: for some reason
I don't now remember, I had (define) simply update a variable
(ie, like set!) if it was already defined in the
environment... well, that kinda messes up scoping. Consider this program:
(define str "I am top-level")
(define (foo) (write-string "foo: " str #\newline))
(define (bar)
(define str "I am internal")
(write-string "bar: " str #\newline)
(foo)
(write-string "bar: " str #\newline))
(foo)
(bar)
(foo)
That ought to print out three "I am top-level" interspersed with two "I am internal"... instead, it was printing out one "I am top-level" followed by four "I am internal". Fixed now...
In real new stuff, I have been thinking about both continuations and
macros. I think macros will be straightforward but a lot of work, and I
expect I'll start doing those. Continuations, ie,
call/cc, are not so straightforward, but I think I am starting
to understand them. I do not now remember where I saw this, but I do know
that it was a comment left by Audrey Tang in response to an article on
continuations; and it's worth repeating. In quasi-perl, here's a capsule
summary of what continuations are:
$continuation = \&return;
Not so difficult, really. However, I don't think I can implement them in
haskeem; I think the flow of control at the haskell level where
each expression is evaluated is not accessible enough for me to be able to
wrap it up and make it available at the scheme level inside
haskeem; or rather, reworking the innards to make the flow of
control explicit enough would basically come to a complete rewrite of the
whole interpreter. Bummer!
March 5, 2008
Not so much haskell hacking in the
last few days. I did finish the nested quasiquote etc stuff. As
expected, it was straightforward but a little tedious. Now it returns all the
proper results for the example cases from R6RS.
I have tweaked the powerseries examples a little bit. I verified
the Bessel functions, added Bessel functions of the second kind
In, and simplified the (ps-recip) function:
it had previously needed three terms where it ought to have needed only
two. After a bit of looking, I found the problem: it wasn't in
(ps-recip) at all, but in (ps-mul): I wasn't
being lazy enough in that routine.
February 24, 2008
Only a little bit of hacking today:
I added a (new-symbol) special form; I'm starting to think about
doing macros, and I think that'll be necessary. I avoid name clashes with
user-defined symbols by simply generating the internal symbol with a leading
space: the rest of the system is perfectly happy with such a symbol, but the
parser won't pass the space; so the user can't accidentally generate such a
symbol. Clashes between multiple internally-generated symbols are avoided by
simply sticking a global counter into the symbol name: every time a new
internal symbol is manufactured, the counter gets incremented.
I added a non-gnu (and non-editing) version of readline, for those machines which don't happen to have gnu readline installed. It's commented out inside haskeem.hs.
I think I added the necessary bits to the Bessel function power series in powerseries.scm; at any rate, it correctly computes the first (non-trivial) zero of the first five Bessel functions.
February 23, 2008
Again, not too much hacking of the
interpreter; more scheme hacking the past few days... that is a
problem with writing an interpreter, you then want to use it! :-) With a few
hints from Bulat Ziganshin on the haskell-cafe mailing list, I did add the
ability to catch keyboard interrupts; so it's possible to type control-C at a
long-running calculation and get back to the lisp>
prompt. Previously, that would have just killed the interpreter, which is
kinda nasty. Thanks, Bulat!
The scheme hacking consisted mainly of some playing with power series. A few days ago, I was re-reading Higher-Order Perl by Mark Jason Dominus, and I decided to re-implement his section 6.7 on power series. The code is here, if you want to play with it; be aware that the Bessel function series is incomplete. The equivalent of his last example, the head of the power series for tan(x), is
lisp> (stream-head ps-tan 10)
(0 1 0 1/3 0 2/15 0 17/315 0 62/2835)
February 20, 2008
Not much hacking of the interpreter,
just adding several functions to the library: random-number generation, plus
networking: I can now write a TCP client and server in haskeem.
Here's a trivial server:
(define listener (listen-on 2500))
(write-string stderr "listening on port 2500\n")
(let ((remote #f)
(host "")
(port -1)
(ret '()))
(do ((i 1 (+ i 1)))
((> i 10) (close-port listener))
(set! ret (accept listener))
(set! remote (car ret))
(set! host (cadr ret))
(set! port (caddr ret))
(write-string stderr "got connection from " host #\linefeed)
(write-string remote "hello, " host ", how are you?\n")
(write-string remote "The current local time is " (localtime) #\linefeed)
(close-port remote)))
And a trivial client to go with it:
(define host "localhost")
(define port 2500)
(when (positive? (length args))
(set! host (car args))
(set! args (cdr args)))
(when (positive? (length args))
(set! port (string->number (car args)))
(set! args (cdr args)))
(define remote (connect-to host port))
(do ((spew (string-join-by " "
"Connecting to host" host
"port" (number->string port))))
((not spew) (write-string "all done!\n") (close-port remote))
(write-string spew #\linefeed)
(set! spew (read-line remote)))
A super-high-performance web server is probably not in the cards, but I think I could probably build a small web spider on top of this functionality... :-)
February 17, 2008
The quasiquoting unquoting operators
are mostly working. I have implemented the unwrapping I had left yesterday,
and haskeem is passing most of the test cases shown in
R6RS. There are two areas where it is not yet working: neither
nested quasiquotes nor the caboose of a dotted-list are properly handled
yet. I think both of these are straightforward but a bit involved: doing
nested quasiquotes is simply a matter of keeping track of the nesting, but
that involves threading a nesting count through the whole machinery of the
evaluator, and that touches a fair amount of code. Doing dotted-lists right
is, I think, analogous to properly handling them in the parser, and I've
already got that working. It's made a little tricky by JTang's original design
decision, which I adopted as well, of using Haskell lists as the basic unit of
list structure:
(a . (b . (c . (d . ()))))
is exactly and explicitly equivalent, according to R6RS, to
(a b c d)
as are all of these (plus a couple more):
(a b . (c d))
(a b . (c . (d . ())))
(a b c . (d . ()))
and that equivalence is not entirely natural when using Haskell lists (indeed, the parser presented in JTang's tutorial doesn't handle this correctly). So there is a bit of work required to correctly handle dotted-lists.
time passes...
The dotted-list problem is solved: indeed, it turned out not to be
analogous to the stuff in the parser, it turned out to be the
parser. If you examine the examples above, you'll see there is a simple rule
for how to turn dotted-lists into lists: if the thing in the caboose
position is itself a proper list, then the overall dotted-list object is
actually a proper list. That rule, applied repeatedly, is sufficient to turn
all of the above examples into the "canonical" list (a b c d),
and there is (was) code in the parser to do just that.
However... in the presence of unquote, that comes out
wrong. The test case from R6RS which checks this is
`((foo ,(- 10 3)) ,@(cdr '(c)) . ,(car '(cons)))
which turns into
(quasiquote ((foo (unquote (- 10 3))) (unquote-splicing (cdr (quote (c)))) . (unquote (car (quote (cons))))))
Examine the caboose: it's (unquote (car (quote
(cons))))... which is a two-element proper list; so the parser
promotes the overall expression to a proper list, the unquote
is in the middle of the list by itself, and then it all falls apart in the
evaluator.
So I have added a bit of special-purpose pattern matching to the code in
the parser that checks for lists in the caboose: if it's a list, but the
first item in the list is a symbol, and that symbol is either
unquote or unquote-splicing, then the promotion
gets skipped. And then the test succeeds!
February 16, 2008
I've added support for
vectors: #(...). At first I was going to use the
Haskell Data.Array datatype, but documentation for that is
somewhere between nonexistent and crappy, and what little I could make out
from the sources seemed to suggest that this was more for native datatypes:
the equivalent of C's int, double, etc.
My LispVal doesn't seem to fit in there. Fortunately,
the Data.IntMap datatype seemed very well suited to what I
wanted, so that was the route I chose. The R6RS merely says
that
Vectors are heterogeneous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time needed to access a randomly chosen element is typically less for the vector than for the list.
which fits very well with Data.IntMap: that is implemented
using PATRICIA trees, and many of the operations take
O(min(n,W)) time, where n is the length of the
vector and W is the word size of the computer: 32 or 64. I
don't know what the constant out front is, and I don't care too much about
that; for large vectors, it's bound to be better than O(n) for
a list.
I have also begun to implement more of the quoting constructs:
(quote foo) was already present; now I'm adding
(quasiquote foo), (unquote foo), and
(unquote-splicing foo) -- so far, only the easy bits: the
abbreviations
’<datum>for(quote <datum>)
‘<datum>for(quasiquote <datum>)
,<datum>for(unquote <datum>)
,@<datum>for(unquote-splicing <datum>)
At least with the browser I'm using right now, the quote characters are
very hard to distinguish: ’, the character used in the abbreviation
for (quote <datum>), is the right single quotation mark,
located on the right of my keyboard, and ‘, the character used in the
abbreviation for (quasiquote <datum>), is the left single
quotation mark, located on the left of my keyboard. Actually, the keyboard
characters should probably be called apostrophe and backtick, respectively.
time passes...
These are almost working: there is still one extra level of
parentheses in the result, for example ‘(1 2 ,(+ 3 4))
should return (1 2 7), but actually returns (1 2
(7)). I think I understand why, and what to do about it... but that's
for tomorrow.
February 11, 2008
I've fixed the glitch in writing numbers in scientific notation; that was pretty straightforward, although I don't know how efficient my solution is. But efficiency isn't the primary target here...
I've also added a rather specific new type of comment to the parser: it now understands and ignores "#!" lines at the top. So I can have actual directly-executable scheme scripts:
% ls -lF try
-rwxr-xr-x 1 uwe users 134 Feb 11 17:05 try*
% cat try
#!/home/uwe/tools/haskeem
(set! args (cons "It's alive! It's ALIIIIIVVVE!" args))
(write-string (string-join-by " " args) #\linefeed)
% ./try
It's alive! It's ALIIIIIVVVE!
% ./try meow
It's alive! It's ALIIIIIVVVE! meow
So now, if I need to, I can compute symbolic derivatives in a cron job :-)
February 10, 2008
I have just received an object
lesson in just how lazy Haskell is... I was working on implementing printing
numbers in scientific notation, n.nnEnn, and when I thought I had that
working, I started doing a couple of tests in haskeem. I entered
a couple of numbers which worked fine, and then I entered 145e-2
which promptly crashed the interpreter. I looked at the write-number routine
and couldn't find anything, so I started sneaking up on the problem: I entered
the number in such a way that I wasn't going to print its value at all
(define returns the value it assigned to the variable, and the
REPL prints that, so that wouldn't work; I had to write (begin (define
var 145e-2) #t) to get the value into the interpreter without printing
it), then I started querying it: is it an integer? no; is it a rational? no;
is it a real? yes; is it infinite? and the interpreter crashed again. After a
little while, I tracked the problem to the parser: I allow the parser to
return numbers of the form aaaEbb as integers, if there is no decimal point in
the "aaa" part and the "bb" part is non-negative: ie, 123e4 is the integer
number 1230000. It turned out that I was insufficiently careful in how I had
implemented that, and this particular number 145e-2 satisfied the first part
of the test, no decimal point in the "aaa" portion, but not the second part,
the "bb" portion is non-negative. So the parse wasn't fully evaluated
until well after the interpreter had returned from the parser and had
already processed several other commands. In fact, the "is it infinite?" check
was the first one where the actual value mattered, as opposed to just the
format. Wow! I knew Haskell is lazy, but only in the front of my mind, it
seems; my reptilian brainstem was still thinking "oh, it's returned from the
parse, so it's finished with that" (well, to the extent that my reptilian
brainstem thinks about programming at all... :-) )
That's all fixed now, and scientific notation works, with still one small
glitch: if a number has more nonzero digits in its tail than the specified
precision, rounding occurs, and that rounding can of course propagate up
toward the head: so for example 0.9999 gets printed as 1.00
with two digits after the decimal point. If that happens when formatting a
number in scientific notation, 0.9999 gets formatted as
9.999e-1 when enough digits of precision are available, but as
10.00e-1 rather than as 1.00e0 when rounding
occurs. The value is correct, just the format is slightly off. This is
purely my bug, nothing to do with GHC's native number-printing routines at
all, and I understand where and why it's happening, I just haven't fixed it
yet.
February 9, 2008
I'm in the process of adding an
optional precision specifier to (number->string): it does the
equivalent of C's %.Nf now, for integers and rationals; I'm still
working on the same for floating-point numbers, and I haven't started
%.Ne at all. For integers and floating-point numbers, this isn't
so hugely necessary (although it's still pretty useful); but if you're doing
some big calculation involving rational numbers, say computing the sum of the
reciprocals of the first hundred primes, it's very much nicer to know that
that result is about 2.106342121472621094 than that it is the ratio of two
several-hundred-digit numbers:
9924938317306578104794915348371074960828795860429929737290618944808506634743370796528436363734050330051393733303456260639596308477098909255081265596725817362940507726427713984550226766935610334519674156561744081975868623/ 4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090
I dunno about you, gentle reader, but that makes my eyeballs bleed...
To compute that, I actually used the exception-handling stuff in kind of a neat fashion. Here's a bit of code: first define an initial table of the lowest couple of primes; we have to have 3 in there, because 2 is kind of an odd prime...
(define primes '(2 3))
Then define a routine to check for primality. This is the part I think is
cool: I could have a moderately complicated test to determine if I can exit
the loop, and then another bit of code to determine what result the routine
should return, but using a guard to catch some exceptions, and
then within the loop just raise an exception with the return value of the test
ends up being cleaner; and it doesn't interfere at all with the actual
exception of running off the end of the table: that one isn't caught by the
guard.
(define (is-prime? n)
(guard (err ((boolean? err) err))
(do ((pl primes (cdr pl))
(cur 0))
((null? pl) (raise "exhausted primes list!"))
(set! cur (car pl))
(if (= cur n) (raise #t))
(if (zero? (remainder n cur)) (raise #f))
(if (> (* cur cur) n) (raise #t)))))
Then this next routine computes the next prime past the end of the table and adds it to the table, until the size of the primes table is at least n.
(define (grow-primes-table n)
(do ((j (length primes) (+ j 1)))
((>= (length primes) n) primes)
(do ((i (+ 2 (last primes)) (+ 2 i)))
((is-prime? i) (set! primes (append primes (list i)))))))
And finally this is the routine that computes the sum of the reciprocals of
the first N primes. It makes use of the fact that the
grow-primes-table routine above returns the list of primes after
it has finished growing it.
(define (sum-recprimes n)
(apply + (map / (list-head (grow-primes-table n) n))))
If you invoke this with
(number->string (sum-recprimes 100))
you get the eyeball-bleed-inducer above, whereas if you invoke it with
(number->string (sum-recprimes 100) 10 18)
you get the much nicer 2.106...
February 2, 2008
Major reworking of error handling. I
figured out a couple of problems with the error handling, in particular error
messages generated in non-interactive mode are now printed. I had that very
wrong before: I was using runIOThrows, a wrapper around
catchError, inside Control.Exception.try. In
hindsight, it's perfectly obvious that if the inner catcher is handling all
the errors, then the outer catcher won't have much of anything to
do... d'oh! <palm to forehead>
This does not yet address the exception-handling at the scheme level, but I think I understand Haskell's error handling much better now, so hopefully I'll get the scheme-level exception handling working now, too.
time passes...
Yes! new special form guard, with syntax as described in
R6RS. Check this out: first define a small function...
; the first 'test', "(begin...)", is only to print out the value of
; the exception that was thrown: that 'test' always returns #f
(define (tryit test)
(guard
(err ((begin (write-string "exception is '")
(display err)
(write-string "'\n")
#f) #t)
((and (string? err) (string=? err "meh"))
(write-string "caught 'meh'\n")
"I am the 'meh'-catcher")
((and (string? err) (string=? err "barf"))
(write-string "caught 'barf'... yuk!\n")
-42)
((and (string? err) (string=? err "yahoo!"))
(write-string "a little excitable today, aren't we...\n")
"US$44.6billion")
((eqv? err '(1 2))
(write-string "does not compute\n")
"daleks rule! (until their heads blow off)"))
(raise test)))
and then run some tests with it:
lisp> (load "guard.scm")
(lambda (test) ...)
lisp> (tryit "meh")
exception is '"meh"'
caught 'meh'
"I am the 'meh'-catcher" <-- return value
lisp> (tryit "barf")
exception is '"barf"'
caught 'barf'... yuk!
-42 <-- return value
lisp> (tryit "yahoo!")
exception is '"yahoo!"'
a little excitable today, aren't we...
"US$44.6billion" <-- return value
lisp> (tryit '(1 2))
exception is '(1 2)'
does not compute
"daleks rule! (until their heads blow off)" <-- return value
lisp> (tryit '(1 3))
exception is '(1 3)'
user exception (1 3) <-- NOT caught! this got printed by
the outermost level, the 'P' in 'REPL'
Notice that we can throw arbitrary lisp objects as exceptions. W00t! Izzat cool, or what? This mechanism will also catch system-level exceptions, ie, "file not found" kinds of errors; however, those will just show up as string values, like "nil.dat: openFile: does not exist (No such file or directory)". So processing those is a little messier, although still doable.
February 1, 2008
Not so much Haskell hacking, but a few small tweaks in stdlib.scm, plus one very nice usability feature: connect the REPL with gnu readline. That makes it very much nicer to use interactively! I'm hereby declaring that it's no longer (almost) Scheme in Haskell, it's just Scheme in Haskell.
January 28, 2008
I have finished the do
syntactic form, although it's not yet in the self-tests (by now 800+); but a
small separate script using it gives the right results. The exception-handling
stuff, which I thought would be straightforward, is proving less so: I
apparently don't understand enough Haskell yet, and I am getting various
type-mismatch error messages which I haven't yet managed to untangle. Oh
well...
January 26, 2008
I have added a bunch of functions
for manipulating files and directories, reading the environment, and so on,
plus miscellaneous cleanups; for example, it was giving a parser error message
if the user just entered a blank line at the prompt. That was correct, as far
as the parser goes -- it expects to see at least one scheme expression or
object -- but kinda unfriendly. Also, a special form to display the
currently-active bindings: dump-bindings.
January 22, 2008
I have added delay
and force, which allow the definition of lazily-evaluated
infinite datastructures such as streams. For example, define the natural
numbers as follows:
(define natural-numbers
(letrec ((next
(lambda (n)
(cons n (delay (next (+ n 1)))))))
(next 1)))
ie, the integer 1 plus a promise to evaluate the rest of the natural numbers if it should become necessary; then it's possible to do
(define (stream-head strm n)
(if (<= n 0)
'()
(cons (car strm) (stream-head (force (cdr strm)) (- n 1)))))
(stream-head natural-numbers 20)
and get
(1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)
Very cool!
On one level, this was pretty trivial: add a new Delay
constructor to the LispVal stuff in lispdata.hs, then in
evaluator.hs add code to just return a Delay object when a
(delay foo) special form is seen, plus code to actually
evaluate what was stored if a (force bar) special form is
seen. However, that by itself fails one of the tests in R6RS:
with just that, it's possible to make two-timing promises that evaluate to
different results depending on when they are forced; that's not so nice!
The solution, adding a variable into which to store the computed value
and just return that from then on, was straightforward, too, except for
making the variable name. It doesn't really matter what that name is, as
long as it doesn't clash with anything else, but that "doesn't clash with
anything else" was tricky. I tried for a while to use the State
monad, but I couldn't get that to work. Eventually, I decided to just stick
stuff into the same environment that's passed along when the
Delay object was created, using a global variable defined at
startup in the top-level environment. That worked fine.
January 19, 2008
I've been interested in the Haskell programming language for a while, but was never really able to get into it. I wrote a couple of tiny toy programs, but that was about it, until I recently came across Jonathan Tang's most excellent Haskell tutorial Write Yourself a Scheme in 48 Hours. I decided that that looked like a lot of fun, so I started on it. By now, I've spent a fair bit more than 48 hours on it, but I've also gone a ways beyond what JTang described. I started during the christmas 2007 holiday, and have, as of late January 2008, spent the idle hours of a couple of weeks at it. It's starting to be a neat little toy, so I'm putting it onto the web here for others to play with or possibly learn from.
Before I go any farther, a couple of points:
OK. With that out of the way, there are a couple of points where I am actually and intentionally critical of what JTang wrote, and where I think he should have done better:
The first is an actual programming issue: the parser JTang described, and the one you will write, should support some kind of comments. R6RS (the Revised6 Report on Scheme) describes a couple of different kinds, of which I have so far implemented only the "semicolon to line-end" flavor, but it is essential, once you write anything longer than a line or two, to be able to comment it. Perhaps you'll only ever write a single comment that says Look and marvel, world, at the brilliance of my lisp koan!... that's ok; but maybe you'll also write things like OMG! WTF is this stupid interpreter doing?! Fix this!!! and then later on you'll be able to find it again and fix it. Whatever. Comments are essential, so when you go through JTang's tutorial, pretend that he talked about comments, and add the ability to comment stuff into the parser early on. It's not hard, and you'll be grateful for it soon enough.
The second issue is less of a programming issue and more being
systematic. However much of a toy this is, it is an actual interpreter for an
actual language, and you have to test it as you go, but you also have to check
that your new hacks don't break anything old. So keep track of small tests,
collect them into one large file, and once your interpreter is able to execute
a file, set things up to run that self-test; then run it often. This is
something that I think JTang should have emphasized more. By now I'm up to 400
tests in the self-test for haskeem. These aren't all totally
distinct, a bunch of them are exploring corner cases of simple math functions,
but not only does that give me some confidence that it's doing the arithmetic
right, it also caused me to find a couple of small bugs in GHC, the Glasgow
Haskell Compiler, which I have duly reported. (GHC doesn't handle NaNs quite
correctly in all cases, for example 1.0 ** NaN should be NaN but is evaluated
as 1.0.)
All right, here's where you should stop if you want to do it yourself. Go write yourself a scheme in 48 hours, and come back when you're done with it!
So what's haskeem? It's
(mostly) a sub-set of R6RS Scheme; while writing it, I did retrieve
the R6RS, and I went through it, trying to find and eliminate
differences. To the extent possible, I did and do want it to be compatible,
although I did do a couple of small things differently; I'll describe them
more below.
JTang's tutorial implements an interpreter with integer-only arithmetic
and only base-10 input and output of numbers, whereas scheme supports
integers, rational numbers, and real and complex floating-point numbers; all
of these can furthermore come in several different precisions, plus they can
be exact or inexact, and the integer quantities can be written in base-2,
base-8, base-10, and base-16. I have not implemented all of this!
It's important for writing large robust programs; it's not essential for
writing a small toy. But I have implemented parts of it: I began by adding
support for real floating-point numbers, since I wanted to add a bunch of math
functions, and having, for example, (sin x) always return only
one of -1, 0, or +1 just doesn't quite cut it; the numerical analyst
cringes!
This is the point where I made my one major excursion from R6RS
Scheme syntax: that document describes and allows only base-10 floating-point
numbers, but when I was adding floating-point numbers to the parser, I found
that I was re-using enough pieces of the integer-parsing code, which supports
multiple bases, that it would be pretty simple to add support for
floating-point numbers in the other bases. So I did. So it's possible to
specify the floating-point number #xdeadbeef.c0ffee in
haskeem (it's 3.7359285597539053e9), or to see that pi in base-16
is about #x3.243f6a8885a3.
I have also mostly finished adding support for rational numbers. There are
some small differences in the way I treat them versus how R6RS
treats them. For finite rationals, of course, it's all the same, but if I
understand the Report correctly, they are disallowing rational infinity and
NaN: +1/0, -1/0, and 0/0, whereas I think these are useful quantities for the
same reasons that floating-point infinities and NaNs are useful, so I allowed
them and have attempted to deal properly with them in
haskeem. That was actually a bit tricky, since Haskell's native
rational numbers don't play well with these quantities. Haskell will recognize
them and can read and write them, but doing arithmetic with them leads to
runtime exceptions, so I had to add a fair bit of special-case code to deal
with all the possibilities..
That's the next major area where I added stuff to what JTang showed. Try
typing this at his interpreter: (/ 1 0). If it does the same
thing on your computer as on mine, it'll give you a "division by zero" runtime
exception and dump you back to your shell prompt. Not so good! Another area
where I found the same thing was when implementing some of the file-IO
stuff. It's pretty user-hostile to signal end-of-file with an exception that
kills the interpreter, so adding exception-handling was pretty important. I'm
not sure I have this entirely right: although I've wrapped the entire main
REPL in a Haskell try, it is still not catching division by zero. (No, Haskell
experts, it's not that I'm using try from the prelude, it's
Control.Exception.try, and it's not catching divide by zero; I (sic) tried,
although in a small test program CE.try does catch divide by zero.)
However, haskeem as a whole is catching divide by zero, I just
used a different method to do it.
I have also added most of the core R6RS syntactic forms: I've
got and, apply, begin,
case, cond, define, if,
lambda, let, let*, letrec,
letrec*, or, quote, and
set!. (Incidentally, JTang implemented and and
or as functions wrapped around && and
||; this is not quite right, because and and
or are supposed to short-circuit as soon as the value of the
expression can be determined.) Still missing are let-values,
let*-values, do, when and
until (although these last two are pretty trivial), and
set-car! and set-cdr!; these last two seem to be
regarded with some mixed feelings by the R6RS, and I'm not at all
sure I'll ever put them into haskeem. In part this is due to a
design decision by JTang; more on this in a bit. Also missing is everything
related to quasiquote.
I have a mutant version of eval, which actually evaluates an
expression in the current function context rather than the top-level
context. It didn't occur to me to make it evaluate in the top-level context,
that's why I made this a special syntactic form rather than a regular library
function. In order to make this work with things like (map eval
some-list), I added a function to the standard library: (define
(eval x) (eval x)). (If you look, you'll see one of those comments I
mentioned next to this definition. :-) )
More related to the actual running of the interpreter, and less related to
the scheme language, is the load special form. I am not exactly
sure now why I made it a special syntactic form rather than a library function
(I should have added a comment there, too!), but it is. Perhaps it was for
easy access to the environment; I haven't added any facility for getting the
execution environment inside library functions.
Another special form that's related to the specifics of the interpreter is
trace, a very modest facility for observing how the interpreter
is doing stuff. If you examine the file fibo.scm, you'll see it in action; it
produces output like this:
lisp> (load "fibo.scm")
trace bad-fibo on
#t
lisp> (fibo 6)
trace fibo-acc on
trace: fibo-acc <- (6 1 0)
trace: fibo-acc <- (5 1 1)
trace: fibo-acc <- (4 2 1)
trace: fibo-acc <- (3 3 2)
trace: fibo-acc <- (2 5 3)
trace: fibo-acc <- (1 8 5)
8
lisp> (bad-fibo 6)
trace: bad-fibo <- (6)
trace: bad-fibo <- (5)
trace: bad-fibo <- (4)
trace: bad-fibo <- (3)
trace: bad-fibo <- (2)
trace: bad-fibo <- (1)
trace: bad-fibo <- (0)
trace: bad-fibo <- (1)
trace: bad-fibo <- (2)
trace: bad-fibo <- (1)
trace: bad-fibo <- (0)
trace: bad-fibo <- (3)
trace: bad-fibo <- (2)
trace: bad-fibo <- (1)
trace: bad-fibo <- (0)
trace: bad-fibo <- (1)
trace: bad-fibo <- (4)
trace: bad-fibo <- (3)
trace: bad-fibo <- (2)
trace: bad-fibo <- (1)
trace: bad-fibo <- (0)
trace: bad-fibo <- (1)
trace: bad-fibo <- (2)
trace: bad-fibo <- (1)
trace: bad-fibo <- (0)
8
The stuff in parentheses is the argument list being handed to each invocation
of each traced function (there can be any number being traced simultaneously).
This doesn't yet show indentation to represent who's calling whom, but it is
nonetheless already very useful in seeing what's going on. This example very
clearly shows the difference in recursive calling between the two versions of
the fibonacci calculation. I intend to use haskeem to work my way
through as much of Structure and Interpretation of Computer Programs
as I can, and I expect that this will help a lot.
So what's missing, what's next? I plan to improve this in a couple more areas, as I have time and ability. I think these are more or less listed in the order in which I'll attack them, and also in order of increasing difficulty:
catch and throw
(although the spellings of the names may be different; I'll have to see what
the R6RS says). At least the catch has to be a special
form rather than a regular function, because we don't want all of its
arguments evaluated beforehand: initially, only the thing to be tried is
evaluated, and only if it throws an error does the error-handler get called. I
think this will be fairly straightforward.delay and
force. These are the Scheme syntactic forms for doing lazy
evaluation, and allow infinite (non-cyclic) datastructures, which is quite
cool.haskeme at that point! :-) However, I don't
know how easy or hard this is; I expect it's a moderately significant
effort.haskeem
that would require it; for those occasions where I would need complex numbers,
I can probably do it explicitly as pairs of reals.
If you have questions or comments about the stuff here, please contact me at korg@korgwal.com.
Enjoy! -- Uwe Hollerbach
Page last modified at 2010-01-04 21:40 US/Pacific