Last update: June 28, 2009

XKCD cartoon: The Lisp Cycles

Welcome to the haskeem web site!

What's haskeem? haskeem is pretty much Scheme in Haskell... however, if you're looking for an industrial-strength Scheme interpreter, this might not be it! It doesn't implement the full Scheme number tower, it's not properly tail-recursive, it doesn't have full R6RS hygienic macros (but defmacro is present; see below), it doesn't have call-with-current-continuation. If you're a real schemer, you're probably snickering by now... that's ok! haskeem is a toy that's letting me learn more about both Scheme and Haskell, it's a lot of fun to work on, and that's enough for me! I arbitrarily called the first version I put onto the web "0.5"; I'm now at 0.7.5, which is available here, almost all under GPL; I put some of the scheme code I've been writing under BSD3. But see also my comments below on doing it yourself: you'll learn far more if you go through Jonathan Tang's tutorial Write Yourself a Scheme in 48 Hours and write all of this yourself, rather than just grabbing what I've got here and looking at it.

What does it have?

What bugs do I know about? At this point I know of two bugs in stuff that's at least partially implemented:

If you have questions or comments about the stuff here, please contact me at korg@korgwal.com or at uhollerbach@gmail.com.

Enjoy! -- Uwe Hollerbach


June 28, 2009

I have continued to read PLAI and to mess with the continuation-passing macros; I think that's coming along ok. In order to make it work the way I think it ought to, I had to make a small change in the evaluator: previously, the expanded syntax of macros was getting evaluated still in the environment that was captured when the macro was defined, and that was messing up the last of the CPM examples below: I was getting "14" instead of "15" which I think is the right answer.

This should print "5"

    (define foo (=lambda (x) (=return (+ 1 x))))

    (display (foo *cont* 4))
    (newline)

Same (foo) as above; this should print "10"

    (let ((*cont* (lambda (x) (* 2 x))))
      (display (foo *cont* 4))
      (newline))

This should print "14": the function says to add 10 to the value passed in, and the current continuation, which is the default identity, is then applied to that value: 4 + 10 → identity → 14

    (=define (bar x) (=return (+ 10 x)))
    (display (bar 4))
    (newline)

This should print "15" ... I think? The function is the same as before, but here the current continuation has been re-jiggered to add 1 to what it receives: 4 + 10 → (+1) → 15.

    (let ((*cont* (lambda (x) (+ 1 x))))
      (display (bar 4))
      (newline))

All of the scheme code for this is in the file cpm.scm, which I've included in the tarball. It's not quite complete yet, there is one more macro that needs defining (PG's =bind macro), but what's there is, I think, correct.

I decided that this stuff warrants a version bump: it's 0.7.5 now.


June 27, 2009

A couple of small cleanups and improvements: I noticed the minor infelicity that if I entered whitespace in front of something evaluatable at the REPL, it got rejected; the parser was being a bit too strict. Oops! I guess I hadn't ever tried that before... fixed now. Also, I unified the treatment of functions, traceable functions, and macros inside the evaluator: now there is only a Func type, which can be traceable or not, and it can be a macro or not. That has the nice side-effect of making macros traceable; a traced macro will produce two lines of trace output when it's hit, one line showing what the arguments are and the other line showing what it expands into; thusly:

    lisp> (assert (= 0 1))
    trace: assert <- ((= 0 1))
       ->  (when (not (= 0 1)) (write-string stderr "assertion failure: ") (display (quote (= 0 1)) stderr) (newline stderr) (raise "assertion failure!"))
    assertion failure: (= 0 1)
    user exception "assertion failure!"
    lisp>

It's not quite macro-expand, but it's close enough that I bumped the version to 0.7.4.


June 21, 2009

Version bump to 0.7.3, I re-enabled control-C catching in the REPL. Other than that, I've mostly been studying PLAI and trying to wrap my brain around continuations (or, more precisely, turn my brain inside-out to match what continuations do).


June 20, 2009

The regexp code I mentioned is coming along ok: I've got stuff implemented to parse a regexp, to turn the parse tree into an NFA, and to make transitions according to the NFA given a start state and an input string. It all works, albeit slowly. Still, since I'm using the Thompson algorithm, there are no exponentially-bad cases: thus I can easily generate cases where this code finishes in seconds to minutes, whereas perl, python, and most of the other big guys would not finish before the heat death of the universe. Look at the routines make-slow-regexp and make-slow-string in the file regexp.scm (also in the tarball). If you use that with n in the range 30 to 36, you'll cause perl to run for minutes to a couple of hours; with n set to 128, it would take right about 1.5×1024 years. In contrast, this code can handle this n with no particular problems, about ten minutes or so. So... just gotta pick your benchmark right! :-)

I recently came across a book, Programming Languages: Application and Interpretation, that looks very interesting; if you, gentle reader, are here out of more than idle curiosity, then you may find that book of interest as well. It is relevant specifically to my little toy because Prof Krishnamurthi has a long chapter on continuations in the book, starting from a level where even a barefoot C programmer like me can follow. So I will be reading with great interest. I expect to follow this path: first, try to manually rewrite some trivial programs in continuation-passing style (CPS). It's always possible to do this, although the resulting program is essentially unreadable (which is probably why it's not done much). Second, I'll continue the continuation-macros approach I mentioned below. At that point, I think I should begin to have a good enough understanding of them to be able to either add continuations to haskeem at the haskell level, or to write a CPS-transformer at the scheme level. I haven't yet decided.


June 16, 2009

I added a bunch of bit-diddling primitives: bits-and, bits-or, bits-xor, bits-shift, bits-set, bits-clear, bits-flip, bits-set?, and bits-get. I have been playing a bit with simple regexp processing in haskeem, and I implemented sets as lists of integers. It works, but is a little painfully slow; I think these bit-wise operators will speed up that stuff considerably, and they are generally useful for bit-diddling. For positive numbers, they all do pretty much what you expect; for negative numbers, they kind of assume a 2s-complement which is one bit larger than the size of the number. I basically just sucked in the functionality in the haskell base library Data.Bits.

I also added a (load-library) function (actually, a macro). If you give it a bare filename, it will look for that file in a list of directories specified by the environment variable HASKEEM_LIBRARY_PATH and (load) it... useful as I'm starting to build up a larger library of scheme files.


June 13, 2009

I added the xkcd image at the top... it seems somehow appropriate.


June 11, 2009

I withdrew when and unless as special forms... now that macros are present, it's just as easy to do them that way, and it reduces the number of special forms as well as making it possible, if desired, to redefine them. I also tried a version of let as a macro. It works fine, including nested applications, but I'm not going to take that one out as a special form.

    (defmacro (my-let bindings . body)
      `((lambda ,(map car bindings) ,@body) ,@(map cadr bindings)))

I haven't added a macro-expand facility yet, but doing the same kind of stuff I played with earlier, to test out defmacro, works pretty well. I'm currently studying Paul Graham's On Lisp, specifically the chapter on continuations, to see if I can do continuation macros approximately the way he does them. That's not quite supporting them "natively" in the language, but... it gets close. The continuation macros that PG lists are, I think, quite similar to Haskell monads, although I haven't looked at them closely enough to completely prove that.


June 6, 2009

Macros... after a fair bit of reading, I've concluded that full R6RS hygienic macros are a little too complicated for me to tackle right now; there are too many subtleties that I don't understand yet. But I found a very lucid explanation of the (seems to me) simpler (defmacro) in Peter Siebel's book Practical Common Lisp. Since I'm not being doctrinaire about this project, I'll try to implement some form of that first, and see where it takes me.

So, what's a macro? Basically, it's a function that accepts and returns unevaluated forms, and that runs at macro expansion time instead of at runtime. Let's break that down a bit. A really simple example is an assert macro. This doesn't particularly show the full implications of the above, but it's a good small start. First, an assert function:

    (define (assert some-cond)
      (when (not some-cond)
             (write-string "assertion failure: ")
             (display some-cond)
             (newline)))

Does it work? Almost... it certainly catches assertion failures, but the error message that it prints is not quite what we'd like to see. If we have

    (define x "foo")
    (assert (eqv? x "bar"))

we do see an assertion failure, but the message that gets printed is:

    assertion failure: #f

instead of what we'd really like to see:

    assertion failure: (eqv? x "bar")

If we use regular functions, as above, there is no possibility of achieving that, because by the time some-cond has arrived inside the assert function, it has already been evaluated to a value, in this case #f. That's where the "unevaluated forms" from above comes in. There aren't macros in haskeem, but we can simulate them. First, a second version of assert that returns unevaluated forms:

    (define (assert-macrolike some-cond)
      `(when (not ,some-cond)
             (write-string "assertion failure: ")
             (display ',some-cond)
             (newline)))

Then we use it: we feed it an unevaluated form, note the quote, and we run it: that's the (define macro-expand ...) line, which is macro expansion time. Finally, we evaluate the result of that, that's the (eval macro-expand) line, and that's runtime:

    (define x "foo")
    (display x)
    (newline)
    (define macro-expand (assert-macrolike '(eqv? x "bar")))
    (display macro-expand)
    (newline)
    (eval macro-expand)

The result of this is

    "foo"
    (when (not (eqv? x "bar")) (write-string "assertion failure: ") (display (quote (eqv? x "bar"))) (newline))
    assertion failure: (eqv? x "bar")

and we're happy!

So... what does it take to turn the above into what we want?

    (defmacro (assert-macro some-cond)
      `(when (not ,some-cond)
             (write-string "assertion failure: ")
             (display ',some-cond)
             (newline)))
    (define x "foo")
    (display x)
    (newline)
    (assert-macro (eqv? x "bar"))

I think it won't take too much work at all. The (defmacro) stuff is completely analogous to (define). I will start by only implementing the variant with a fixed number of arguments, but I think the variable-number variant should be simple as well. The only issue is that the result associated with the name needs to be marked as a macro rather than a function; no big problem there. Then, in the generic function-evaluation code, I need to add a check for macro-ness, and if it comes out true, do the slightly different processing of passing unevaluated forms to the macro, running that (macro-expansion time), and finally running the result (runtime).

Here is a file containing the above code, plus another pseudo-macro: two versions of a 3-way numeric if.

All in all, I think this might be only one or two dozen lines of code. Check back in a couple of days!

<time passes...>

Holy Y combinator, Batman! Under an hour! The important change turned two lines of code into three, plus fiftyish additional lines of pretty trivial cut'n'paste additions to introduce the new datatype Macro, in all respects analogous to Func, and add the new special form defmacro... ¡Ay Caramba!

Here is another sample file containing actual defmacro versions of the above.

Version bump to 0.7.0; now I need to implement a macro-expand special form, to aid in figuring out what any given macro is doing.


May 29, 2009

Ouch! Long delay since the last update here... what with one thing and another, various important life events, no haskeem hacking in a long time. But I have returned to haskeem and have removed a couple of small spots of bit-rot, plus fixed a small bug I noticed just recently.

The bit-rot had to do with the change in ghc from GNU readline to haskeline: the latest version of ghc doesn't use readline, and that's a bit of functionality I am reluctant to live without. So I've updated haskeem to use haskeline. I did lose one small bit of functionality in that change: interrupting a long calculation once again kills the whole interpreter, rather than getting back to the lisp> prompt. Oops! Well, I'll hopefully get that worked out soon.

For people with older versions of ghc or who don't have haskeline installed, I've left the old version of the main file haskeem.hs, where this stuff was changed, in the tarball.

The bug had to do with scoping and variable bindings: for some reason I don't now remember, I had (define) simply update a variable (ie, like set!) if it was already defined in the environment... well, that kinda messes up scoping. Consider this program:

    (define str "I am top-level")

    (define (foo) (write-string "foo: " str #\newline))
    (define (bar)
      (define str "I am internal")
      (write-string "bar: " str #\newline)
      (foo)
      (write-string "bar: " str #\newline))

    (foo)
    (bar)
    (foo)

That ought to print out three "I am top-level" interspersed with two "I am internal"... instead, it was printing out one "I am top-level" followed by four "I am internal". Fixed now...

In real new stuff, I have been thinking about both continuations and macros. I think macros will be straightforward but a lot of work, and I expect I'll start doing those. Continuations, ie, call/cc, are not so straightforward, but I think I am starting to understand them. I do not now remember where I saw this, but I do know that it was a comment left by Audrey Tang in response to an article on continuations; and it's worth repeating. In quasi-perl, here's a capsule summary of what continuations are:

    $continuation = \&return;

Not so difficult, really. However, I don't think I can implement them in haskeem; I think the flow of control at the haskell level where each expression is evaluated is not accessible enough for me to be able to wrap it up and make it available at the scheme level inside haskeem; or rather, reworking the innards to make the flow of control explicit enough would basically come to a complete rewrite of the whole interpreter. Bummer!


March 5, 2008

Not so much haskell hacking in the last few days. I did finish the nested quasiquote etc stuff. As expected, it was straightforward but a little tedious. Now it returns all the proper results for the example cases from R6RS.

I have tweaked the powerseries examples a little bit. I verified the Bessel functions, added Bessel functions of the second kind In, and simplified the (ps-recip) function: it had previously needed three terms where it ought to have needed only two. After a bit of looking, I found the problem: it wasn't in (ps-recip) at all, but in (ps-mul): I wasn't being lazy enough in that routine.


February 24, 2008

Only a little bit of hacking today: I added a (new-symbol) special form; I'm starting to think about doing macros, and I think that'll be necessary. I avoid name clashes with user-defined symbols by simply generating the internal symbol with a leading space: the rest of the system is perfectly happy with such a symbol, but the parser won't pass the space; so the user can't accidentally generate such a symbol. Clashes between multiple internally-generated symbols are avoided by simply sticking a global counter into the symbol name: every time a new internal symbol is manufactured, the counter gets incremented.

I added a non-gnu (and non-editing) version of readline, for those machines which don't happen to have gnu readline installed. It's commented out inside haskeem.hs.

I think I added the necessary bits to the Bessel function power series in powerseries.scm; at any rate, it correctly computes the first (non-trivial) zero of the first five Bessel functions.


February 23, 2008

Again, not too much hacking of the interpreter; more scheme hacking the past few days... that is a problem with writing an interpreter, you then want to use it! :-) With a few hints from Bulat Ziganshin on the haskell-cafe mailing list, I did add the ability to catch keyboard interrupts; so it's possible to type control-C at a long-running calculation and get back to the lisp> prompt. Previously, that would have just killed the interpreter, which is kinda nasty. Thanks, Bulat!

The scheme hacking consisted mainly of some playing with power series. A few days ago, I was re-reading Higher-Order Perl by Mark Jason Dominus, and I decided to re-implement his section 6.7 on power series. The code is here, if you want to play with it; be aware that the Bessel function series is incomplete. The equivalent of his last example, the head of the power series for tan(x), is

    lisp> (stream-head ps-tan 10)
    (0 1 0 1/3 0 2/15 0 17/315 0 62/2835)

February 20, 2008

Not much hacking of the interpreter, just adding several functions to the library: random-number generation, plus networking: I can now write a TCP client and server in haskeem. Here's a trivial server:

    (define listener (listen-on 2500))
    (write-string stderr "listening on port 2500\n")
    (let ((remote #f)
          (host "")
          (port -1)
          (ret '()))
      (do ((i 1 (+ i 1)))
          ((> i 10) (close-port listener))
        (set! ret (accept listener))
        (set! remote (car ret))
        (set! host (cadr ret))
        (set! port (caddr ret))
        (write-string stderr "got connection from " host #\linefeed)
        (write-string remote "hello, " host ", how are you?\n")
        (write-string remote "The current local time is " (localtime) #\linefeed)
        (close-port remote)))

And a trivial client to go with it:

    (define host "localhost")
    (define port 2500)
    (when (positive? (length args))
          (set! host (car args))
          (set! args (cdr args)))
    (when (positive? (length args))
          (set! port (string->number (car args)))
          (set! args (cdr args)))
    (define remote (connect-to host port))
    (do ((spew (string-join-by " "
                           "Connecting to host" host
                           "port" (number->string port))))
        ((not spew) (write-string "all done!\n") (close-port remote))
      (write-string spew #\linefeed)
      (set! spew (read-line remote)))

A super-high-performance web server is probably not in the cards, but I think I could probably build a small web spider on top of this functionality... :-)


February 17, 2008

The quasiquoting unquoting operators are mostly working. I have implemented the unwrapping I had left yesterday, and haskeem is passing most of the test cases shown in R6RS. There are two areas where it is not yet working: neither nested quasiquotes nor the caboose of a dotted-list are properly handled yet. I think both of these are straightforward but a bit involved: doing nested quasiquotes is simply a matter of keeping track of the nesting, but that involves threading a nesting count through the whole machinery of the evaluator, and that touches a fair amount of code. Doing dotted-lists right is, I think, analogous to properly handling them in the parser, and I've already got that working. It's made a little tricky by JTang's original design decision, which I adopted as well, of using Haskell lists as the basic unit of list structure:

    (a . (b . (c . (d . ()))))

is exactly and explicitly equivalent, according to R6RS, to

    (a b c d)

as are all of these (plus a couple more):

    (a b . (c d))
    (a b . (c . (d . ())))
    (a b c . (d . ()))

and that equivalence is not entirely natural when using Haskell lists (indeed, the parser presented in JTang's tutorial doesn't handle this correctly). So there is a bit of work required to correctly handle dotted-lists.

time passes...

The dotted-list problem is solved: indeed, it turned out not to be analogous to the stuff in the parser, it turned out to be the parser. If you examine the examples above, you'll see there is a simple rule for how to turn dotted-lists into lists: if the thing in the caboose position is itself a proper list, then the overall dotted-list object is actually a proper list. That rule, applied repeatedly, is sufficient to turn all of the above examples into the "canonical" list (a b c d), and there is (was) code in the parser to do just that.

However... in the presence of unquote, that comes out wrong. The test case from R6RS which checks this is

    `((foo ,(- 10 3)) ,@(cdr '(c)) . ,(car '(cons)))

which turns into

    (quasiquote ((foo (unquote (- 10 3))) (unquote-splicing (cdr (quote (c)))) . (unquote (car (quote (cons))))))

Examine the caboose: it's (unquote (car (quote (cons))))... which is a two-element proper list; so the parser promotes the overall expression to a proper list, the unquote is in the middle of the list by itself, and then it all falls apart in the evaluator.

So I have added a bit of special-purpose pattern matching to the code in the parser that checks for lists in the caboose: if it's a list, but the first item in the list is a symbol, and that symbol is either unquote or unquote-splicing, then the promotion gets skipped. And then the test succeeds!


February 16, 2008

I've added support for vectors: #(...). At first I was going to use the Haskell Data.Array datatype, but documentation for that is somewhere between nonexistent and crappy, and what little I could make out from the sources seemed to suggest that this was more for native datatypes: the equivalent of C's int, double, etc. My LispVal doesn't seem to fit in there. Fortunately, the Data.IntMap datatype seemed very well suited to what I wanted, so that was the route I chose. The R6RS merely says that

Vectors are heterogeneous structures whose elements are indexed by integers. A vector typically occupies less space than a list of the same length, and the average time needed to access a randomly chosen element is typically less for the vector than for the list.

which fits very well with Data.IntMap: that is implemented using PATRICIA trees, and many of the operations take O(min(n,W)) time, where n is the length of the vector and W is the word size of the computer: 32 or 64. I don't know what the constant out front is, and I don't care too much about that; for large vectors, it's bound to be better than O(n) for a list.

I have also begun to implement more of the quoting constructs: (quote foo) was already present; now I'm adding (quasiquote foo), (unquote foo), and (unquote-splicing foo) -- so far, only the easy bits: the abbreviations

’<datum> for (quote <datum>)
‘<datum> for (quasiquote <datum>)
,<datum> for (unquote <datum>)
,@<datum> for (unquote-splicing <datum>)

At least with the browser I'm using right now, the quote characters are very hard to distinguish: ’, the character used in the abbreviation for (quote <datum>), is the right single quotation mark, located on the right of my keyboard, and ‘, the character used in the abbreviation for (quasiquote <datum>), is the left single quotation mark, located on the left of my keyboard. Actually, the keyboard characters should probably be called apostrophe and backtick, respectively.

time passes...

These are almost working: there is still one extra level of parentheses in the result, for example ‘(1 2 ,(+ 3 4)) should return (1 2 7), but actually returns (1 2 (7)). I think I understand why, and what to do about it... but that's for tomorrow.


February 11, 2008

I've fixed the glitch in writing numbers in scientific notation; that was pretty straightforward, although I don't know how efficient my solution is. But efficiency isn't the primary target here...

I've also added a rather specific new type of comment to the parser: it now understands and ignores "#!" lines at the top. So I can have actual directly-executable scheme scripts:

    % ls -lF try
    -rwxr-xr-x 1 uwe users 134 Feb 11 17:05 try*

    % cat try
    #!/home/uwe/tools/haskeem
    (set! args (cons "It's alive! It's ALIIIIIVVVE!" args))
    (write-string (string-join-by " " args) #\linefeed)

    % ./try
    It's alive! It's ALIIIIIVVVE!

    % ./try meow
    It's alive! It's ALIIIIIVVVE! meow

So now, if I need to, I can compute symbolic derivatives in a cron job :-)


February 10, 2008

I have just received an object lesson in just how lazy Haskell is... I was working on implementing printing numbers in scientific notation, n.nnEnn, and when I thought I had that working, I started doing a couple of tests in haskeem. I entered a couple of numbers which worked fine, and then I entered 145e-2 which promptly crashed the interpreter. I looked at the write-number routine and couldn't find anything, so I started sneaking up on the problem: I entered the number in such a way that I wasn't going to print its value at all (define returns the value it assigned to the variable, and the REPL prints that, so that wouldn't work; I had to write (begin (define var 145e-2) #t) to get the value into the interpreter without printing it), then I started querying it: is it an integer? no; is it a rational? no; is it a real? yes; is it infinite? and the interpreter crashed again. After a little while, I tracked the problem to the parser: I allow the parser to return numbers of the form aaaEbb as integers, if there is no decimal point in the "aaa" part and the "bb" part is non-negative: ie, 123e4 is the integer number 1230000. It turned out that I was insufficiently careful in how I had implemented that, and this particular number 145e-2 satisfied the first part of the test, no decimal point in the "aaa" portion, but not the second part, the "bb" portion is non-negative. So the parse wasn't fully evaluated until well after the interpreter had returned from the parser and had already processed several other commands. In fact, the "is it infinite?" check was the first one where the actual value mattered, as opposed to just the format. Wow! I knew Haskell is lazy, but only in the front of my mind, it seems; my reptilian brainstem was still thinking "oh, it's returned from the parse, so it's finished with that" (well, to the extent that my reptilian brainstem thinks about programming at all... :-) )

That's all fixed now, and scientific notation works, with still one small glitch: if a number has more nonzero digits in its tail than the specified precision, rounding occurs, and that rounding can of course propagate up toward the head: so for example 0.9999 gets printed as 1.00 with two digits after the decimal point. If that happens when formatting a number in scientific notation, 0.9999 gets formatted as 9.999e-1 when enough digits of precision are available, but as 10.00e-1 rather than as 1.00e0 when rounding occurs. The value is correct, just the format is slightly off. This is purely my bug, nothing to do with GHC's native number-printing routines at all, and I understand where and why it's happening, I just haven't fixed it yet.


February 9, 2008

I'm in the process of adding an optional precision specifier to (number->string): it does the equivalent of C's %.Nf now, for integers and rationals; I'm still working on the same for floating-point numbers, and I haven't started %.Ne at all. For integers and floating-point numbers, this isn't so hugely necessary (although it's still pretty useful); but if you're doing some big calculation involving rational numbers, say computing the sum of the reciprocals of the first hundred primes, it's very much nicer to know that that result is about 2.106342121472621094 than that it is the ratio of two several-hundred-digit numbers:

9924938317306578104794915348371074960828795860429929737290618944808506634743370796528436363734050330051393733303456260639596308477098909255081265596725817362940507726427713984550226766935610334519674156561744081975868623/
4711930799906184953162487834760260422020574773409675520188634839616415335845034221205289256705544681972439104097777157991804380284218315038719444943990492579030720635990538452312528339864352999310398481791730017201031090

I dunno about you, gentle reader, but that makes my eyeballs bleed...

To compute that, I actually used the exception-handling stuff in kind of a neat fashion. Here's a bit of code: first define an initial table of the lowest couple of primes; we have to have 3 in there, because 2 is kind of an odd prime...

    (define primes '(2 3))

Then define a routine to check for primality. This is the part I think is cool: I could have a moderately complicated test to determine if I can exit the loop, and then another bit of code to determine what result the routine should return, but using a guard to catch some exceptions, and then within the loop just raise an exception with the return value of the test ends up being cleaner; and it doesn't interfere at all with the actual exception of running off the end of the table: that one isn't caught by the guard.

    (define (is-prime? n)
      (guard (err ((boolean? err) err))
             (do ((pl primes (cdr pl))
                  (cur 0))
                 ((null? pl) (raise "exhausted primes list!"))
               (set! cur (car pl))
               (if (= cur n) (raise #t))
               (if (zero? (remainder n cur)) (raise #f))
               (if (> (* cur cur) n) (raise #t)))))

Then this next routine computes the next prime past the end of the table and adds it to the table, until the size of the primes table is at least n.

    (define (grow-primes-table n)
      (do ((j (length primes) (+ j 1)))
          ((>= (length primes) n) primes)
        (do ((i (+ 2 (last primes)) (+ 2 i)))
            ((is-prime? i) (set! primes (append primes (list i)))))))

And finally this is the routine that computes the sum of the reciprocals of the first N primes. It makes use of the fact that the grow-primes-table routine above returns the list of primes after it has finished growing it.

    (define (sum-recprimes n)
      (apply + (map / (list-head (grow-primes-table n) n))))

If you invoke this with

    (number->string (sum-recprimes 100))

you get the eyeball-bleed-inducer above, whereas if you invoke it with

    (number->string (sum-recprimes 100) 10 18)

you get the much nicer 2.106...


February 2, 2008

Major reworking of error handling. I figured out a couple of problems with the error handling, in particular error messages generated in non-interactive mode are now printed. I had that very wrong before: I was using runIOThrows, a wrapper around catchError, inside Control.Exception.try. In hindsight, it's perfectly obvious that if the inner catcher is handling all the errors, then the outer catcher won't have much of anything to do... d'oh! <palm to forehead>

This does not yet address the exception-handling at the scheme level, but I think I understand Haskell's error handling much better now, so hopefully I'll get the scheme-level exception handling working now, too.

time passes...

Yes! new special form guard, with syntax as described in R6RS. Check this out: first define a small function...

    ; the first 'test', "(begin...)", is only to print out the value of
    ; the exception that was thrown: that 'test' always returns #f
    
    (define (tryit test)
      (guard
       (err ((begin (write-string "exception is '")
                    (display err)
                    (write-string "'\n")
                    #f) #t)
            ((and (string? err) (string=? err "meh"))
             (write-string "caught 'meh'\n")
             "I am the 'meh'-catcher")
            ((and (string? err) (string=? err "barf"))
             (write-string "caught 'barf'... yuk!\n")
             -42)
            ((and (string? err) (string=? err "yahoo!"))
             (write-string "a little excitable today, aren't we...\n")
             "US$44.6billion")
            ((eqv? err '(1 2))
             (write-string "does not compute\n")
             "daleks rule! (until their heads blow off)"))
       (raise test)))

and then run some tests with it:

    lisp> (load "guard.scm")
    (lambda (test) ...)
    lisp> (tryit "meh")
    exception is '"meh"'
    caught 'meh'
    "I am the 'meh'-catcher"                        <-- return value
    lisp> (tryit "barf")
    exception is '"barf"'
    caught 'barf'... yuk!
    -42                                             <-- return value
    lisp> (tryit "yahoo!")
    exception is '"yahoo!"'
    a little excitable today, aren't we...
    "US$44.6billion"                                <-- return value
    lisp> (tryit '(1 2))
    exception is '(1 2)'
    does not compute
    "daleks rule! (until their heads blow off)"     <-- return value
    lisp> (tryit '(1 3))
    exception is '(1 3)'
    user exception (1 3)                            <-- NOT caught! this got printed by
                                                           the outermost level, the 'P' in 'REPL'

Notice that we can throw arbitrary lisp objects as exceptions. W00t! Izzat cool, or what? This mechanism will also catch system-level exceptions, ie, "file not found" kinds of errors; however, those will just show up as string values, like "nil.dat: openFile: does not exist (No such file or directory)". So processing those is a little messier, although still doable.


February 1, 2008

Not so much Haskell hacking, but a few small tweaks in stdlib.scm, plus one very nice usability feature: connect the REPL with gnu readline. That makes it very much nicer to use interactively! I'm hereby declaring that it's no longer (almost) Scheme in Haskell, it's just Scheme in Haskell.


January 28, 2008

I have finished the do syntactic form, although it's not yet in the self-tests (by now 800+); but a small separate script using it gives the right results. The exception-handling stuff, which I thought would be straightforward, is proving less so: I apparently don't understand enough Haskell yet, and I am getting various type-mismatch error messages which I haven't yet managed to untangle. Oh well...


January 26, 2008

I have added a bunch of functions for manipulating files and directories, reading the environment, and so on, plus miscellaneous cleanups; for example, it was giving a parser error message if the user just entered a blank line at the prompt. That was correct, as far as the parser goes -- it expects to see at least one scheme expression or object -- but kinda unfriendly. Also, a special form to display the currently-active bindings: dump-bindings.


January 22, 2008

I have added delay and force, which allow the definition of lazily-evaluated infinite datastructures such as streams. For example, define the natural numbers as follows:

    (define natural-numbers
      (letrec ((next
                (lambda (n)
                  (cons n (delay (next (+ n 1)))))))
        (next 1)))

ie, the integer 1 plus a promise to evaluate the rest of the natural numbers if it should become necessary; then it's possible to do

    (define (stream-head strm n)
      (if (<= n 0)
          '()
          (cons (car strm) (stream-head (force (cdr strm)) (- n 1)))))

    (stream-head natural-numbers 20)

and get

    (1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20)

Very cool!

On one level, this was pretty trivial: add a new Delay constructor to the LispVal stuff in lispdata.hs, then in evaluator.hs add code to just return a Delay object when a (delay foo) special form is seen, plus code to actually evaluate what was stored if a (force bar) special form is seen. However, that by itself fails one of the tests in R6RS: with just that, it's possible to make two-timing promises that evaluate to different results depending on when they are forced; that's not so nice!

The solution, adding a variable into which to store the computed value and just return that from then on, was straightforward, too, except for making the variable name. It doesn't really matter what that name is, as long as it doesn't clash with anything else, but that "doesn't clash with anything else" was tricky. I tried for a while to use the State monad, but I couldn't get that to work. Eventually, I decided to just stick stuff into the same environment that's passed along when the Delay object was created, using a global variable defined at startup in the top-level environment. That worked fine.


January 19, 2008

I've been interested in the Haskell programming language for a while, but was never really able to get into it. I wrote a couple of tiny toy programs, but that was about it, until I recently came across Jonathan Tang's most excellent Haskell tutorial Write Yourself a Scheme in 48 Hours. I decided that that looked like a lot of fun, so I started on it. By now, I've spent a fair bit more than 48 hours on it, but I've also gone a ways beyond what JTang described. I started during the christmas 2007 holiday, and have, as of late January 2008, spent the idle hours of a couple of weeks at it. It's starting to be a neat little toy, so I'm putting it onto the web here for others to play with or possibly learn from.

Before I go any farther, a couple of points:

OK. With that out of the way, there are a couple of points where I am actually and intentionally critical of what JTang wrote, and where I think he should have done better:

The first is an actual programming issue: the parser JTang described, and the one you will write, should support some kind of comments. R6RS (the Revised6 Report on Scheme) describes a couple of different kinds, of which I have so far implemented only the "semicolon to line-end" flavor, but it is essential, once you write anything longer than a line or two, to be able to comment it. Perhaps you'll only ever write a single comment that says Look and marvel, world, at the brilliance of my lisp koan!... that's ok; but maybe you'll also write things like OMG! WTF is this stupid interpreter doing?! Fix this!!! and then later on you'll be able to find it again and fix it. Whatever. Comments are essential, so when you go through JTang's tutorial, pretend that he talked about comments, and add the ability to comment stuff into the parser early on. It's not hard, and you'll be grateful for it soon enough.

The second issue is less of a programming issue and more being systematic. However much of a toy this is, it is an actual interpreter for an actual language, and you have to test it as you go, but you also have to check that your new hacks don't break anything old. So keep track of small tests, collect them into one large file, and once your interpreter is able to execute a file, set things up to run that self-test; then run it often. This is something that I think JTang should have emphasized more. By now I'm up to 400 tests in the self-test for haskeem. These aren't all totally distinct, a bunch of them are exploring corner cases of simple math functions, but not only does that give me some confidence that it's doing the arithmetic right, it also caused me to find a couple of small bugs in GHC, the Glasgow Haskell Compiler, which I have duly reported. (GHC doesn't handle NaNs quite correctly in all cases, for example 1.0 ** NaN should be NaN but is evaluated as 1.0.)

All right, here's where you should stop if you want to do it yourself. Go write yourself a scheme in 48 hours, and come back when you're done with it!


So what's haskeem? It's (mostly) a sub-set of R6RS Scheme; while writing it, I did retrieve the R6RS, and I went through it, trying to find and eliminate differences. To the extent possible, I did and do want it to be compatible, although I did do a couple of small things differently; I'll describe them more below.

JTang's tutorial implements an interpreter with integer-only arithmetic and only base-10 input and output of numbers, whereas scheme supports integers, rational numbers, and real and complex floating-point numbers; all of these can furthermore come in several different precisions, plus they can be exact or inexact, and the integer quantities can be written in base-2, base-8, base-10, and base-16. I have not implemented all of this! It's important for writing large robust programs; it's not essential for writing a small toy. But I have implemented parts of it: I began by adding support for real floating-point numbers, since I wanted to add a bunch of math functions, and having, for example, (sin x) always return only one of -1, 0, or +1 just doesn't quite cut it; the numerical analyst cringes!

This is the point where I made my one major excursion from R6RS Scheme syntax: that document describes and allows only base-10 floating-point numbers, but when I was adding floating-point numbers to the parser, I found that I was re-using enough pieces of the integer-parsing code, which supports multiple bases, that it would be pretty simple to add support for floating-point numbers in the other bases. So I did. So it's possible to specify the floating-point number #xdeadbeef.c0ffee in haskeem (it's 3.7359285597539053e9), or to see that pi in base-16 is about #x3.243f6a8885a3.

I have also mostly finished adding support for rational numbers. There are some small differences in the way I treat them versus how R6RS treats them. For finite rationals, of course, it's all the same, but if I understand the Report correctly, they are disallowing rational infinity and NaN: +1/0, -1/0, and 0/0, whereas I think these are useful quantities for the same reasons that floating-point infinities and NaNs are useful, so I allowed them and have attempted to deal properly with them in haskeem. That was actually a bit tricky, since Haskell's native rational numbers don't play well with these quantities. Haskell will recognize them and can read and write them, but doing arithmetic with them leads to runtime exceptions, so I had to add a fair bit of special-case code to deal with all the possibilities..

That's the next major area where I added stuff to what JTang showed. Try typing this at his interpreter: (/ 1 0). If it does the same thing on your computer as on mine, it'll give you a "division by zero" runtime exception and dump you back to your shell prompt. Not so good! Another area where I found the same thing was when implementing some of the file-IO stuff. It's pretty user-hostile to signal end-of-file with an exception that kills the interpreter, so adding exception-handling was pretty important. I'm not sure I have this entirely right: although I've wrapped the entire main REPL in a Haskell try, it is still not catching division by zero. (No, Haskell experts, it's not that I'm using try from the prelude, it's Control.Exception.try, and it's not catching divide by zero; I (sic) tried, although in a small test program CE.try does catch divide by zero.) However, haskeem as a whole is catching divide by zero, I just used a different method to do it.

I have also added most of the core R6RS syntactic forms: I've got and, apply, begin, case, cond, define, if, lambda, let, let*, letrec, letrec*, or, quote, and set!. (Incidentally, JTang implemented and and or as functions wrapped around && and ||; this is not quite right, because and and or are supposed to short-circuit as soon as the value of the expression can be determined.) Still missing are let-values, let*-values, do, when and until (although these last two are pretty trivial), and set-car! and set-cdr!; these last two seem to be regarded with some mixed feelings by the R6RS, and I'm not at all sure I'll ever put them into haskeem. In part this is due to a design decision by JTang; more on this in a bit. Also missing is everything related to quasiquote.

I have a mutant version of eval, which actually evaluates an expression in the current function context rather than the top-level context. It didn't occur to me to make it evaluate in the top-level context, that's why I made this a special syntactic form rather than a regular library function. In order to make this work with things like (map eval some-list), I added a function to the standard library: (define (eval x) (eval x)). (If you look, you'll see one of those comments I mentioned next to this definition. :-) )

More related to the actual running of the interpreter, and less related to the scheme language, is the load special form. I am not exactly sure now why I made it a special syntactic form rather than a library function (I should have added a comment there, too!), but it is. Perhaps it was for easy access to the environment; I haven't added any facility for getting the execution environment inside library functions.

Another special form that's related to the specifics of the interpreter is trace, a very modest facility for observing how the interpreter is doing stuff. If you examine the file fibo.scm, you'll see it in action; it produces output like this:

    lisp> (load "fibo.scm")
    trace bad-fibo on
    #t
    lisp> (fibo 6)
    trace fibo-acc on
    trace: fibo-acc <- (6 1 0)
    trace: fibo-acc <- (5 1 1)
    trace: fibo-acc <- (4 2 1)
    trace: fibo-acc <- (3 3 2)
    trace: fibo-acc <- (2 5 3)
    trace: fibo-acc <- (1 8 5)
    8
    lisp> (bad-fibo 6)
    trace: bad-fibo <- (6)
    trace: bad-fibo <- (5)
    trace: bad-fibo <- (4)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (4)
    trace: bad-fibo <- (3)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (2)
    trace: bad-fibo <- (1)
    trace: bad-fibo <- (0)
    8

The stuff in parentheses is the argument list being handed to each invocation of each traced function (there can be any number being traced simultaneously). This doesn't yet show indentation to represent who's calling whom, but it is nonetheless already very useful in seeing what's going on. This example very clearly shows the difference in recursive calling between the two versions of the fibonacci calculation. I intend to use haskeem to work my way through as much of Structure and Interpretation of Computer Programs as I can, and I expect that this will help a lot.

So what's missing, what's next? I plan to improve this in a couple more areas, as I have time and ability. I think these are more or less listed in the order in which I'll attack them, and also in order of increasing difficulty:

I'm not so sure that I will implement the rest of the Scheme number tower. Doing complex numbers right is a lot of work that's not really interesting, and I'm not so sure I'll do anything with haskeem that would require it; for those occasions where I would need complex numbers, I can probably do it explicitly as pairs of reals.

If you have questions or comments about the stuff here, please contact me at korg@korgwal.com.

Enjoy! -- Uwe Hollerbach

Page last modified at 2009/06/28 21:30 US/Pacific

Valid HTML 4.01 Transitional