Saturday, June 10, 2017

Solving the JavaScript Problem

The JavaScript Problem is a good problem to have. Against the odds, “write once, run anywhere” is a reality for web browsers because of a language governed by a standards organization. Not so long ago, proprietary technologies such as Flash and Silverlight threatened the openness of the web.

So we should be grateful the JavaScript problem is merely a technical one, namely that JavaScript is poorly designed. Though the language has improved over the years, its numerous flaws are too deeply entrenched to remove. Transpilers help by unlocking access to better languages, but JavaScript was never intended to be an assembly language. It’s only thanks to heroic efforts of many talented engineers that JavaScript has gone so far.

Ideally, the lingua franca of the web should be low-level, clean, and simple, so we can develop in any language with little overhead.


Recent versions of several popular browsers have fulfilled our wishes with WebAssembly, also known as wasm. WebAssembly is an open standard, and well-designed. At last, works such as Benjamin Pierce’s “Types and Programming Languages” are mainstream enough that WebAssembly has formally specified reduction and typing rules, and even a proof of soundness. In contrast, weakly typed languages such as JavaScript ignore a century or so of mathematical progress.

In WebAssembly, nondeterminstic behvaviour can only arise from exhaustion, external host functions, and the IEEE-754 floating-point standard, which fails to specify the NaN bit pattern for all cases. Recall in C and the many languages built upon it such as Go and Haskell, signed integer overflow causes undefined behaviour. WebAssembly fixes this by stipulating two’s complement for negative numbers, as competing representations of negative numbers are ultimately responsible for this defect of C. Endianness is similarly settled, though curiously by travelling the road not taken by network byte order: numbers in WebAssembly are encoded in little-endian.

The WebAssembly virtual machine is stack-based. Years ago, I read that register-based virtual machines are faster, but perhaps these results are now obsolete. Browsing briefly, I found newer papers:

It’s the same old story. Register-based virtual machines are still faster after all. It seems WebAssembly prioritizes code size, and trusts browsers will ship with good JIT compilers.

Toy Compilers

Online demonstrations of WebAssembly compilers are fun to build, and fun to describe: I compiled a Haskell program to JavaScript that when executed, reads a Haskell-like program and compiles it to WebAssembly, which some JavaScript loads and executes.

Perhaps it’s easier to invite the reader to:

For the last 2 programs, I transformed the source to a tree of S and K combinators, so apart from graph reduction, I only had to code 2 combinators in assembly. The resulting binaries are excruciatingly slow, especially since numbers are Church-encoded, but it all seems to work.

I look forward to a Haskell compiler that produces efficient WebAssembly, though it may have to wait until WebAssembly gains a few more features, such as threads and tail calls.

Sunday, April 2, 2017

Lambda Calculus Surprises

Much time has passed since my last entry. I’ve been frantically filling gaps in my education, so I’ve had little to say here. My notes are better off on my homepage, where I can better organize them, and incorporate interactive demos.

However, I want to draw attention to delightful surprises that seem unfairly obscure.

1. Succinct Turing-complete self-interpreters

John McCarthy’s classic paper showed how to write a Lisp interpreter in Lisp itself. By adding a handful of primitives (quote, atom, eq, car, cdr, cons, cond) to lambda calculus, we get a Turing-complete language where a self-interpreter is easy to write and understand. For contrast, see Turing’s universal machine of 1936.

Researchers have learned more about lambda calculus since 1960, but many resources seem stuck in the past. Writing a Turing-complete interpreter in 7 lines is ostensibly still a big deal. The Roots of Lisp by Paul Graham praises McCarthy’s self-interpreter but explores no further. The Limits of Mathematics by Gregory Chaitin chooses Lisp over plain lambda calculus for dubious reasons. Perhaps McCarthy’s work is so life-changing that some find it hard to notice new advances.


(I’ve suppressed the lambdas. Exercise: write a regex substitution that restores them.)

In fact, under some definitions, the program “λq.q(λx.x)” is a self-interpreter.

2. Hindley-Milner sort

Types and Programming Languages (TaPL) by Benjamin C. Pierce is a gripping action thriller. Types are the heroes, and we follow their epic struggle against the most ancient and powerful foes of computer science and mathematics.

When we first meet them, types are humble guardians of a barebones language that can only express the simplest of computations involving booleans and natural numbers. As the story progresses, types gain additional abilities, enabling them to protect more powerful languages.

However, there seems to be a plot hole when types level up from Hindley-Milner to System F. As a “nice demonstration of the expressive power of pure System F”, the book mentions a program that can sort lists.

The details are left as an exercise to the reader. Working through them, we realize a Hindley-Milner type system is already powerful enough to sort lists. Moreover, the details are far more pleasant in Hindley-Milner because we avoid the ubiquitous type spam of System F.

System F is indeed more powerful than Hindley-Milner and deserves admiration, but because of well-typed self-application and polymorphic identity functions, existential types, and other gems; not because lists can be sorted.

3. Self-interpreters for total languages

They said it couldn’t be done.

According to Breaking Through the Normalization Barrier: A Self-Interpreter for F-omega by Matt Brown and Jens Palsberg, “several books, papers, and web pages” assert self-interpreters for a strongly normalizing lambda calculus are impossible. The paper then shows that reports of their non-existence have been greatly exaggerated.

Indeed, famed researcher Robert Harper writes on his blog that “one limitation of total programming languages is that they are not universal: you cannot write an interpreter for T within T (see Chapter 9 of PFPL for a proof).”, and as of now (April 2017), the Wikipedia article they cite still declares “it is impossible to define a self-interpreter in any of the calculi cited above”, referring to simply typed lambda calculus, System F, and the calculus of constructions.

I was shocked. Surely academics are proficient with diagonalization by now? Did they all overlook a hole in their proofs?

More shocking is the stark simplicity of what Brown and Palsberg call a shallow self-interpreter for System F and System Fω, which is essentially a typed version of “λq.q(λx.x)”.

It relies on a liberal definition of representation (we only require an injective map from legal terms to normal forms) and self-interpretation (mapping a representation of a term to its value) which is nonetheless still strong enough to upend conventional wisdom.

Which brings us to the most shocking revelation: there is no official agreement on the definition of representation or self-interpretation, or even what we should name these concepts.

Does this mean I should be wary of even the latest textbooks? Part of me hopes not, because I want to avoid learning falsehoods, but another part of me hopes so, for it means I’ve reached the cutting edge of research.

See for yourself!

Interactive demos of the above:

Tuesday, November 10, 2015

Neural Networks in Haskell

Long ago, when I first looked into machine learning, neural networks didn’t stand out of the crowd. They seemed on par with decision trees, genetic algorithms, genetic programming, and a host of other techniques. I wound up dabbling in genetic programming because it seemed coolest.

Neural networks have since distinguished themselves. Lately, they seem responsible for each newsworthy machine learning achievement I hear about. To name a few:

Inspired, I began reading Michael Nielsen’s online book on neural networks. We can whip up a neural network without straying beyond a Haskell base install, though we do have to implement the Box-Muller transform ourselves to avoid pulling in a library to sample from a normal distribution.

The following generates a neural network with 3 inputs, a hidden layer of 4 neurons, and 2 output neurons, and feeds it the inputs [0.1, 0.2, 0.3].

import Control.Monad
import Data.Functor
import Data.List
import System.Random

main = newBrain [3, 4, 2] >>= print . feed [0.1, 0.2, 0.3]

newBrain szs@(_:ts) = zip (flip replicate 1 <$> ts) <$>
  zipWithM (\m n -> replicateM n $ replicateM m $ gauss 0.01) szs ts

feed = foldl' (((max 0 <$>) . ) . zLayer)

zLayer as (bs, wvs) = zipWith (+) bs $ sum . zipWith (*) as <$> wvs

gauss :: Float -> IO Float
gauss stdev = do
  x <- randomIO
  y <- randomIO
  return $ stdev * sqrt (-2 * log x) * cos (2 * pi * y)

The tough part is training the network. The sane choice is to use a library to help with the matrix and vector operations involved in backpropagation by gradient descent, but where’s the fun in that?

It turns out even if we stay within core Haskell, we only need a few more lines, albeit some hairy ones:

relu = max 0
relu' x | x < 0      = 0
        | otherwise  = 1

revaz xs = foldl' (\(avs@(av:_), zs) (bs, wms) -> let
  zs' = zLayer av (bs, wms) in ((relu <$> zs'):avs, zs':zs)) ([xs], [])

dCost a y | y == 1 && a >= y = 0
          | otherwise        = a - y

deltas xv yv layers = let
  (avs@(av:_), zv:zvs) = revaz xv layers
  delta0 = zipWith (*) (zipWith dCost av yv) (relu' <$> zv)
  in (reverse avs, f (transpose . snd <$> reverse layers) zvs [delta0])
    f _ [] dvs = dvs
    f (wm:wms) (zv:zvs) dvs@(dv:_) = f wms zvs $ (:dvs) $
      zipWith (*) [sum $ zipWith (*) row dv | row <- wm] (relu' <$> zv)

descend av dv = zipWith (-) av ((0.002 *) <$> dv)

learn xv yv layers = let (avs, dvs) = deltas xv yv layers
  in zip (zipWith descend (fst <$> layers) dvs) $
    zipWith3 (\wvs av dv -> zipWith (\wv d -> descend wv ((d*) <$> av))
      wvs dv) (snd <$> layers) avs dvs

See my Haskell notes for details. In short: ReLU activation function; online learning with a rate of 0.002; an ad hoc cost function that felt right at the time.

Despite cutting many corners, after a few runs, I obtained a neural network that correctly classifies 9202 of 10000 handwritten digits in the MNIST test set in just one pass over the training set.

I found this result surprisingly good. Yet there is much more to explore: top on my must-see list are deep learning (also described in Nielsen’s book) and long short-term memory.

I turned the neural net into an online digit recognition demo: you can draw on the canvas and see how it affects the outputs.

Monday, February 23, 2015

Mighty Warp

  • Outperforms nginx.

  • Under 1300 lines of source.

  • Clear control flow: handles one request per thread using blocking calls.

  • Slowloris DoS protection.

The secret is GHC’s runtime system (RTS). Every Haskell program must spend time in the RTS, and maybe this does hurt performance in certain cases, but for web servers it is a huge win: the RTS automatically transforms code that seems to handle one request per thread into a server with multiple event-driven processes. This saves many a context switch while keeping the source simple.

Best of all, this magic technology is widely available. To start a webserver on port 3000 using the Warp library, run these commands on Ubuntu [original inspiration]:

sudo apt-get install cabal-install
cabal update
cabal install warp
cat > server.hs << EOF
#!/usr/bin/env runghc
{-# LANGUAGE OverloadedStrings #-}
import Network.Wai (responseLBS)
import Network.Wai.Handler.Warp (run)
import Network.HTTP.Types (status200)
import Network.HTTP.Types.Header (hContentType)

main = run 3000 $ \_ f -> f $ responseLBS
  status200 [(hContentType, "text/plain")] "Hello, world!\n"
chmod +x server.hs

Eliminating context switches is the best part of the story, but there’s more. Copying data can be avoided with a simple but clever trick the authors call splicing. Using conduits instead of lazy I/O solves the non-deterministic resource finalization problem. And a few judiciously placed lockless atomic operations can work wonders: in particular, for basic Slowloris protection and for a robust file descriptor cache.


Those who fear straying too far from a C-like language can still reap the benefits in Go:

  • Goroutines are like green threads.

  • Channels are like conduits.

  • Array slices are like splices.

If I didn’t know better, I would say the designers of Go emulated the inventor of the California roll: they took some of the best features of languages like Haskell and made them palatable to a wider audience.

I wonder how Go’s RTS compares. One innate advantage GHC may have is Haskell’s type system, which leads to largely non-destructive computation, which ultimately leads to a cheap and effective scheduling scheme (namely, context switching on memory allocation). Still, I expect a well-written Go web server could achieve similar results.

Saturday, December 20, 2014

Haskell for programming contests

Farewell, Dr. Dobb’s. In a way, my previous post proved prescient: in the old days, I relied on printed magazines like Dr. Dobb’s Journal to learn about computers. Now, most things are but a few search terms away. Coding is easier than ever.

Though I have a soft spot for this particular magazine, ultimately I’m glad information has become more organized and accessible. I like to think I played a part in this, however small, by posting my own tutorials, articles, rants, and code.

Here’s hoping the remainder of my previous post also ages well. That is, may Haskell live long and prosper. [A sentiment echoed by Dr. Dobb’s.] Again, I’d like to play a small part in this: Haskell for programming contests.

Tuesday, August 12, 2014

Let's Code!

A recent article in Dr. Dobb’s Journal bemoans the complexity of today’s development toolchains: “it’s hard to get any real programming done”. However, my own experience suggests the opposite: I find programming is now easier than ever, partly due to better tools.

I say “partly” because when I was a kid, it was difficult to obtain code, compilers, and documentation, let alone luxuries like an SCM. I scoured public libraries for books on programming and checked out what they had, which meant I studied languages which I could never use because I lacked the right compiler, or even the right computer. I nagged my parents to buy me expensive books, and occasionally they’d succumb. Perhaps the most cost-efficient were magazines containing program listings which of course had to be keyed in by hand. (One of my most treasured was an issue of Dr. Dobb’s Journal, back when it was in print, and only in print.)

Nowadays, a kid can get free high-quality compilers, code, tutorials, and more at the click of a button. But I believe even without this freer flow of information, programming would still be easier than ever because our tools have improved greatly.

got git?

The author singles out Git as a source of trouble, but the reasoning is suspect. For example, we’re told that with respect to other “SCMs you’ve used…Git almost certainly does those same actions differently.”

This suggests that the author used other SCMs, then tried Git and found it confusing. In contrast, I used Git, then tried other SCMs and found them confusing. I predict as time passes, more and more developers will learn Git first, and their opinions of SCMs will mirror mine.

Nevertheless, I’m leery of ranking the friendliness of tools by the order you picked them up. I hereby propose a different yardstick. Take Git, and a traditional SCM. Implement, or at least think about implementing, a clone of each from scratch; just enough so it is self-hosting. Then the one that takes less time to implement is simpler.

I wrote a self-hosting Git clone in a few hours: longer than expected because I spent an inordinate amount of time debugging silly mistakes. Though I haven’t attempted it, I would need more time to write a clone of Perforce or Subversion (pretty much the only other SCMs I have used). With Git, there’s no transactions, revision numbers, rename tracking, central servers, and so on; Git is essentially SHA-1 hashes all the way down.

But let’s humour the author and suppose Git is complex. Then why not use tarballs and patches? This was precisely how Linux was managed for 10 years, so should surely suffice for a budding developer. In fact, I say you should only bother with Git once you realize, firstly, you’re addicted to coding, and secondly, how annoying it is to manage source with tarballs and patches!

In other words, although Git is handy, you only really need it when your project grows beyond a certain point, by which time you’ve already had plenty of fun coding. Same goes for tools like defect trackers.

Apps and Oranges

I agree that developing for mobiles is painful. However, comparing this against those “simple programs of a few hundred lines of C++ long ago” is unfair. With mobile apps, the program usually runs on a system different to the one used to write the code.

It might be fairer to compare writing an mobile app with, say, programming a dot matrix printer of yesteryear, as in both cases the target is different to the system used to write the code. I once did the latter, for the venerable Epson MX-80: after struggling with a ton of hardware-specific low-level nonsense, I was rewarded with a handful of crummy pictures. I would say it involved less “real programming” than writing an Android app.

All the same, I concede that writing Android software is harder than it should be, largely due to Java. But firstly, a mobile phone involves security and privacy issues that would never arise with a dot matrix printer, which necessarily implies more bookkeeping, and secondly, the Java problem can be worked around: either via native code, or a non-Java compiler that generates Dalvik bytecode. [I’ve only mentioned Android throughout because it’s the only mobile platform I’ve developed on.]

Comparing server-side web apps with the good old days is similarly unfair unless the good old days also involved networks, in which case they were really the bad old days. PC gamers of a certain age may remember a myriad of mysterious network options to configure multiplayer mode; imagine the even more mysterious code behind it. As for cloud apps, I would rather work on a cloud app than on an old-school equivalent: BBS software, which involves renting out extra phones lines if you want high availability.

What about client-side web apps? As they can run on the same system used to develop them, it is therefore fair to compare developing them against writing equivalent code in those halcyon days of yore. Let’s look at a couple of examples.


I wrote a tic-tac-toe web app with an AI that plays perfectly because it searches the entire game tree; modern hardware and browsers are so fast that this is bearable (though we’re spared one ply because the human goes first). It works on desktops, laptops, tablets, phones: anything with a browser.

Here’s the minimax game tree search, based on code from John Hughes, Why Functional Programming Matters:

score (Game _ Won 'X') = -1
score (Game _ Won 'O') = 1
score _ = 0

maximize (Node leaf []) = score leaf
maximize (Node _ kids) = maximum (map minimize kids)

minimize (Node leaf []) = score leaf
minimize (Node _ kids) = minimum (map maximize kids)

Despite my scant Haskell knowledge and experience, the source consists of a single file containing less than 150 lines like the above, plus a small HTML file: hardly a “multiplicity of languages”. Writing it was enjoyable, and I did so with a text editor in a window 80 characters wide.

Let’s rewind ten to twenty years. I’d have a hard time achieving the brevity and clarity of the above code. The compiler I used didn’t exist, and depending how far back we go, neither did the language. Not that I’d consider compiling to JavaScript in the first place: depending how far back we go, it was too slow or didn’t exist.


In my student days, I developed a clone of a Windows puzzle game named Netwalk. I chose C, so users either ran untrusted binaries I supplied (one for each architecture), or built their own binaries from scratch. Forget about running it on phones and PDAs.

I managed my files with tarballs and patches. The source consisted of a few thousand lines, though admittedly much of it is GUI cruft: menus, buttons, textboxes, and so on. Lately, I hacked up a web version of Netwalk. The line count? About 150.

Thanks to Git, you can view the entire source right now on Google Code or GitHub, all dolled up with syntax highlighting and line numbers.

Building native binaries has a certain charm, but I have to admit that a client-side web app has less overhead for developers and users alike. I only need to build the JavaScript once, then anyone with a browser can play.

Thus in this case, my new tools are better than my old tools in every way.

Choose Wisely

The real problem perhaps is the sheer number of choices. Tools have multiplied and diversified, and some indeed impede creativity and productivity. But others are a boon for programmers: they truly just let you code.

Which tools are the best ones? The answer probably depends on the person as well as the application, but I will say for basic client-side web apps and native binaries, I heartily recommend my choices: Haskell, Haste, Git.

I’m confident the above would perform admirably for other kinds of projects. I intend to find out, but at the moment I’m having too much fun coding games.

Play Now!

Tuesday, July 29, 2014

15 Shades of Grey

John Carmack indirectly controlled significant chunks of my life. For hours at a time, I would fight in desperate gun battles in beautiful and terrifying worlds he helped create. On top of this, the technical wizardry of id Software’s games inspired me to spend yet more hours learning how they managed to run Wolfenstein 3D and Doom on PCs, in an era when clockspeeds were measured in megahertz and dedicated graphics cards were rare.

I read about cool tricks like binary space partitioning, and eventually wrote a toy 3D engine of my own. The process increased my respect for the programmers: it’s incredibly difficult to get all the finicky details right while sustaining good frame rates.

Accordingly, I paid close attention when John Carmack spoke about programming languages in his QuakeCon 2013 keynote. Many people, myself included, have strong opinions on programming languages, but few have a track record as impressive as his.

Carmack’s Sneaky Plan

I was flabbergasted by Carmack’s thoughts on the Haskell language. He starts by saying: “My big software evolution over the last, certainly three years and stretching back tendrils a little bit further than that, has been this move towards functional programming style and pure functions.”

He then states that not only is Haskell suitable for programming games, but moreover, thinks Haskell may beat typical languages by roughly “a factor of two”, which “would be monumental” and “a really powerful thing for game development”. He has even begun reimplementing Wolfenstein 3D in Haskell as part of a “sneaky plan” to convince others.

Wow! I had always thought Haskell was a pretty but impractical language. I loved composing elegant Haskell snippets to solve problems that one might encounter in interviews and programming contests, but for real stuff I resorted to C.

Among my concerns is garbage collection: I have bad memories of unexpected frequent pauses in Java programs. But Carmack notes that Haskell’s almost uncompromising emphasis on purity simplifies garbage collection to the point where it is a predictable fixed overhead.

A second concern is lazy evaluation. It’s easy to write clear and simple but inefficient Haskell: computing the average of a list of numbers comes to mind. Carmack is also “still not completely sold on the value of laziness”, but evidently it’s not a showstopper for him. I suppose it’s all good so long as there are ways of forcing strict evaluation.

A third concern (but probably not for Carmack) is that I don’t know how to write a Haskell compiler; I’m more at ease with languages when I know how their compilers work. I can ignore this discomfort, though I intend to overcome my ignorance one day. I’m hoping it’s mostly a matter of understanding Hindley-Milner type inference.

Speaking of types, Carmack is a fan of static strong typing, because in his experience, “if it’s syntactically legal, it will make it into the codebase”. He notes during his recent foray into Haskell, the one time he was horribly confused was due to untyped data from the original Wolfenstein 3D.

My Obvious Plan

Once again, I’m inspired by Carmack. I plan to take Haskell more seriously to see if it really is twice as good. Although I lack the resources to develop a complex game, I may be able to slap together a few prototypes from time to time.

First up is the 15-Puzzle by Noyes Palmer Chapman with a cosmetic change: to avoid loading fonts and rendering text, I replaced the numbers 1 to 15 with increasingly darker shades of grey.

I began with a program depending on SDL. The result was surprisingly playable, and I found the source code surprisingly short in spite of my scant knowledge of Haskell. To better show off my work, I made a few edits to produce a version of my program suitable for the Haste compiler, which compiles Haskell to JavaScript. I added mouse support and tweaked the HTML so the game is tolerable on tablets and phones.

Play now!