Long ago, when I first looked into machine learning, neural networks didn’t stand out of the crowd. They seemed on par with decision trees, genetic algorithms, genetic programming, and a host of other techniques. I wound up dabbling in genetic programming because it seemed coolest.

Neural networks have since distinguished themselves. Lately, they seem responsible for each newsworthy machine learning achievement I hear about. To name a few:

Inspired, I began reading Michael Nielsen’s online book on neural networks. We can whip up a neural network without straying beyond a Haskell base install, though we do have to implement the Box-Muller transform ourselves to avoid pulling in a library to sample from a normal distribution.

The following generates a neural network with 3 inputs, a hidden layer of 4 neurons, and 2 output neurons, and feeds it the inputs [0.1, 0.2, 0.3].

import Control.Monad import Data.Functor import Data.List import System.Random main = newBrain [3, 4, 2] >>= print . feed [0.1, 0.2, 0.3] newBrain szs@(_:ts) = zip (flip replicate 1 <$> ts) <$> zipWithM (\m n -> replicateM n $ replicateM m $ gauss 0.01) szs ts feed = foldl' (((max 0 <$>) . ) . zLayer) zLayer as (bs, wvs) = zipWith (+) bs $ sum . zipWith (*) as <$> wvs gauss :: Float -> IO Float gauss stdev = do x <- randomIO y <- randomIO return $ stdev * sqrt (-2 * log x) * cos (2 * pi * y)

The tough part is training the network. The sane choice is to use a library to help with the matrix and vector operations involved in backpropagation by gradient descent, but where’s the fun in that?

It turns out even if we stay within core Haskell, we only need a few more lines, albeit some hairy ones:

relu = max 0 relu' x | x < 0 = 0 | otherwise = 1 revaz xs = foldl' (\(avs@(av:_), zs) (bs, wms) -> let zs' = zLayer av (bs, wms) in ((relu <$> zs'):avs, zs':zs)) ([xs], []) dCost a y | y == 1 && a >= y = 0 | otherwise = a - y deltas xv yv layers = let (avs@(av:_), zv:zvs) = revaz xv layers delta0 = zipWith (*) (zipWith dCost av yv) (relu' <$> zv) in (reverse avs, f (transpose . snd <$> reverse layers) zvs [delta0]) where f _ [] dvs = dvs f (wm:wms) (zv:zvs) dvs@(dv:_) = f wms zvs $ (:dvs) $ zipWith (*) [sum $ zipWith (*) row dv | row <- wm] (relu' <$> zv) descend av dv = zipWith (-) av ((0.002 *) <$> dv) learn xv yv layers = let (avs, dvs) = deltas xv yv layers in zip (zipWith descend (fst <$> layers) dvs) $ zipWith3 (\wvs av dv -> zipWith (\wv d -> descend wv ((d*) <$> av)) wvs dv) (snd <$> layers) avs dvs

See my Haskell notes for details. In short: ReLU activation function; online learning with a rate of 0.002; an ad hoc cost function that felt right at the time.

Despite cutting many corners, after a few runs, I obtained a neural network that correctly classifies 9202 of 10000 handwritten digits in the MNIST test set in just one pass over the training set.

I found this result surprisingly good. Yet there is much more to explore: top on my must-see list are deep learning (also described in Nielsen’s book) and long short-term memory.

I turned the neural net into an
**online digit recognition demo**: you can
draw on the canvas and see how it affects the outputs.