Some time ago I found a found a video of someone implementing the Game of Life on APL (you should take a look at it, the rest of the post will make more sense afterwards). APL is famous for being both expressive and unreadable for the non initiated. For example this Game of Life can be expressed as (code from dyalog.com):

life{                                  ⍝ John Conway's "Game of Life".
    1 .3 4=+/,¯1 0 1∘.¯1 0 1∘.⌽⊂  ⍝ Expression for next generation.
}

The video linked above is quite interesting as it explains what all that symbols mean.

Some time later I was fiddling with the Futhark programming language and found a blog post titled "Futhark as target language for an APL compiler". This made me realize that APL and Futhark operate over (mostly) the same base concepts, and that replicating the same process on Futhark will be a good way to get to learn the basic operations of the language.

As an side note that this Game of Life is already implemented as a benchmark sample program for Futhark, so if you're interested in a useful implementation you should probably consider that one ;)

What is Futhark?

Futhark website describes the language as a "High-performance purely functional data-parallel array programming on the GPU". You could think it's something like a Haskell (with no recursion) which compiles its programs to something that a GPU can run (or anything OpenCL-enabled).

Please note that you might consider this as a light introduction to Futhark, but there probably are a lot of bad practices here, so take everything with a grain of salt.

The rules

For this I'll follow the steps described for the APL implementation on this video. This will be the last reminder to take a look at it, it'll really help follow the rest of the post.

Also, it'd be much better if you could follow along in your own computer. You can find the instructions to install Futhark here and type the commands in the futhark repl.

I'll also format the results so it's easier the read them, the following transformations are done:

  • true -> T
  • false -> _
  • 1i32 -> 1, 2i32 -> 2, ...
  • 0 -> _
  • Commas (,) might be eliminated
  • Arrays more than 2 dimensions deep might be reordered, for example
[[[0i32, 1i32, 2i32], [0i32, 0i32, 6i32]], [[7i32, 8i32, 9i32], [10i32, 11i32, 12i32]]

-- Might be formatted as
[[[_ 1 2]  [[ 7  8  9]
  [_ _ 6]]  [10 11 12]]

The implementation

$ futhark repl
|// |\    |   |\  |\   /
|/  | \   |\  |\  |/  /
|   |  \  |/  |   |\  \
|   |   \ |   |   | \  \
Version 0.13.0.
Copyright (C) DIKU, University of Copenhagen, released under the ISC license.

Run :help for a list of commands.

Let's go, first thing we have to do is to build an example grid to run the Game of Life on.

A vector of the first 9 natural numbers

[0]> iota(9)
[0, 1, 2, 3, 4, 5, 6, 7, 8]

A 3 by 3 reshape gives us a 3 row, 3 column matrix

[1]> unflatten(3)(3)(iota(9))
[[0i32, 1i32, 2i32], [3i32, 4i32, 5i32], [6i32, 7i32, 8i32]]

Which of the items of the matrix are members of this vector. 1 for yes, 0 for no. We'll call this boolean matrix r

[2]> let are_accepted (arr: []i32, accepted: []i32): []bool = map(\item -> any(\ok -> item == ok)(accepted)) arr
[3]> let bool_to_num (v: bool): i32 = match v case false -> 0 case true -> 1
[3]> let r = unflatten(3)(3)(map bool_to_num (are_accepted (iota(9), [1, 2, 3, 4, 7])))
[4]> r
[[0, 1, 1], [1, 1, 0], [0, 1, 0i32]]

At this point, we can already see that the APL result will be much more compact.

Also, it'd be cool to make are_accepted generic, and not only accept i32

We can embed r in a slightly larger matrix of 5 by 7 take of r, which pads below and on the right with zeroes.

Looks like if we want the same "extend array" semantics we'll have to implement it.

[5]> let extend1 [n]'a (x: i32, arr: [n]a, item: a): []a = arr ++ (replicate(x - n)(item))
[6]> let extend2 'a (y: i32, x: i32, arr: [][]a, item: a): [][]a = extend1 (y, map(\row -> extend1 (x, row, item))(arr), replicate(x)(item))
[7]> extend2 (5, 7, r, 0)
-- I'll start formatting results now
[[_ 1 1 _ _ _ _]
 [1 1 _ _ _ _ _]
 [_ 1 _ _ _ _ _]
 [_ _ _ _ _ _ _]
 [_ _ _ _ _ _ _]]

And we can center our original within this by doing the "neg-2, rotate around the vertical axis"

[8]> let rotate2 'a (r: i32) (xs: [][]a) = map (\row -> rotate(r)(row))(xs)
[9]> rotate2(-2)(extend2 (5, 7, r, 0))
[[_ _ _ 1 1 _ _]
 [_ _ 1 1 _ _ _]
 [_ _ _ 1 _ _ _]
 [_ _ _ _ _ _ _]
 [_ _ _ _ _ _ _]]

And the "neg-1 rotate around horizontal axis". And we'll call this larger matrix R.

[9]> let R = rotate(-1)(rotate2(-2)(extend2 (5, 7, r, 0)))
[10]> R
[[_ _ _ _ _ _ _]
 [_ _ _ 1 1 _ _]
 [_ _ 1 1 _ _ _]
 [_ _ _ 1 _ _ _]
 [_ _ _ _ _ _ _]]

Here is a vector of 3 matrices

[11]> [R, R, R]
[[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
  [_ _ _ 1 1 _ _]  [_ _ _ 1 1 _ _]   [_ _ _ 1 1 _ _]
  [_ _ 1 1 _ _ _]  [_ _ 1 1 _ _ _]   [_ _ 1 1 _ _ _]
  [_ _ _ 1 _ _ _]  [_ _ _ 1 _ _ _]   [_ _ _ 1 _ _ _]
  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]

Now, Futhark doesn't have a print function, so we'll skip the fancy display.

The "1 by 0 by neg-1, rotate each", distributes the rotation between corresponding arguments on the left and right argument vectors. So we see our original 0-rotated in the center with the one and the neg-1 rotation on either side.

[12]> map (\(rot, mat) -> rotate2(rot)(mat)) (zip([1, 0, -1])([R, R, R]))

We can simplify this slightly by having just one copy of R, which is enclosed to form a rank-0 array, which is distributed to each item in the left argument vector, so we get the same result.

Ok... to be honest I've got lost with the "rank-0 array"... but I guess it maps to this.

[13]> map (\rot -> rotate2(rot)(R)) ([1, 0, -1])
[[[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
  [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]   [_ _ _ _ 1 1 _]
  [_ 1 1 _ _ _ _]   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]
  [_ _ 1 _ _ _ _]   [_ _ _ 1 _ _ _]   [_ _ _ _ 1 _ _]
  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]]

Next we'll do the "1 by 0 by neg-1, outer product, column rotation".

And "outer product" takes each item of the left argument and distributes the function between it and each item on the right argument.

So we get a table or a matrix, in this case a 3 by 3 matrix, with our original in the center surrounded by all possible 1-rotations.

[14]> map(\yrot -> (map (\xrot -> rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, -1]))([1, 0, -1])
[[[[_ _ 1 1 _ _ _]  [[_ _ _ 1 1 _ _]  [[_ _ _ _ 1 1 _]
   [_ 1 1 _ _ _ _]   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]
   [_ _ 1 _ _ _ _]   [_ _ _ 1 _ _ _]   [_ _ _ _ 1 _ _]
   [_ _ _ _ _ _ _]   [_ _ _ _ _ _ _]   [_ _ _ _ _ _ _]
   [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]]

 [[[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]   [_ _ _ _ 1 1 _]
   [_ 1 1 _ _ _ _]   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]
   [_ _ 1 _ _ _ _]   [_ _ _ 1 _ _ _]   [_ _ _ _ 1 _ _]
   [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]]

 [[[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
   [_ _ _ _ _ _ _]   [_ _ _ _ _ _ _]   [_ _ _ _ _ _ _]
   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]   [_ _ _ _ 1 1 _]
   [_ 1 1 _ _ _ _]   [_ _ 1 1 _ _ _]   [_ _ _ 1 1 _ _]
   [_ _ 1 _ _ _ _]]  [_ _ _ 1 _ _ _]]  [_ _ _ _ 1 _ _]]]]

And if we do a column sum of the 3 by 3 matrix we get a 3 vector.

This will be tougher (we don't seem to have a column-reduce operation), so let's think it through. Let's call the previous operation expanded and extract its type.

[15]> let expanded = map(\yrot -> (map (\xrot -> rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, -1]))([1, 0, -1])
[16]> :type expanded
expanded : [][][][]i32

So expanded is and 4-D array:

  • The 1st dimension is the Y-rotation
  • The 2nd dimension is the X-rotation
  • The 3rd dimension is the grid row
  • The 4th dimension is the grid column

We want to collapse the 1st dimension, and get a 3-D array:

  • The (new) 1st dimension is the X-rotation
  • The (new) 2nd dimension is the grid row
  • The (new) 3rd dimension is the grid column

And the value of each element is the sum of all elements with that 3-D coordinates (with different Y-rotation dimension).

A way to do this might be to push the Y-rotation dimension down to the 4th dimension, and then do a reduction on the 3rd one. We can use transpose for this.

--- Push the 4th dimension down
[17]> let pushed = map(\xrd -> map(\row -> transpose(row))(xrd)) (map(\xrd -> transpose xrd) (transpose(expanded)))
--- Then apply the count
[18]> map(\xrd -> map(\row -> map(i32.sum) row )  xrd) pushed
[[[_ _ 1 1 _ _ _] [[_ _ _ 1 1 _ _] [[_ _ _ _ 1 1 _]
  [_ 1 2 1 _ _ _]  [_ _ 1 2 1 _ _]  [_ _ _ 1 2 1 _]
  [_ 1 3 1 _ _ _]  [_ _ 1 3 1 _ _]  [_ _ _ 1 3 1 _]
  [_ 1 2 _ _ _ _]  [_ _ 1 2 _ _ _]  [_ _ _ 1 2 _ _]
  [_ _ 1 _ _ _ _]] [_ _ _ 1 _ _ _]] [_ _ _ _ 1 _ _]]]

And if we sum the vector we see a neighbour count for each cell in our original matrix R.

Now, we have to repeat what we did before with the top dimension

[19]> let col_sum = map(\xrd -> map(\row -> map(i32.sum) row )  xrd) pushed
[20]> let pushed_col = map(\row -> transpose(row))(transpose(col_sum))
[21]> map(\row -> map(i32.sum) row)pushed_col
[[_ _ 1 2 2 1 _]
 [_ 1 3 4 3 1 _]
 [_ 1 4 5 4 1 _]
 [_ 1 3 3 2 _ _]
 [_ _ 1 1 1 _ _]]

And rules in Conway's game of life are that: We have a 1 in the following generation

  • If the neighbour count including self is 3 or
  • If the neighbour count int the self is a 4 and the original self is occupied
[22]> let are_accepted2d (arr: [][]i32, accepted: []i32): [][]bool = map (\row -> are_accepted(row, accepted))(arr)
[23]> let tmp = map(\row -> map(i32.sum) row)pushed_col -- same as line 21
[24]> [are_accepted2d (tmp, [3]), are_accepted2d (tmp, [4])]
[[[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
  [_ _ T _ T _ _]   [_ _ _ T _ _ _]
  [_ _ _ _ _ _ _]   [_ _ T _ T _ _]
  [_ _ T T _ _ _]   [_ _ _ _ _ _ _]
  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]]

So firstly let's find the 3's and 4's.

And we're interested in any 3 and a 4 corresponding to an occupied cell, which is just our original matrix R, so we AND those.

So, if the count is 3, we set that cell to true, but if it's 4 we AND it with the previous state. This might seem not like how the rules of game of life work, but keep in mind that in that count we include the current cell value, so:

  • If it's 3, either the cell was ON and has 2 neighbours or it was OFF and has 3 neighbours. Both are OK.
  • If it's 4, either the cell was ON and has 3 neighbours or it was OFF and has 4 neighbours. The last one is NOT OK, that is why we have to and it with R.
[25]> let to_bool2d(arr: [][]i32): [][]bool = map(\row -> map(\cell -> cell != 0) row) arr
[26]> [ (map (\(row_next, row_prev) -> map (\(next, prev) -> and ([next, prev]))
                                           (zip row_next row_prev))
             (zip (are_accepted2d (tmp, [4])) (to_bool2d R))),
        (are_accepted2d (tmp, [3]))]
[[[_ _ _ _ _ _ _]  [[_ _ _ _ _ _ _]
  [_ _ _ T _ _ _]   [_ _ T _ T _ _]
  [_ _ T _ _ _ _]   [_ _ _ _ _ _ _]
  [_ _ _ _ _ _ _]   [_ _ T T _ _ _]
  [_ _ _ _ _ _ _]]  [_ _ _ _ _ _ _]]]

And both of these matrices contribute to the next generation, so if we OR them together and disclose the result we see a simple matrix for the next generation of the game of life. So here we see a single matrix expression for the next generation.

Again, we don't have column-wise operators, so we have to work with transposes for that:

[27]> (map (\row -> map (or) (transpose row))
           (transpose [ (map (\(row_next, row_prev) -> map (\(next, prev) -> and ([next, prev]))
                                                           (zip row_next row_prev))
                             (zip (are_accepted2d (tmp, [4])) (to_bool2d R))),
                        are_accepted2d (tmp, [3])]))

Better formatted

[28]> let bool_to_num2d(arr: [][]bool): [][]i32 = map (\row -> map (bool_to_num) row) arr
[29]> bool_to_num2d (map (\row -> map (or) (transpose row))
                         (transpose [ (map (\ (row_next, row_prev) -> map (\ (next, prev) -> and ([next, prev]))
                                                                        (zip row_next row_prev))
                                           (zip (are_accepted2d (tmp, [4])) (to_bool2d R))),
                                      are_accepted2d (tmp, [3])]))
[[_ _ _ _ _ _ _]
 [_ _ 1 1 1 _ _]
 [_ _ 1 _ _ _ _]
 [_ _ 1 1 _ _ _]
 [_ _ _ _ _ _ _]

And you'll notice that it contains no explicit loops, ~~and no temporary variables~~, and that the size of the matrix R is not specified anywhere so that the expression is applicable to a matrix of any size.

Let's abstract this into a function ...

I'll be skipping the rest of the video, as it mostly concerns with how APL works with functions, but I'd like to leave you with the final expression on Futhark and some conclusions about the language.

-- some helper functions
let are_accepted (arr: []i32, accepted: []i32): []bool = map(\item -> any(\ok -> item == ok)(accepted)) arr
let rotate2 'a (r: i32) (xs: [][]a) = map (\row -> rotate(r)(row))(xs)
let are_accepted2d (arr: [][]i32, accepted: []i32): [][]bool = map (\row -> are_accepted(row, accepted))(arr)
let to_bool2d(arr: [][]i32): [][]bool = map(\row -> map(\cell -> cell != 0) row) arr
let bool_to_num (v: bool): i32 = match v case false -> 0 case true -> 1
let bool_to_num2d(arr: [][]bool): [][]i32 = map (\row -> map (bool_to_num) row) arr

-- i32, not bool, for better visualization
let life(R: [][]i32): [][]i32 =
   let expanded = map(\yrot -> (map (\xrot -> rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, -1]))([1, 0, -1])
   let pushed = map(\xrd -> map(\row -> transpose(row))(xrd)) (map(\xrd -> transpose xrd) (transpose(expanded)))
   let col_sum = map(\xrd -> map(\row -> map(i32.sum) row )  xrd) pushed
   let pushed_col = map(\row -> transpose(row))(transpose(col_sum))
   let tmp = map(\row -> map(i32.sum) row) pushed_col
   in bool_to_num2d (map (\row -> map (or) (transpose row))
                         (transpose [ (map (\ (row_next, row_prev) -> map (\ (next, prev) -> and ([next, prev]))
                                                                        (zip row_next row_prev))
                                           (zip (are_accepted2d (tmp, [4])) (to_bool2d R))),
                                      are_accepted2d (tmp, [3])]))

-- With only one intermediate variable (we can remove it if we accept to count the neighbours twice)
let life(R: [][]i32): [][]i32 =
   let neighbours = map (\row -> map(i32.sum) row)
                        (map(\row -> transpose(row))
                            (transpose(map(\xrd -> map(\row -> map(i32.sum) row )  xrd)
                                          (map(\xrd -> map(\row -> transpose(row))(xrd))
                                              (map (\xrd -> transpose xrd)
                                                   (transpose(map(\yrot -> (map (\xrot -> rotate(yrot) (rotate2(xrot)(R))))
                                                                           ([1, 0, -1]))
                                                                 ([1, 0, -1]))))))))
   in bool_to_num2d (map (\row -> map (or) (transpose row))
                         (transpose [ (map (\ (row_next, row_prev) -> map (\ (next, prev) -> and ([next, prev]))
                                                                        (zip row_next row_prev))
                                           (zip (are_accepted2d (neighbours, [4])) (to_bool2d R))),
                                      are_accepted2d (neighbours, [3])]))

And some test

let extend1 [n]'a (x: i32, arr: [n]a, item: a): []a = arr ++ (replicate(x - n)(item))
let extend2 'a (y: i32, x: i32, arr: [][]a, item: a): [][]a = extend1 (y, map(\row -> extend1 (x, row, item))(arr), replicate(x)(item))

let r = unflatten(3)(3)(map bool_to_num (are_accepted (iota(9), [1, 2, 3, 4, 7])))
let R = rotate(-1)(rotate2(-2)(extend2 (5, 7, r, 0)))

Conclusion

APL makes it look effortless to move between different dimensions of the arrays. This can be done on Futhark using maps and transposes, but becomes chaotic very fast, ~~expect a post shortly on this~~ more on Matrix operators as stack languages ;).

That being said, Futhark could handle the challenge and it was possible to replicate APL operations.

I have done only really small programs in the language before (a couple maps with a single function inside) and (given that you are used to write programs with map, reduce, and the like) it's relatively easy to learn the language.

To give you an impression on this point, I'd say that the point I spent most time writing this post was figuring out why some function with a parameter named val would not work... it's just a reserved keyword ¯\_(ツ)_/¯ . But keep an eye on this page for bigger programs, just in case [Futhark's blog] Why Futhark (sometimes) goes wrong.

Anyway, I hope that this sparked your interest on Futhark.

References