Writing the game of life in Futhark
Some time ago I found a found a video of someone implementing the Game of Life on APL (you should take a look at it, the rest of the post will make more sense afterwards). APL is famous for being both expressive and unreadable for the non initiated. For example this Game of Life can be expressed as (code from dyalog.com):
life←{ ⍝ John Conway's "Game of Life". ↑1 ⍵∨.∧3 4=+/,¯1 0 1∘.⊖¯1 0 1∘.⌽⊂⍵ ⍝ Expression for next generation. }
The video linked above is quite interesting as it explains what all that symbols mean.
Some time later I was fiddling with the Futhark programming language and found a blog post titled "Futhark as target language for an APL compiler". This made me realize that APL and Futhark operate over (mostly) the same base concepts, and that replicating the same process on Futhark will be a good way to get to learn the basic operations of the language.
As an side note that this Game of Life is already implemented as a benchmark sample program for Futhark, so if you're interested in a useful implementation you should probably consider that one ;)
What is Futhark?
Futhark website describes the language as a "Highperformance purely functional dataparallel array programming on the GPU". You could think it's something like a Haskell (with no recursion) which compiles its programs to something that a GPU can run (or anything OpenCLenabled).
Please note that you might consider this as a light introduction to Futhark, but there probably are a lot of bad practices here, so take everything with a grain of salt.
The rules
For this I'll follow the steps described for the APL implementation on this video. This will be the last reminder to take a look at it, it'll really help follow the rest of the post.
Also, it'd be much better if you could follow along in your own computer. You can find the instructions to install Futhark here and type the commands in the futhark repl
.
I'll also format the results so it's easier the read them, the following transformations are done:

true
>T

false
>_

1i32
>1
,2i32
>2
, ... 
0
>_
 Commas (
,
) might be eliminated  Arrays more than 2 dimensions deep might be reordered, for example
[[[0i32, 1i32, 2i32], [0i32, 0i32, 6i32]], [[7i32, 8i32, 9i32], [10i32, 11i32, 12i32]]  Might be formatted as [[[_ 1 2] [[ 7 8 9] [_ _ 6]] [10 11 12]]
The implementation
$ futhark repl // \  \ \ / /  \ \ \ / /   \ /  \ \   \    \ \ Version 0.13.0. Copyright (C) DIKU, University of Copenhagen, released under the ISC license. Run :help for a list of commands.
Let's go, first thing we have to do is to build an example grid to run the Game of Life on.
A vector of the first 9 natural numbers
[0]> iota(9) [0, 1, 2, 3, 4, 5, 6, 7, 8]
A 3 by 3 reshape gives us a 3 row, 3 column matrix
[1]> unflatten(3)(3)(iota(9)) [[0i32, 1i32, 2i32], [3i32, 4i32, 5i32], [6i32, 7i32, 8i32]]
Which of the items of the matrix are members of this vector. 1 for yes, 0 for no. We'll call this boolean matrix
r
[2]> let are_accepted (arr: []i32, accepted: []i32): []bool = map(\item > any(\ok > item == ok)(accepted)) arr [3]> let bool_to_num (v: bool): i32 = match v case false > 0 case true > 1 [3]> let r = unflatten(3)(3)(map bool_to_num (are_accepted (iota(9), [1, 2, 3, 4, 7]))) [4]> r [[0, 1, 1], [1, 1, 0], [0, 1, 0i32]]
At this point, we can already see that the APL result will be much more compact.
Also, it'd be cool to make are_accepted
generic, and not only accept i32
We can embed
r
in a slightly larger matrix of 5 by 7 take ofr
, which pads below and on the right with zeroes.
Looks like if we want the same "extend array" semantics we'll have to implement it.
[5]> let extend1 [n]'a (x: i32, arr: [n]a, item: a): []a = arr ++ (replicate(x  n)(item)) [6]> let extend2 'a (y: i32, x: i32, arr: [][]a, item: a): [][]a = extend1 (y, map(\row > extend1 (x, row, item))(arr), replicate(x)(item)) [7]> extend2 (5, 7, r, 0)  I'll start formatting results now [[_ 1 1 _ _ _ _] [1 1 _ _ _ _ _] [_ 1 _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _]]
And we can center our original within this by doing the "neg2, rotate around the vertical axis"
[8]> let rotate2 'a (r: i32) (xs: [][]a) = map (\row > rotate(r)(row))(xs) [9]> rotate2(2)(extend2 (5, 7, r, 0)) [[_ _ _ 1 1 _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _]]
And the "neg1 rotate around horizontal axis". And we'll call this larger matrix
R
.
[9]> let R = rotate(1)(rotate2(2)(extend2 (5, 7, r, 0))) [10]> R [[_ _ _ _ _ _ _] [_ _ _ 1 1 _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ _ _ _]]
Here is a vector of 3 matrices
[11]> [R, R, R] [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ _ 1 1 _ _] [_ _ _ 1 1 _ _] [_ _ _ 1 1 _ _] [_ _ 1 1 _ _ _] [_ _ 1 1 _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 _ _ _] [_ _ _ 1 _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]
Now, Futhark doesn't have a print function, so we'll skip the fancy display.
The "1 by 0 by neg1, rotate each", distributes the rotation between corresponding arguments on the left and right argument vectors. So we see our original 0rotated in the center with the one and the neg1 rotation on either side.
[12]> map (\(rot, mat) > rotate2(rot)(mat)) (zip([1, 0, 1])([R, R, R]))
We can simplify this slightly by having just one copy of R, which is enclosed to form a rank0 array, which is distributed to each item in the left argument vector, so we get the same result.
Ok... to be honest I've got lost with the "rank0 array"... but I guess it maps to this.
[13]> map (\rot > rotate2(rot)(R)) ([1, 0, 1]) [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ _ _ 1 1 _] [_ 1 1 _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ 1 _ _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ 1 _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]]
Next we'll do the "1 by 0 by neg1, outer product, column rotation".
And "outer product" takes each item of the left argument and distributes the function between it and each item on the right argument.
So we get a table or a matrix, in this case a 3 by 3 matrix, with our original in the center surrounded by all possible 1rotations.
[14]> map(\yrot > (map (\xrot > rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, 1]))([1, 0, 1]) [[[[_ _ 1 1 _ _ _] [[_ _ _ 1 1 _ _] [[_ _ _ _ 1 1 _] [_ 1 1 _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ 1 _ _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ 1 _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]] [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ _ _ 1 1 _] [_ 1 1 _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ 1 _ _ _ _] [_ _ _ 1 _ _ _] [_ _ _ _ 1 _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]] [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ _ _ 1 1 _] [_ 1 1 _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ 1 1 _ _] [_ _ 1 _ _ _ _]] [_ _ _ 1 _ _ _]] [_ _ _ _ 1 _ _]]]]
And if we do a column sum of the 3 by 3 matrix we get a 3 vector.
This will be tougher (we don't seem to have a columnreduce operation), so let's think it through.
Let's call the previous operation expanded
and extract its type.
[15]> let expanded = map(\yrot > (map (\xrot > rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, 1]))([1, 0, 1]) [16]> :type expanded expanded : [][][][]i32
So expanded
is and 4D array:
 The 1st dimension is the Yrotation
 The 2nd dimension is the Xrotation
 The 3rd dimension is the grid row
 The 4th dimension is the grid column
We want to collapse the 1st dimension, and get a 3D array:
 The (new) 1st dimension is the Xrotation
 The (new) 2nd dimension is the grid row
 The (new) 3rd dimension is the grid column
And the value of each element is the sum of all elements with that 3D coordinates (with different Yrotation dimension).
A way to do this might be to push the Yrotation dimension down to the 4th dimension, and then do a reduction on the 3rd one. We can use transpose
for this.
 Push the 4th dimension down [17]> let pushed = map(\xrd > map(\row > transpose(row))(xrd)) (map(\xrd > transpose xrd) (transpose(expanded)))  Then apply the count [18]> map(\xrd > map(\row > map(i32.sum) row ) xrd) pushed [[[_ _ 1 1 _ _ _] [[_ _ _ 1 1 _ _] [[_ _ _ _ 1 1 _] [_ 1 2 1 _ _ _] [_ _ 1 2 1 _ _] [_ _ _ 1 2 1 _] [_ 1 3 1 _ _ _] [_ _ 1 3 1 _ _] [_ _ _ 1 3 1 _] [_ 1 2 _ _ _ _] [_ _ 1 2 _ _ _] [_ _ _ 1 2 _ _] [_ _ 1 _ _ _ _]] [_ _ _ 1 _ _ _]] [_ _ _ _ 1 _ _]]]
And if we sum the vector we see a neighbour count for each cell in our original matrix R.
Now, we have to repeat what we did before with the top dimension
[19]> let col_sum = map(\xrd > map(\row > map(i32.sum) row ) xrd) pushed [20]> let pushed_col = map(\row > transpose(row))(transpose(col_sum)) [21]> map(\row > map(i32.sum) row)pushed_col [[_ _ 1 2 2 1 _] [_ 1 3 4 3 1 _] [_ 1 4 5 4 1 _] [_ 1 3 3 2 _ _] [_ _ 1 1 1 _ _]]
And rules in Conway's game of life are that: We have a 1 in the following generation
 If the neighbour count including self is 3 or
 If the neighbour count int the self is a 4 and the original self is occupied
[22]> let are_accepted2d (arr: [][]i32, accepted: []i32): [][]bool = map (\row > are_accepted(row, accepted))(arr) [23]> let tmp = map(\row > map(i32.sum) row)pushed_col  same as line 21 [24]> [are_accepted2d (tmp, [3]), are_accepted2d (tmp, [4])] [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ T _ T _ _] [_ _ _ T _ _ _] [_ _ _ _ _ _ _] [_ _ T _ T _ _] [_ _ T T _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]]
So firstly let's find the 3's and 4's.
And we're interested in any 3 and a 4 corresponding to an occupied cell, which is just our original matrix R, so we AND those.
So, if the count is 3, we set that cell to true, but if it's 4 we AND it with the previous state. This might seem not like how the rules of game of life work, but keep in mind that in that count we include the current cell value, so:
 If it's 3, either the cell was ON and has 2 neighbours or it was OFF and has 3 neighbours. Both are OK.
 If it's 4, either the cell was ON and has 3 neighbours or it was OFF and has 4 neighbours. The last one is NOT OK, that is why we have to and it with
R
.
[25]> let to_bool2d(arr: [][]i32): [][]bool = map(\row > map(\cell > cell != 0) row) arr [26]> [ (map (\(row_next, row_prev) > map (\(next, prev) > and ([next, prev])) (zip row_next row_prev)) (zip (are_accepted2d (tmp, [4])) (to_bool2d R))), (are_accepted2d (tmp, [3]))] [[[_ _ _ _ _ _ _] [[_ _ _ _ _ _ _] [_ _ _ T _ _ _] [_ _ T _ T _ _] [_ _ T _ _ _ _] [_ _ _ _ _ _ _] [_ _ _ _ _ _ _] [_ _ T T _ _ _] [_ _ _ _ _ _ _]] [_ _ _ _ _ _ _]]]
And both of these matrices contribute to the next generation, so if we OR them together and disclose the result we see a simple matrix for the next generation of the game of life. So here we see a single matrix expression for the next generation.
Again, we don't have columnwise operators, so we have to work with transpose
s for that:
[27]> (map (\row > map (or) (transpose row)) (transpose [ (map (\(row_next, row_prev) > map (\(next, prev) > and ([next, prev])) (zip row_next row_prev)) (zip (are_accepted2d (tmp, [4])) (to_bool2d R))), are_accepted2d (tmp, [3])]))
Better formatted
[28]> let bool_to_num2d(arr: [][]bool): [][]i32 = map (\row > map (bool_to_num) row) arr [29]> bool_to_num2d (map (\row > map (or) (transpose row)) (transpose [ (map (\ (row_next, row_prev) > map (\ (next, prev) > and ([next, prev])) (zip row_next row_prev)) (zip (are_accepted2d (tmp, [4])) (to_bool2d R))), are_accepted2d (tmp, [3])])) [[_ _ _ _ _ _ _] [_ _ 1 1 1 _ _] [_ _ 1 _ _ _ _] [_ _ 1 1 _ _ _] [_ _ _ _ _ _ _]
And you'll notice that it contains no explicit loops,
and no temporary variables, and that the size of the matrix R is not specified anywhere so that the expression is applicable to a matrix of any size.Let's abstract this into a function ...
I'll be skipping the rest of the video, as it mostly concerns with how APL works with functions, but I'd like to leave you with the final expression on Futhark and some conclusions about the language.
 some helper functions let are_accepted (arr: []i32, accepted: []i32): []bool = map(\item > any(\ok > item == ok)(accepted)) arr let rotate2 'a (r: i32) (xs: [][]a) = map (\row > rotate(r)(row))(xs) let are_accepted2d (arr: [][]i32, accepted: []i32): [][]bool = map (\row > are_accepted(row, accepted))(arr) let to_bool2d(arr: [][]i32): [][]bool = map(\row > map(\cell > cell != 0) row) arr let bool_to_num (v: bool): i32 = match v case false > 0 case true > 1 let bool_to_num2d(arr: [][]bool): [][]i32 = map (\row > map (bool_to_num) row) arr  i32, not bool, for better visualization let life(R: [][]i32): [][]i32 = let expanded = map(\yrot > (map (\xrot > rotate(yrot)(rotate2(xrot)(R)))) ([1, 0, 1]))([1, 0, 1]) let pushed = map(\xrd > map(\row > transpose(row))(xrd)) (map(\xrd > transpose xrd) (transpose(expanded))) let col_sum = map(\xrd > map(\row > map(i32.sum) row ) xrd) pushed let pushed_col = map(\row > transpose(row))(transpose(col_sum)) let tmp = map(\row > map(i32.sum) row) pushed_col in bool_to_num2d (map (\row > map (or) (transpose row)) (transpose [ (map (\ (row_next, row_prev) > map (\ (next, prev) > and ([next, prev])) (zip row_next row_prev)) (zip (are_accepted2d (tmp, [4])) (to_bool2d R))), are_accepted2d (tmp, [3])]))  With only one intermediate variable (we can remove it if we accept to count the neighbours twice) let life(R: [][]i32): [][]i32 = let neighbours = map (\row > map(i32.sum) row) (map(\row > transpose(row)) (transpose(map(\xrd > map(\row > map(i32.sum) row ) xrd) (map(\xrd > map(\row > transpose(row))(xrd)) (map (\xrd > transpose xrd) (transpose(map(\yrot > (map (\xrot > rotate(yrot) (rotate2(xrot)(R)))) ([1, 0, 1])) ([1, 0, 1])))))))) in bool_to_num2d (map (\row > map (or) (transpose row)) (transpose [ (map (\ (row_next, row_prev) > map (\ (next, prev) > and ([next, prev])) (zip row_next row_prev)) (zip (are_accepted2d (neighbours, [4])) (to_bool2d R))), are_accepted2d (neighbours, [3])]))
And some test
let extend1 [n]'a (x: i32, arr: [n]a, item: a): []a = arr ++ (replicate(x  n)(item)) let extend2 'a (y: i32, x: i32, arr: [][]a, item: a): [][]a = extend1 (y, map(\row > extend1 (x, row, item))(arr), replicate(x)(item)) let r = unflatten(3)(3)(map bool_to_num (are_accepted (iota(9), [1, 2, 3, 4, 7]))) let R = rotate(1)(rotate2(2)(extend2 (5, 7, r, 0)))
Conclusion
APL makes it look effortless to move between different dimensions of the arrays.
This can be done on Futhark using map
s and transpose
s, but becomes chaotic very fast, expect a post shortly on this more on Matrix operators as stack languages ;).
That being said, Futhark could handle the challenge and it was possible to replicate APL operations.
I have done only really small programs in the language before (a couple map
s with a single function inside)
and (given that you are used to write programs with map
, reduce
, and the like) it's relatively easy to learn the language.
To give you an impression on this point, I'd say that the point I spent most time writing this post was figuring out why some
function with a parameter named val
would not work... it's just a reserved keyword ¯\_(ツ)_/¯ .
But keep an eye on this page for bigger programs, just in case [Futhark's blog] Why Futhark (sometimes) goes wrong.
Anyway, I hope that this sparked your interest on Futhark.