Detecting non halting programs
Introduction
There's this idea on math and computer-science that some problems cannot be resolved and are, fundamentally, incomputable. That makes sense! A specific example on comp-sci is the "halting problem".
The halting problem states that there is not possible to know, from a program description and it's imput, whether that program will eventually stop or whether it will run forever.
This is a widely accepted fact and known to be true. It has wide repercusions and many, really intelligent people has probably worked on it.
That being said, it just doesn't feel well. Certainly, you should be able to know if a program eventually completes or if it just runs infinitely! So from time to time I thought about this.
The idea
NOTE: again, this is an uninformed post on a random site of the internet. Consider this, at most, as a funny party trick (for some definition of funny 😉).
I reached the conclusion (if I recall correctly, with inspuration from comments in some forum) that, supposing the program doesn't need infinite memory and doesn't have infinite inputs, it should be possible to know if a program is on an infinite loop by checking on each step of it's execution whether it has been on that state of execution before.
This is easy to visualize, imagine the following progression of "states" on a program. Two states should are equal if all the memory of the program and it's internal state (including current execution address) is the same.
- State 1
- State 2
- State 3
- [...]
- State 2 (infinite loop detected!)
For some time I thought about how can this be done for a practically sized program. Storing the value of all past states would generate a lot of data in a short amount of time. Thinking about which tricks can compress it down to a manageable amount, one day I realized that the solution was hidden on a very well-known algorithm.
The trick
Floyd's tortoise and hare algorithm is a well-known solution to a well-known job interview problem, "how to detect if there's a loop on a linked list". The algorithm works as follows:
- Initialize a pair of pointers to the start of a list, one is the
hare
and one is theturtle
. - Each iteration, advance the
hare
pointer twice, and theturtle
pointer once. - If the
hare
eventually reaches the end of the list, it has no cycles. - If the
hare
points at the same item as theturtle
there must be a loop.
As I said, this is a well-known algorithm, and you might start to see what it means to our "problem". Our program has "states" instead of the "items" of a linked-list, but other from that the mapping is direct.
To detect if our program contains an infinite loop we need the memory to run two instances of the program being analyzed and one simple runtime:
- Initialize two instances of our program
hare
, andturtle
. - At each iteration, advance the execution of the
hare
twice, and theturtle
once. - If the
hare
completes it's execution, we can declare that the program eventually finishes. - If the
hare
at some point reaches the same state as theturtle
, it must contain an infinite loop. And thus, it won't halt.
To test this idea I've coded up a simple Brainf*ck interpreter that will detect if an input program halts or not. You can find it here: hare-machine.
I won't go through the code as it's only a simple Brainf*uck interpreter with some logic to compare states of two program instances, which amounts to this (pc
is the instruction pointer and mp
is the memory pointer):
int is_same_state(state s1, state s2) {
if ((s1.pc != s2.pc) || (s1.mp != s2.mp)) {
return 0;
}
return memcmp(s1.mem, s2.mem, MAX_PROGRAM_MEMORY) == 0;
}
If you want to see it running just copy the whole folder and run make
to run it over some examples:
~/repos/codigo-para-llevar/c/hare-machine $ make
cc check-halt.c -o check-halt
bash test.sh
Should HALT
./check-halt "examples/halt/cell-width.bf"=HALTS
./check-halt "examples/halt/hello-world.bf"=HALTS
./check-halt "examples/halt/move-value.bf"=HALTS
Should LOOP (not halt)
./check-halt "examples/loop/simple-loop.bf"=LOOPS
~/repos/codigo-para-llevar/c/hare-machine $
Now, with inputs
Now, I promised that this would work as long as the inputs are not infinite, but the implementation won't take any inputs...
One option is would be to codify the inputs on the state of the memory at the start of the program, if known.
Other option, if not all input is known at first, would be to add a new variable "input-count" to the program state (and to the state-equality check) to keep track of the input data, and have a buffer for the input consumed by the hare
and not the turtle
.
For infinite inputs... it's trivial to cook a program with infinite input that we can't know if it will halt:
int main() {
while (getchar() != 'q') {
// Whatever
}
return 0;
}
If that program input can be infinite, we won't ever know if a q
will ever appear, and thus if it will finish.
Closing thoughts
I know, I know! Not implementing the input reading is kind of cheap 🤷, but I've been wanting to show this trick since... May 2020!
For some time I intended to address the usual counterproof, but that would mean writing the interpreter itself Brainf*ck too (though there are already implementations of that), and after having seen the trick once it gets boring. So in the end I have not written this until now. So I'm releasing it as it is now, just so it doesn't rot on my PC for another two years 😅.
Moreover, probably I should read some more on this before talking about theory I don't understand.
Anyways... maybe I sometime adapt the code to take inputs, but I don't have any plans for that. So, if you get bored and write the code for that, feel free to ping me and I'll link to your code from here 😉.
Hope that was interesting! Bye!