Writting a brainf*ck parser with BLK @ Código para llevar [blog]

At last! at this time BLK can compile a reasonable C subset... well, ok, it needs pointers and structures in order to have something decent, but the former are picky to simulate in a interpreter, and the structures are in the way. The important thing is that the general structure is more or less stable, even inside the prototyping branch, a consecuence is that is open to any participation :), so let's see an example of how to write a little brainf*ck parser.

The first thing we'll do is to import the bytecode manager and prepare a constant to hold the memory size, in this case it will be a static

from bytecode_manager import *

MEM_SIZE = 1024

The parser will be encapsulated in a class with the following scheme

class BrainfuckParser:

    code = "" # Unparsed code
    _bytecode_manager = None # Bytecode manager
    global_frame = None # The global function

Now comes the class intializer function, it receives the bytecode manager as the only parameter

def __init__(self, bytecode_manager):
    self._bytecode_manager = bytecode_manager

Now we have to prepare a space to hold the code, it gets grouped in frames (which nearly matches the concept of C blocks, they can represent functions, loops or condicionals), at least one is needed in the code to be executable by the interpreter, let's call it '_start'

self.global_frame = gframe = bytecode_manager.add_function("_start",
                                                           void_type, [])

The syntax is:

Bytecode_manager.add_function(<function name>, <function type>, <parameters>)

Now we can define variables and operations inside that, let's manually create a cursor variable to point the current memory position and assign it the value 0

bytecode_manager.add_variable("cursor", int_type, gframe, "Cursor variable")
bytecode_manager.add_op(ASSIGNATION_OP, {"name": "cursor"}, 0, gframe)

Here we have two more functions, the first one, Bytecode_manager.add_variable receives the variable name, it's type, the frame and optionally a comment about the variable as parameters. Bytecode_manager.add_op adds an operation to the frame, the first parameter is the type of operation, the second one, a reference to the variable where the result will be stored, as third the function parámeters (it would be a list if there's more than one), and at last the frame.

It's also posible to create a new type to represent our variables, for example an array to keep track of the memory. Types are represented as dictionaries, al least with the attribute 'byte_size' with the obvious use, there are, too, other useful values like 'structure', which tells the size of the dimensions of the type, for example [2, 3] for a 2x3 array

tmp = {"byte-size": 4, "structure": [MEM_SIZE]}
bytecode_manager.add_variable("mem", tmp, gframe, "Machine memory")

The same can be done with variable references, if they are passed as dictionaries, it's possible to tell in 'pos' which part should be taken into account

def parse_brainfuck(self, frame):
    pointed_ref = {"name":"mem", "pos": ["cursor"]}
    cursor_ref = {"name":"cursor"}

The rest of the parse function doens't need anything new, just ADD_OP and SUB_OP over the two references we just defined, all less the '[' which will be mentioned next

while self.current < self.code_len:
    current_op = self.code[self.current]
    self.current += 1
    # Print
    if current_op == ".":
        self._bytecode_manager.add_op("putchar", {}, [pointed_ref],
                                      frame, ".")

    # Read
    elif current_op == ",":
        self._bytecode_manager.add_op("getchar", pointed_ref, [],
            frame, ",")

    # Inc
    elif current_op == "+":
        self._bytecode_manager.add_op(ADD_OP, pointed_ref,
                                      [pointed_ref, 1], frame, "+")

    # Dec
    elif current_op == "-":
        self._bytecode_manager.add_op(SUB_OP, pointed_ref,
                                      [pointed_ref, 1], frame, "-")

    # Push
    elif current_op == ">":
        self._bytecode_manager.add_op(ADD_OP, cursor_ref,
                                      [cursor_ref, 1], frame, ">")

    # Pop
    elif current_op == "<":
        self._bytecode_manager.add_op(SUB_OP, cursor_ref,
                                      [cursor_ref, e1], frame, "<")

    # While
    elif current_op == "[":
        b = self._bytecode_manager.add_branch(WHILE_BRANCH,
                                              pointed_ref, frame)
        self.parse_brainfuck(b)

    # While-end
    elif current_op == "]":
        assert(frame != self.global_frame)
        return

To add a flow contro operation we'll use Bytecode_manager.add_branch, pasing as first parameter the branching type (WHILE_BRANCH, IF_BRANCH, ELSE_BRANCH or DO_WHILE_BRANCH), the second is the condition, if it's different than 0, the conditional is executed or the loop keeps running, the third is the frame, it also optionally accepts two more, one for the operations which take place before the condition is checked and the last one for operations which takes place befere from the second iteration on, this operation returns a frame for the operations which takes place inside it.

b = self._bytecode_manager.add_branch(WHILE_BRANCH, pointed_ref, frame)

The rest of the code is generic enough to barely need any explanation

def parse(self, f):
    self.code = f.read()
    self.current = 0
    self.code_len = len(self.code)
    self.parse_brainfuck(self.global_frame)

if __name__=="__main__":
    output = "a.blk"
    if len(argv) < 2:
        print "%s " % argv[0]
        exit(0)

    try:
        f = open(argv[1], "rt")
    except:
        print "Error reading %s" % argv[1]
    else:
        bm = BytecodeManager()
        bm.add_entry_point("_start")
        bfp = BrainfuckParser(bm)

    try:
        bfp.parse(f)
    except ParsingException, e:
       print "Parsing exception: %s" % e
       exit(2)

    bm.save(output)

Lo que si que se hace necesario mencionar es Bytecode_manager.add_entry_point que indica la función por la que entrar y Bytecode_manager.save para guardar el código

Which does need get mentioned is Bytecode_manager.add_entry_point to point the entry function and Bytecode_manager.save to save the code

O que si que é necesario mencionar é Bytecode_manager.add_entry_point que indica a función pola que entrar, e Bytecode_manager.save para garda-lo código

The full source code is here:

brainfuck_parser.py

See you

Codigo para llevar [blog]

Writting a brainf*ck parser with BLK