G Compiler in G — Part 2

How Compiler’s Work (for G Developers)

Transforming Text-based Source Code to Executable Code

Jim Kring
4 min readFeb 27, 2023

--

In my last article we discussed the process of building a programming language compiler that can compile itself. Part of that process involves building a compiler or execution engine in some other language, first.

So, in order to understand how to build a compiler for a graphical programming language (G) that’s capable of compiling itself might possibly work (if it’s even possible), it will help us to understand a little bit about the workings of compilers for traditional text-based programming languages.

All the various elemental parts of a text-based programming language can be represented in software and involve the steps shown in the figure below:

Visualization by https://twitter.com/@_lrlna

Step 1. Lexical Analysis: Parsing the Source Code Text

The first step in this process is to parse the text of the code, and identify things (a.k.a. “tokens”) like variable names, numbers, function calls, function arguments, loops and other structures. This process is called “parsing” and “lexing”.

Step 2. Syntax Analysis: Generating a Data Structure Representation of the Program

A programming language’s syntax are the the rules about how the individual parts can be put together and the meaning they have based on how they are put together. The syntax analysis step will interpret all the tokens according to the language syntax and build up a data structure (called an Abstract Syntax Tree or “AST”) that represents the source code in a format that’s more “object-oriented” and able to be manipulated by other software.

Step 3. Profits! (er… Code Generation)

With the AST in place for your text-based source code, you can now generate the bits that pay the bills!

Note to self: Add a funny “compiler back-end” joke here.

Example Abstract Syntax Tree (AST) (in Python Pseudo-code)

Let’s look at an AST of a simple text-based language that uses “expressions” as the fundamental component (base class) for the language’s structure.

Here is a simple python class structure for an AST (which might have some bugs and, if so, please feel free to let me know):

@dataclass
class AST:
"""Our AST is a list/array of expressions to evaluate in order."""
lines: List[Expression]= []
@dataclass
class Expression:
"""An Expression is evaluated and returns a value"""
def evaluate(self) -> Any:
...
@dataclass
class LiteralExpression:
"""An Literal Expression has and returns a value"""
value: int | float | str | bool
__init__(self, value):
self.value = value
def evaluate(self):
return value
@dataclass
class UnaryExpression(Expression):
"""An Unary Expression has a single expression"""
expression: Expression
@dataclass
class Negate(UnaryExpression):
"""Evaluate the expression, negates it, and return the result"""
def evaluate(self):
return -self.expression.evaluate()
@dataclass
class BinaryExpression(Expression):
"""An Binary Expression has two expressions: left and right."""
left_expression: Expression
right_expression: Expression
@dataclass
class Add(BinaryExpression):
"""e.g. Add(1, 2) -> 3"""
def evaluate(self):
return left_expression.evaluate() + right_expression.evaluate()

Now, let’s look at how we could represent a simple math program as expressions using the AST model in a super simple, made-up programming language that processes math expressions, line by line, evaluating them and printing the result.

Here’s some “source code”

01: 1 + 3
02: -( 7 + 9 )

And here’s the expected output when we execute it with our:

# -> 4
# -> -16

The way we get from source code to the executed output is to first parse the source into an AST.

Let’s parse this manually (by hand) since these math expressions are fairly straightforward to represent as AST in our python model using pseudocode. The result is a data structure that might look something like this:

# Here is an AST instance (object)representing our two-line "Program" above.
my_ast = AST([

# line 01: 1 + 3
Add(
LiteralExpression(1), # evaluate() -> 1
LiteralExpression(3), # evaluate() -> 3
# -----------------
), # evaluate() -> 4
# line 02: -( 7 + 9 )
Negate(
Add(
LiteralExpression(7), # evaluate() -> 7
LiteralExpression(9), # evaluate() -> 9
# -----------------
) # evaluate() -> 16
# -----------------
) # evaluate() -> -16
])

To evaluate this “program”, let’s define a function that executes the AST (a list of expressions), one line (expression) at a time:

def run(program: AST):
"""Print the result of evaluating each expression in the AST"""
for expression in ast.lines:
print(expression.evaluate())

Hurray, it works!

>>> run(my_ast)
4
-16

Congratulations! You now know how a (simple) compiler runs a program :)

Get it? Because he’s awkward guy and compiler engineers… oh, never mind. We love you, Compiler Engineers!

Now that we’ve covered a little bit about how compilers work and how abstract syntax tree structures help us to execute our code, we can look at how one might create a compiler for a graphical dataflow programming language.

We’ll talk more about that soon…

--

--

The views here are my own, shared in hope they will make a positive difference to all. Please take what works, leave the rest, and share successes with others.