Numbers, Unary Operations, Variables
Lets Write a Compiler!
Our goal is to write a compiler which is a function:
In 131 TargetProgram is going to be a binary executable.
Lets write our first Compilers
SourceProgram will be a sequence of four tiny “languages”
- Numbers
- e.g.
7,12,42…
- Numbers + Increment
- e.g.
add1(7),add1(add1(12)), …
- Numbers + Increment + Decrement
- e.g.
add1(7),add1(add1(12)),sub1(add1(42))
- Numbers + Increment + Decrement + Local Variables
- e.g.
let x = add1(7), y = add1(x) in add1(y)
Recall: What does a Compiler look like?

An input source program is converted to an executable binary in many stages:
- Parsed into a data structure called an Abstract Syntax Tree
- Checked to make sure code is well-formed (and well-typed)
- Simplified into some convenient Intermediate Representation
- Optimized into (equivalent) but faster program
- Generated into assembly
x86 - Linked against a run-time (usually written in C)
Simplified Pipeline
Goal: Compile source into executable that, when run, prints the result of evaluating the source.
Approach: Lets figure out how to write
- A compiler from the input string into assembly,
- A run-time that will let us do the printing.

Next, lets see how to do (1) and (2) using our sequence of adder languages.
Adder-1
- Numbers
- e.g.
7,12,42…
The “Run-time”
Lets work backwards and start with the run-time.
Here’s what it looks like as a C program main.c
#include <stdio.h>
extern int our_code() asm("our_code_label");
int main(int argc, char** argv) {
int result = our_code();
printf("%d\n", result);
return 0;
}mainjust callsour_codeand prints its return value,our_codeis (to be) implemented in assembly,- Starting at label
our_code_label, - With the desired return value stored in register
EAX - per, the
Ccalling convention
- Starting at label
Test Systems in Isolation
Key idea in SW-Engg:
Decouple systems so you can test one component without (even implementing) another.
Lets test our “run-time” without even building the compiler.
Testing the Runtime: A Really Simple Example
Given a SourceProgram
42
We want to compile the above into an assembly file forty_two.s that looks like:
For now, lets just
- write that file by hand, and test to ensure
- object-generation and then
- linking works
On a Mac use -f macho instead of -f aout
We can now run it:
Hooray!
The “Compiler”
Recall, that compilers were invented to avoid writing assembly by hand
First Step: Types
To go from source to assembly, we must do:

Our first step will be to model the problem domain using types.

Lets create types that represent each intermediate value:
Textfor the raw input sourceExprfor the ASTAsmfor the output x86 assembly
Defining the Types: Text
Text is raw strings, i.e. sequences of characters
texts :: [Text]
texts =
[ "It was a dark and stormy night..."
, "I wanna hold your hand..."
, "12"
]Defining the Types: Expr
We convert the Text into a tree-structure defined by the datatype
Note: As we add features to our language, we will keep adding cases to Expr.
Defining the Types: Asm
Lets also do this gradually as the x86 instruction set is HUGE!
Recall, we need to represent
An Asm program is a list of instructions each of which can:
- Create a
Label, or - Move a
Arginto aRegister Returnback to the run-time.
Where we have
Second Step: Transforms
Ok, now we just need to write the functions:
parse :: Text -> Expr -- 1. Transform source-string into AST
compile :: Expr -> Asm -- 2. Transform AST into assembly
asm :: Asm -> Text -- 3. Transform assembly into output-stringPretty straightforward:
parse :: Text -> Expr
parse = parseWith expr
where
expr = integer
compile :: Expr -> Asm
compile (Number n) =
[ IMov (Reg EAX) (Const n)
, IRet
]
asm :: Asm -> Text
asm is = L.intercalate "\n" [instr i | i <- is]Where instr is a Text representation of each Instruction
instr :: Instruction -> Text
instr (IMov a1 a2) = printf "mov %s, %s" (arg a1) (arg a2)
arg :: Arg -> Text
arg (Const n) = printf "%d" n
arg (Reg r) = reg r
reg :: Register -> Text
reg EAX = "eax"Brief digression: Typeclasses
Note that above we have four separate functions that crunch different types to the Text representation of x86 assembly:
Remembering names is hard.
We can write an overloaded function, and let the compiler figure out the correct implementation from the type, using Typeclasses.
The following defines an interface for all those types a that can be converted to x86 assembly:
class ToX86 a where
asm :: a -> Text
Now, to overload, we say that each of the types Asm, Instruction, Arg and Register implements or has an instance of ToX86
instance ToX86 Asm where
asm is = L.intercalate "\n" [asm i | i <- is]
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
instance ToX86 Arg where
asm (Const n) = printf "%d" n
arg (Reg r) = asm r
instance ToX86 Register where
asm EAX = "eax"Note in each case above, the compiler figures out the correct implementation, from the types…
Adder-2
Well that was easy! Lets beef up the language!
- Numbers + Increment
- e.g.
add1(7),add1(add1(12)), …
Repeat our Recipe
- Build intuition with examples,
- Model problem with types,
- Implement compiler via type-transforming-functions,
- Validate compiler via tests.
1. Examples
First, lets look at some examples.
Example 1
How should we compile?
add1(7)
In English
- Move
7into theeaxregister - Add
1to the contents ofeax
In ASM
Aha, note that add is a new kind of Instruction
Example 2
How should we compile
add1(add1(12))
In English
- Move
12into theeaxregister - Add
1to the contents ofeax - Add
1to the contents ofeax
In ASM
Compositional Code Generation
Note correspondence between sub-expressions of source and assembly

We will write compiler in compositional manner
- Generating
Asmfor each sub-expression (AST subtree) independently, - Generating
Asmfor super-expression, assuming the value of sub-expression is inEAX
2. Types
Next, lets extend the types to incorporate new language features
Extend Type for Source and Assembly
Source Expressions
Assembly Instructions
Examples Revisited
src2 = "add1(add1(12))"
exp2 = Add1 (Add1 (Number 12))
asm2 = [ IMov (EAX) (Const 7)
, IAdd (EAX) (Const 1)
]3. Transforms
Now lets go back and suitably extend the transforms:
parse :: Text -> Expr -- 1. Transform source-string into AST
compile :: Expr -> Asm -- 2. Transform AST into assembly
asm :: Asm -> Text -- 3. Transform assembly into output-stringLets do the easy bits first, namely parse and asm
Parse
parse :: Text -> Expr
parse = parseWith expr
expr :: Parser Expr
expr = try primExpr
<|> integer
primExpr :: Parser Expr
primExpr = Add1 <$> rWord "add1" *> parens exprAsm
To update asm just need to handle case for IAdd
instance ToX86 Instruction where
asm (IMov a1 a2) = printf "mov %s, %s" (asm a1) (asm a2)
asm (IAdd a1 a2) = printf "add %s, %s" (asm a1) (asm a2)Note
- GHC will tell you exactly which functions need to be extended (Types, FTW!)
- We will not discuss
parseandasmany more…
Compile
Finally, the key step is
compile :: Expr -> Asm
compile (Number n)
= [ IMov (Reg EAX) (Const n)
, IRet
]
compile (Add1 e)
= compile e -- EAX holds value of result of `e` ...
++ [ IAdd (Reg EAX) (Const 1) ] -- ... so just increment it.Examples Revisited
Lets check that compile behaves as desired:
ghci> (compile (Number 12)
[ IMov (Reg EAX) (Const 12) ]
ghci> compile (Add1 (Number 12))
[ IMov (Reg EAX) (Const 12)
, IAdd (Reg EAX) (Const 1)
]
ghci> compile (Add1 (Add1 (Number 12)))
[ IMov (Reg EAX) (Const 12)
, IAdd (Reg EAX) (Const 1)
, IAdd (Reg EAX) (Const 1)
]Adder-3
You do it!
- Numbers + Increment + Double
- e.g.
add1(7),twice(add1(12)),twice(twice(add1(42)))
Adder-4
- Numbers + Increment + Decrement + Local Variables
- e.g.
let x = add1(7), y = add1(x) in add1(y)
Local Variables
Local variables make things more interesting
Repeat our Recipe
- Build intuition with examples,
- Model problem with types,
- Implement compiler via type-transforming-functions,
- Validate compiler via tests.
Step 1: Examples
Lets look at some examples
Example: let1
Need to store 1 variable – x
Example: let2
Need to store 3 variable – x, y, z
Example: let3
Need to store 3 variables – a, b, c – but at most 2 at a time
- First
a, b, thena, c - Don’t need
bandcsimultaneously
Registers are Not Enough
A single register eax is useless:
- May need 2 or 3 or 4 or 5 … values.
There is only a fixed number (say, N) of registers:
- And our programs may need to store more than
Nvalues, so - Need to dig for more storage space!
Memory: Code, Globals, Heap and Stack
Here’s what the memory – i.e. storage – looks like:

Focusing on “The Stack”
Lets zoom into the stack region, which when we start looks like this:

The stack grows downward (i.e. to smaller addresses)
We have lots of 4-byte slots on the stack at offsets from the “stack pointer” at addresses:
[EBP - 4 * 1],[EBP - 4 * 2], …,
How to compute mapping from variables to slots ?
The i-th stack-variable lives at address [EBP - 4 * i]
Required A mapping
- From source variables (
x,y,z…) - To stack positions (
1,2,3…)
Solution The structure of the lets is stack-like too…
- Maintain an
Envthat mapsId |-> StackPosition let x = e1 in e2addsx |-> itoEnv- where
iis current height of stack.
- where
Example: Let-bindings and Stacks
let x = 1 -- []
, y = add1(x) -- [x |-> 1]
, z = add1(y) -- [y |-> 2, x |-> 1]
in
add1(z) -- [z |- 3, y |-> 2, x |-> 1]QUIZ
At what position on the stack do we store variable c ?
-- []
let a = 1 -- [a |-> 1]
, c = -- [a |-> 1]
let b = add1(a) -- [b |-> 2, a |-> 1]
in add1(b) -- [a |-> 1]
-- [c |-> 2, a |-> 1]
in
add1(c)A. 1 B. 2 C. 3 D. 4 E. not on stack!
haskell "ENVIRONMENT" -- [] let a = 1 -- [a |-> 1] , b = 2 -- [b |-> 2, a |-> 1] , c = -- let b = add1(a) -- [b |-> 3, b -> 2, a |-> 1] in add1(b) -- -- [b |-> 2, a |-> 1] in add1(b)
Strategy
At each point, we have env that maps (previously defined) Id to StackPosition
Variable Use
To compile x given env
- Move
[ebp - 4 * i]intoeax
(where env maps x |-> i)
Variable Definition
To compile let x = e1 in e2 we
- Compile
e1usingenv(i.e. resulting value will be stored ineax) - Move
eaxinto[ebp - 4 * i] - Compile
e2usingenv'
(where env' be env with x |-> i i.e. push x onto env at position i)
Example: Let-bindings to Asm
Lets see how our strategy works by example:
Example: let1

QUIZ: let2
When we compile
The assembly looks like
mov eax, 10 ; LHS of let x = 10
mov [ebp - 4*1], eax ; save x on the stack
mov eax, [ebp - 4*1] ; LHS of , y = add1(x)
add eax, 1 ; ""
???
add eax, 1What .asm instructions shall we fill in for ???
mov [ebp - 4 * 1], eax ; A
mov eax, [ebp - 4 * 1]
mov [ebp - 4 * 1], eax ; B
mov [ebp - 4 * 2], eax ; C
mov [ebp - 4 * 2], eax ; D
mov eax, [ebp - 4 * 2]
; E (empty! no instructions)Example: let3
Lets compile
Lets figure out what the assembly looks like!
Step 2: Types
Now, we’re ready to move to the implementation!
Lets extend the types for Source Expressions
type Id = Text
data Expr = ...
| Let Id Expr Expr -- `let x = e1 in e2` modeled as is `Let x e1 e2`
| Var Id Lets enrich the Instruction to include the register-offset [ebp - 4*i]
Environments
Lets create a new Env type to track stack-positions of variables
data Env = [(Id, Int)]
data Maybe a = Nothing | Just a
lookupEnv :: Env -> Id -> Maybe Int
lookupEnv [] x = Nothing
lookupEnv ((y, n) : rest) x = if x == y
then Just n
else lookupEnv rest x
pushEnv :: Env -> Id -> (Int, Env)
pushEnv env x = (xn , env')
where
env' = (x, xn) : env
xn = 1 + length env
compile env (Let x e1 e2) =
compile env e1
++ -- EAX hold the value of "x"
[IMov (RegOffset EBP xn) EAX ]
++
compile env' e2
where
(xn, env') = pushEnv env x
compile env (Var x) = [IMov EAX (RegOffset EBP xn)]
where
xn = case lookupEnv env x of
Just n -> n
Nothing -> error "variable out of scope"API:
- Push variable onto
Env(returning its position), - Lookup variable’s position in
Env
push :: Id -> Env -> (Int, Env)
push x env = (i, (x, i) : env)
where
i = 1 + length env
lookup :: Id -> Env -> Maybe Int
lookup x [] = Nothing
lookup x ((y, i) : env)
| x == y = Just i
| otherwise = lookup x env Step 3: Transforms
Ok, now we’re almost done. Just add the code formalizing the above strategy
Code
Variable Use
compileEnv env (Var x) = [ IMov (Reg EAX) (RegOffset EBP i) ]
where
i = fromMaybe err (lookup x env)
err = error (printf "Error: Variable '%s' is unbound" x)Variable Definition
compileEnv env (Let x e1 e2 l) = compileEnv env e1
++ IMov (RegOffset EBP i) (Reg EAX)
: compileEnv env' e2
where
(i, env') = pushEnv x envStep 4: Tests
Lets take our adder compiler out for a spin!
Recap: We just wrote our first Compilers
SourceProgram will be a sequence of four tiny “languages”
- Numbers
- e.g.
7,12,42…
- Numbers + Increment
- e.g.
add1(7),add1(add1(12)), …
- Numbers + Increment + Decrement
- e.g.
add1(7),add1(add1(12)),sub1(add1(42))
- Numbers + Increment + Decrement + Local Variables
- e.g.
let x = add1(7), y = add1(x) in add1(y)
Using a Recipe
- Build intuition with examples,
- Model problem with types,
- Implement compiler via type-transforming-functions,
- Validate compiler via tests.
Will iterate on this till we have a pretty kick-ass language.