Skip to content

Commit 7d951a2

Browse files
authored
add custom exercise assemble (#184)
1 parent 48d95d5 commit 7d951a2

14 files changed

Lines changed: 1074 additions & 0 deletions

File tree

config.json

Lines changed: 8 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -809,6 +809,14 @@
809809
"practices": [],
810810
"prerequisites": [],
811811
"difficulty": 9
812+
},
813+
{
814+
"slug": "assemble",
815+
"name": "assemble",
816+
"uuid": "17337704-ef28-490b-bbdd-f8e3f51e326b",
817+
"practices": [],
818+
"prerequisites": [],
819+
"difficulty": 10
812820
}
813821
]
814822
},
Lines changed: 141 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,141 @@
1+
# Instructions
2+
3+
In this exercise, you will define the syntax and execution model for a small assembly-like language inspired by the `x86-64` assembly language.
4+
5+
The goal is to formalize how code is written, parsed, and executed within a constrained environment.
6+
7+
## Execution Model
8+
9+
Your assembly code will be written inside a special construct:
10+
11+
```lean
12+
let program := assemble!(
13+
-- assembly code here
14+
)
15+
```
16+
17+
This construct must produce a **program**, which simulates a function that executes the assembly code and returns a value.
18+
This program is then invoked using the following syntax, where `a`, `b`, `c`, etc. are arguments:
19+
20+
```lean
21+
let result := program(a, b, c, ...)
22+
```
23+
24+
All arguments and the return value are _64-bit signed integers_.
25+
A maximum of _6 arguments_ is allowed.
26+
27+
## Registers
28+
29+
The return value and all arguments are passed in **registers**.
30+
Your language must support the following registers:
31+
32+
| register | role |
33+
|----------|--------------|
34+
| `rdi` | 1st argument |
35+
| `rsi` | 2nd argument |
36+
| `rdx` | 3rd argument |
37+
| `rcx` | 4th argument |
38+
| `r8` | 5th argument |
39+
| `r9` | 6th argument |
40+
| `rax` | return value |
41+
42+
Registers start with a default value of `0`, unless they are used to pass arguments to the function.
43+
44+
For example, in the following program, all registers have a value of `0` with the exception of `rdi`, which is initialized with `10`, and `rsi`. which is initialized with `-20`:
45+
46+
```lean
47+
program(10, -20)
48+
```
49+
50+
The return value of the program is always stored in `rax`.
51+
52+
Note that register names are _case-insensitive_.
53+
This means that `rax`, `RAX` and `rAx` all refer to the same register.
54+
55+
## Assembly code
56+
57+
Each line in the assembly code can be of two forms:
58+
59+
- **Labels** mark specific places of the program to be used by some instructions.
60+
- **Instructions** represent operations executed by the program.
61+
62+
Unless modified by some instruction, the execution flow of the program proceeds linearly.
63+
This means that a previous line is fully executed before the line after it is executed.
64+
65+
### Labels
66+
67+
Labels have the following syntax:
68+
69+
```lean
70+
<label>:
71+
```
72+
73+
They do not alter the values of any register or have any effect to the program other than marking specific places of the code so they can be used by instructions.
74+
75+
Note that labels are _case-sensitive_.
76+
This means that `Start`, `start` and `sTart` are all different labels.
77+
78+
### Two-operand instructions
79+
80+
Most instructions have the following syntax:
81+
82+
```lean
83+
<opcode> <destination>, <source>
84+
```
85+
86+
The `opcode` indicates the operation being executed, using the values of the `destination` and `source` operands.
87+
The result of the operation is stored in the `destination` operand.
88+
89+
For example, the instruction `add` sums the values in the `destination` and the `source` operands and stores this result in the `destination` operand.
90+
91+
The `destination` operand is _always_ a register, whereas the `source` operand may be a register or a literal number.
92+
93+
This is a list of instructions your program must support:
94+
95+
| opcode | operation performed |
96+
|--------|---------------------------------------|
97+
| `mov` | destination := source |
98+
| `add` | destination := destination + source |
99+
| `sub` | destination := destination - source |
100+
| `mul` | destination := destination * source |
101+
| `div` | destination := destination / source |
102+
| `xor` | destination := destination ^^^ source |
103+
| `and` | destination := destination &&& source |
104+
| `or` | destination := destination ||| source |
105+
| `shl` | destination := destination <<< source |
106+
| `shr` | destination := destination >>> source |
107+
108+
Other than those, your program must support the two-operand `cmp` instruction.
109+
110+
This instruction does not modify any register, but instead compares the `destination` and the `source` operands and sets an internal state of the program to one of three states:
111+
112+
- _greater than_, if `destination` > `source`
113+
- _equal_, if `destination` == `source`
114+
- _lesser than_, if `destination` < `source`
115+
116+
How you keep track internally of this state is up to you.
117+
118+
### One-operand instructions
119+
120+
There are some instructions that alter the flow of the program, transferring execution to another point of the code.
121+
They are called _jumping_ instructions.
122+
123+
They all take just one operand, which is a label that indicates the target of the jump, i.e., the point of the code where execution will continue:
124+
125+
```lean
126+
<opcode> <label>
127+
```
128+
129+
The `jmp` instruction _always_ makes the jump to the target label.
130+
131+
Other jumping instructions make the jump only if the internal state of the program is set to a specific value.
132+
This means they are usually preceded by a `cmp` instruction.
133+
134+
Those are the jumping instructions your program must support:
135+
136+
| instruction | operation performed |
137+
|-------------|----------------------------------------------------|
138+
| `jmp` | unconditional jump. The jump is always performed |
139+
| `je` | jumps if the internal state is _equal_ |
140+
| `jl` | jumps if the internal state is _less than_ |
141+
| `jg` | jumps if the internal state is _greater than_ |
Lines changed: 157 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,157 @@
1+
import Std.Data.HashMap
2+
3+
open Lean Syntax Macro Std
4+
5+
structure CPU where
6+
regs : Vector Int64 10
7+
deriving Repr
8+
9+
structure Program where
10+
labels : HashMap String Nat
11+
ops : Array (List String)
12+
13+
private def getStr (id : TSyntax `ident) : String :=
14+
id.getId.toString.toLower
15+
16+
private def regIdx (reg : String) : Fin 10 :=
17+
match reg with
18+
| "rdi" => 0
19+
| "rsi" => 1
20+
| "rdx" => 2
21+
| "rcx" => 3
22+
| "r8" => 4
23+
| "r9" => 5
24+
| "r10" => 6
25+
| "r11" => 7
26+
| "rax" => 8
27+
| "rflags" => 9
28+
| _ => panic! "invalid register"
29+
30+
private def getVal (cpu : CPU) (reg : String) : Int64 :=
31+
match reg.toInt? with
32+
| some n => n.toInt64
33+
| none => cpu.regs[regIdx reg]!
34+
35+
private def unFlag : Int64 := 1
36+
private def eqFlag : Int64 := (1 : Int64) <<< 6
37+
private def gtFlag : Int64 := (1 : Int64) <<< 5
38+
private def ltFlag : Int64 := (1 : Int64) <<< 4
39+
40+
private def compareRegs (dest src : Int64) : Int64 := Id.run do
41+
let mut acc : Int64 := 0
42+
acc := acc ||| ((dest == src).toInt64 <<< 6) -- equal
43+
acc := acc ||| ((decide (dest > src)).toInt64 <<< 5) -- greater
44+
acc := acc ||| ((decide (dest < src)).toInt64 <<< 4) -- lesser
45+
return acc
46+
47+
private def dispatchCondJmp (cpu : CPU) (dest : String) (labels : Std.HashMap String Nat) (flag : Int64) (RIP : Nat) : Nat :=
48+
if flag == unFlag || getVal cpu "rflags" &&& flag != 0 then
49+
if h₀: dest ∈ labels then
50+
labels[dest] + 1
51+
else panic! "Invalid label"
52+
else RIP + 1
53+
54+
declare_syntax_cat x86_dest
55+
syntax ident : x86_dest
56+
57+
declare_syntax_cat x86_src
58+
syntax ident : x86_src
59+
syntax num : x86_src
60+
syntax "-" num : x86_src
61+
62+
declare_syntax_cat x86_inst
63+
syntax ident x86_dest "," x86_src : x86_inst
64+
syntax ident x86_dest : x86_inst
65+
syntax ident ":" : x86_inst
66+
67+
syntax "assemble!(" (x86_inst)* ")" : term
68+
syntax term "(" (term),* ")" : term
69+
70+
private def Program.evaluate (args : Array Int64) (program : Program) : Int64 := Id.run do
71+
let regs := (Array.finRange 6).foldl (init := Vector.replicate 10 0) fun acc i =>
72+
acc.set i (args[i]?.getD 0)
73+
74+
let mut acc : CPU := { regs }
75+
let mut i := 0
76+
77+
while h: i < program.ops.size do
78+
match program.ops[i] with
79+
| [op, dest, src] =>
80+
i := i + 1
81+
let srcVal := getVal acc src
82+
let destVal := getVal acc dest
83+
match op with
84+
| "mov" => acc := { acc with regs := acc.regs.set (regIdx dest) srcVal }
85+
| "add" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal + srcVal) }
86+
| "sub" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal - srcVal) }
87+
| "xor" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal ^^^ srcVal) }
88+
| "and" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal &&& srcVal) }
89+
| "or" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal ||| srcVal) }
90+
| "shl" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal <<< srcVal) }
91+
| "shr" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal >>> srcVal) }
92+
| "mul" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal * srcVal) }
93+
| "div" => acc := { acc with regs := acc.regs.set (regIdx dest) (destVal / srcVal) }
94+
| "cmp" => acc := { acc with regs := acc.regs.set (regIdx "rflags") (compareRegs destVal srcVal) }
95+
| _ => continue
96+
| [op, dest] =>
97+
match op with
98+
| "jmp" => i := dispatchCondJmp acc dest program.labels unFlag i
99+
| "je" => i := dispatchCondJmp acc dest program.labels eqFlag i
100+
| "jl" => i := dispatchCondJmp acc dest program.labels ltFlag i
101+
| "jg" => i := dispatchCondJmp acc dest program.labels gtFlag i
102+
| _ => i := i + 1
103+
| _ => i := i + 1
104+
105+
return acc.regs.get $ regIdx "rax"
106+
107+
macro_rules
108+
| `(assemble!( $[$insts]* )) => do
109+
let mut labelNames : Array (TSyntax `term) := #[]
110+
let mut labelIndices : Array (TSyntax `term) := #[]
111+
let mut ops : Array (TSyntax `term) := #[]
112+
113+
for i in [:insts.size] do
114+
match insts[i]! with
115+
| `(x86_inst| $opId:ident $dest:x86_dest , $src:x86_src) =>
116+
match dest with
117+
| `(x86_dest| $destId:ident) =>
118+
match src with
119+
| `(x86_src| $srcId:ident) =>
120+
let opStr := mkStrLit $ getStr opId
121+
let destStr := mkStrLit $ getStr destId
122+
let srcStr := mkStrLit $ getStr srcId
123+
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
124+
| `(x86_src| $srcN:num) =>
125+
let opStr := mkStrLit $ getStr opId
126+
let destStr := mkStrLit $ getStr destId
127+
let srcStr := mkStrLit $ s!"{srcN.getNat}"
128+
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
129+
| `(x86_src| -$srcN:num) =>
130+
let opStr := mkStrLit $ getStr opId
131+
let destStr := mkStrLit $ getStr destId
132+
let srcStr := mkStrLit $ s!"-{srcN.getNat}"
133+
ops := ops.push (← `([$opStr, $destStr, $srcStr]))
134+
| _ => throwUnsupported
135+
| _ => throwUnsupported
136+
| `(x86_inst| $opId:ident $dest:x86_dest) =>
137+
match dest with
138+
| `(x86_dest| $destId:ident) =>
139+
let opStr := mkStrLit $ getStr opId
140+
let destStr := mkStrLit destId.getId.toString
141+
ops := ops.push (← `([$opStr, $destStr]))
142+
| _ => throwUnsupported
143+
| `(x86_inst| $labelId:ident :) =>
144+
labelNames := labelNames.push (mkStrLit labelId.getId.toString)
145+
labelIndices := labelIndices.push (mkNumLit s!"{i}")
146+
ops := ops.push (← `([]))
147+
| _ => throwUnsupported
148+
149+
`(({
150+
ops := #[$[$ops],*],
151+
labels := HashMap.ofList [ $[ ($labelNames, $labelIndices) ],* ]
152+
} : Program)
153+
)
154+
| `($program( $[$args],* )) => do
155+
let args' ← args.mapM fun a => `(($a : Int64))
156+
let result ← `(Program.evaluate #[$[$args'],*] $program)
157+
`(($result))
Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,18 @@
1+
{
2+
"authors": [
3+
"oxe-i"
4+
],
5+
"files": {
6+
"solution": [
7+
"Assemble.lean"
8+
],
9+
"test": [
10+
"AssembleTest.lean"
11+
],
12+
"example": [
13+
".meta/Example.lean"
14+
]
15+
},
16+
"icon": "binary",
17+
"blurb": "Define the syntax of a minimal x86-64–inspired assembly language"
18+
}

0 commit comments

Comments
 (0)