# Decaf: a tiny bytecode-compiled language

Decaf is a lean, portfolio-friendly project that walks from source code to a working bytecode VM:

- hand-written lexer, Pratt-style parser, and AST with source spans
- semantic resolver with lexical scopes, mutability checks, and call validation
- bytecode compiler targeting a compact stack machine
- virtual machine with globals, call frames, tracing, and disassembly tooling
- CLI for `compile`, `run`, and `disasm`, plus JSON serialization for bytecode artifacts

## Quick start

```bash
# clone, then install in editable mode with dev tooling
python -m pip install --upgrade pip
pip install -e ".[dev]"  # quoted so shells like zsh don't expand the brackets

# run one of the example programs
decaf run examples/sum_loop.decaf

# inspect the generated bytecode
decaf disasm examples/sum_loop.decaf

# enable VM tracing while executing
decaf run examples/sum_loop.decaf --trace
```

> Working locally without installation? Replace `decaf` with `PYTHONPATH=src python -m decaf.cli`, e.g. `PYTHONPATH=src python -m decaf.cli run examples/sum_loop.decaf`.

## Language snapshot

- **Type system**: 32-bit signed integers only; truthiness follows C rules (0 is false, any non-zero value is true).
- **Declarations**: immutable `let` and mutable `var` at global or block scope; reassigning a `let` binding is rejected.
- **Expressions**: integer literals, identifiers, unary `-`, `+ - * /`, parentheses, assignment, and call expressions.
- **Statements**: expression statements, `print`, blocks, `if/else`, `while`, and `return`.
- **Functions**: first-order, with positional parameters and lexical scoping; every path must end in a `return`, and `main()` is the entry point.
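
The scoping, mutability, and `return` rules above can be modelled in a few lines. What follows is a minimal Python sketch, not Decaf's actual resolver. The `Scope` class, `ResolveError`, `always_returns`, and the tuple-shaped statement nodes are all hypothetical. Treating `while` as never guaranteeing a return is also an assumption, since the README does not say how loops are analysed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


class ResolveError(Exception):
    """Raised for undefined names or writes to immutable (`let`) bindings."""


@dataclass
class Scope:
    parent: Scope | None = None
    # name -> True for `var` (mutable), False for `let` (immutable)
    bindings: dict[str, bool] = field(default_factory=dict)

    def declare(self, name: str, mutable: bool) -> None:
        self.bindings[name] = mutable

    def lookup(self, name: str) -> bool:
        # Walk outward through enclosing scopes: lexical scoping.
        scope: Scope | None = self
        while scope is not None:
            if name in scope.bindings:
                return scope.bindings[name]
            scope = scope.parent
        raise ResolveError(f"undefined name '{name}'")

    def check_assign(self, name: str) -> None:
        if not self.lookup(name):
            raise ResolveError(f"cannot assign to immutable '{name}'")


def always_returns(stmt: tuple) -> bool:
    """Conservatively decide whether a (hypothetical) statement node returns on every path."""
    kind = stmt[0]
    if kind == "return":
        return True
    if kind == "block":
        # One always-returning statement is enough; anything after it is unreachable.
        return any(always_returns(s) for s in stmt[1])
    if kind == "if":
        _, _cond, then_branch, else_branch = stmt
        return (else_branch is not None
                and always_returns(then_branch)
                and always_returns(else_branch))
    # Assumption: a `while` body may run zero times; print/expression statements never return.
    return False


globals_scope = Scope()
globals_scope.declare("total", mutable=True)   # var total = 0;
block_scope = Scope(parent=globals_scope)
block_scope.declare("step", mutable=False)     # let step = 1;
block_scope.check_assign("total")              # fine: a `var` found in an enclosing scope
try:
    block_scope.check_assign("step")
except ResolveError as err:
    print(err)  # cannot assign to immutable 'step'
```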

### Grammar sketch

```
program    := (fnDecl | varDecl)* EOF
fnDecl     := "fn" IDENT "(" params? ")" block
varDecl    := ("let" | "var") IDENT "=" expr ";"
stmt       := block | "print" expr ";" | ifStmt | whileStmt | returnStmt | expr ";"
expr       := assignment
assignment := IDENT "=" assignment | term
term       := factor (("+" | "-") factor)*
factor     := unary (("*" | "/") unary)*
unary      := "-" unary | call
```
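
Each precedence level in the grammar maps onto one function. Below is a hedged recursive-descent sketch covering only the expression rules. It is a swapped-in technique, not Decaf's real Pratt-style parser. `Parser`, `parse_expr`, the regex tokenizer, and the tuple AST are hypothetical. The `call` and primary rules (`IDENT "(" args ")"`, integer, identifier, parenthesised expression) are also assumptions, because the sketch above leaves them undefined.

```python
from __future__ import annotations

import re


class Parser:
    """Recursive-descent parser for the expression rules; one method per grammar rule."""

    def __init__(self, src: str) -> None:
        # Tokens: integers, identifiers, or any single non-space character.
        self.tokens = re.findall(r"\d+|[A-Za-z_]\w*|\S", src, re.ASCII) + ["<eof>"]
        self.pos = 0

    def peek(self, offset: int = 0) -> str:
        return self.tokens[min(self.pos + offset, len(self.tokens) - 1)]

    def advance(self) -> str:
        tok = self.peek()
        self.pos += 1
        return tok

    def expect(self, tok: str) -> None:
        if self.peek() != tok:
            raise SyntaxError(f"expected {tok!r}, got {self.peek()!r}")
        self.advance()

    def expr(self):
        return self.assignment()

    def assignment(self):
        # assignment := IDENT "=" assignment | term   (right-associative)
        if self.peek().isidentifier() and self.peek(1) == "=":
            name = self.advance()
            self.advance()  # consume "="
            return ("assign", name, self.assignment())
        return self.term()

    def term(self):
        # term := factor (("+" | "-") factor)*   (left-associative loop)
        node = self.factor()
        while self.peek() in ("+", "-"):
            op = self.advance()
            node = ("bin", op, node, self.factor())
        return node

    def factor(self):
        node = self.unary()
        while self.peek() in ("*", "/"):
            op = self.advance()
            node = ("bin", op, node, self.unary())
        return node

    def unary(self):
        if self.peek() == "-":
            self.advance()
            return ("neg", self.unary())
        return self.call()

    def call(self):
        # Assumed rule: call := IDENT "(" (expr ("," expr)*)? ")" | primary
        if self.peek().isidentifier() and self.peek(1) == "(":
            name = self.advance()
            self.advance()  # consume "("
            args = []
            if self.peek() != ")":
                args.append(self.expr())
                while self.peek() == ",":
                    self.advance()
                    args.append(self.expr())
            self.expect(")")
            return ("call", name, args)
        return self.primary()

    def primary(self):
        # Assumed rule: primary := INT | IDENT | "(" expr ")"
        tok = self.advance()
        if tok.isdigit():
            return ("num", int(tok))
        if tok.isidentifier():
            return ("var", tok)
        if tok == "(":
            node = self.expr()
            self.expect(")")
            return node
        raise SyntaxError(f"unexpected token {tok!r}")


def parse_expr(src: str):
    parser = Parser(src)
    node = parser.expr()
    parser.expect("<eof>")
    return node


print(parse_expr("x = 1 + 2 * -y"))
# ('assign', 'x', ('bin', '+', ('num', 1), ('bin', '*', ('num', 2), ('neg', ('var', 'y')))))
```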

## Bytecode & VM

The compiler lowers functions into isolated chunks that share a constant pool. Values live on a stack; locals are addressed by slot, globals by index.
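
Here is a rough Python sketch of that lowering for expressions only. Each node appends `(opcode, operand)` pairs to a chunk, literals go through a shared constant pool, and names resolve to a local slot or a global index. `ConstantPool`, `Chunk`, `compile_expr`, and the tuple AST are hypothetical; the real compiler emits bytes. Lowering unary `-` as `0 - e` is an assumption, since no negation opcode is listed.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class ConstantPool:
    values: list[int] = field(default_factory=list)

    def add(self, value: int) -> int:
        # No de-duplication: the disassembly sample lists `#0 (0)` and `#1 (0)` separately.
        self.values.append(value)
        return len(self.values) - 1


@dataclass
class Chunk:
    name: str
    code: list[tuple[str, int | None]] = field(default_factory=list)


BINARY_OPS = {"+": "ADD", "-": "SUB", "*": "MUL", "/": "DIV"}


def compile_expr(node, chunk: Chunk, pool: ConstantPool,
                 local_slots: dict[str, int], global_slots: dict[str, int]) -> None:
    """Emit stack code that leaves the value of `node` on top of the stack."""
    kind = node[0]
    if kind == "num":
        chunk.code.append(("PUSH_CONST", pool.add(node[1])))
    elif kind == "var":
        name = node[1]
        # Locals shadow globals: try the slot table first, then the global index table.
        if name in local_slots:
            chunk.code.append(("LOAD_LOCAL", local_slots[name]))
        else:
            chunk.code.append(("LOAD_GLOBAL", global_slots[name]))
    elif kind == "neg":
        # Assumption: with no NEG opcode listed, `-e` is lowered as `0 - e`.
        chunk.code.append(("PUSH_CONST", pool.add(0)))
        compile_expr(node[1], chunk, pool, local_slots, global_slots)
        chunk.code.append(("SUB", None))
    elif kind == "bin":
        _, op, left, right = node
        # Operands left to right, then the operator pops both and pushes the result.
        compile_expr(left, chunk, pool, local_slots, global_slots)
        compile_expr(right, chunk, pool, local_slots, global_slots)
        chunk.code.append((BINARY_OPS[op], None))
    else:
        raise NotImplementedError(f"node kind {kind!r} is outside this sketch")


pool = ConstantPool()
main = Chunk("main")
# x * (2 + y), with x in local slot 0 and y at global index 0
compile_expr(("bin", "*", ("var", "x"), ("bin", "+", ("num", 2), ("var", "y"))),
             main, pool, {"x": 0}, {"y": 0})
print(main.code)
# [('LOAD_LOCAL', 0), ('PUSH_CONST', 0), ('LOAD_GLOBAL', 0), ('ADD', None), ('MUL', None)]
```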

| Opcode | Stack effect | Notes |
| --- | --- | --- |
| `PUSH_CONST c` | `+1` | push the constant at pool index `c` |
| `LOAD_LOCAL i` / `STORE_LOCAL i` | `+1` / `-1` | locals live in a contiguous frame |
| `LOAD_GLOBAL g` / `STORE_GLOBAL g` | `+1` / `-1` | globals are stored in a module-wide array |
| `ADD`, `SUB`, `MUL`, `DIV` | `-1` | pop two values, push the result (`DIV` truncates toward zero) |
| `JMP addr` | `0` | unconditional branch |
| `JMP_IF_FALSE addr` | `-1` | pop the condition; jump if it is zero |
| `CALL f argc` | `1-argc` | call function `f` with `argc` arguments already pushed left to right; they are replaced by the return value |
| `RET` | `-1` | pop the return value and restore the caller |
| `PRINT` | `-1` | pop a value and print it in decimal |
| `POP` | `-1` | discard the top of the stack |
| `HALT` | `0` | stop execution (entry chunk only) |

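The table translates almost line by line into a dispatch loop. This is a simplified Python sketch rather than Decaf's VM, under several assumptions. Instructions are `(opcode, operand)` tuples, and jump targets are instruction indices rather than byte offsets. Locals sit in a separate list instead of a stack frame, and `CALL`/`RET` are omitted. Wrapping results to 32 bits is an assumption too, since the README does not specify overflow behaviour. `run`, `trunc_div`, and `wrap32` are hypothetical names.

```python
import operator


def wrap32(value: int) -> int:
    # Assumed two's-complement wraparound into the signed 32-bit range.
    return (value + 2**31) % 2**32 - 2**31


def trunc_div(a: int, b: int) -> int:
    # Python's // floors; DIV truncates toward zero, so divide magnitudes and fix the sign.
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


BINARY = {"ADD": operator.add, "SUB": operator.sub, "MUL": operator.mul, "DIV": trunc_div}


def run(code, constants, n_locals=0, n_globals=0, out=print):
    stack = []
    locals_ = [0] * n_locals
    globals_ = [0] * n_globals
    ip = 0
    while True:
        op, arg = code[ip]
        ip += 1
        if op == "PUSH_CONST":
            stack.append(constants[arg])
        elif op == "LOAD_LOCAL":
            stack.append(locals_[arg])
        elif op == "STORE_LOCAL":
            locals_[arg] = stack.pop()
        elif op == "LOAD_GLOBAL":
            stack.append(globals_[arg])
        elif op == "STORE_GLOBAL":
            globals_[arg] = stack.pop()
        elif op in BINARY:
            right = stack.pop()
            left = stack.pop()
            stack.append(wrap32(BINARY[op](left, right)))
        elif op == "JMP":
            ip = arg
        elif op == "JMP_IF_FALSE":
            if stack.pop() == 0:  # C-style truthiness: only 0 is false
                ip = arg
        elif op == "PRINT":
            out(stack.pop())
        elif op == "POP":
            stack.pop()
        elif op == "HALT":
            return stack
        else:
            raise RuntimeError(f"unsupported opcode {op!r}")


# Hand-assembled countdown (pseudocode): sum = 0; i = 5; loop while i != 0 { i = i - 1; sum = sum + i }; print sum
constants = [0, 5, 1]
program = [
    ("PUSH_CONST", 0), ("STORE_LOCAL", 0),                                    # 0: sum = 0 (slot 0)
    ("PUSH_CONST", 1), ("STORE_LOCAL", 1),                                    # 2: i = 5 (slot 1)
    ("LOAD_LOCAL", 1), ("JMP_IF_FALSE", 15),                                  # 4: exit when i == 0
    ("LOAD_LOCAL", 1), ("PUSH_CONST", 2), ("SUB", None), ("STORE_LOCAL", 1),  # 6: i = i - 1
    ("LOAD_LOCAL", 0), ("LOAD_LOCAL", 1), ("ADD", None), ("STORE_LOCAL", 0),  # 10: sum = sum + i
    ("JMP", 4),                                                               # 14: back to the test
    ("LOAD_LOCAL", 0), ("PRINT", None),                                       # 15: print sum
    ("HALT", None),
]
run(program, constants, n_locals=2)  # prints 10
```

The loop counts down and tests `i` directly because the instruction set has no comparison opcodes yet, so truthiness alone drives `JMP_IF_FALSE`.
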
### Disassembly snapshot

```
$ decaf disasm examples/sum_loop.decaf
== fn 0 main ==
0000 line 2 PUSH_CONST #0 (0)
0003 line 2 STORE_LOCAL 0
0006 line 3 PUSH_CONST #1 (0)
0009 line 3 STORE_LOCAL 1
...
0053 line 8 LOAD_LOCAL 1
0056 line 8 PRINT
0057 line 9 LOAD_LOCAL 1
0060 line 9 RET
== fn 1 <entry> ==
0000 line 1 CALL 0 main argc=0
0004 line 1 POP
0005 line 1 HALT
```
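
The addresses in this listing imply a byte encoding. `PUSH_CONST`, `STORE_LOCAL`, and `LOAD_LOCAL` each advance the offset by 3 bytes, `CALL` by 4, and the operand-free opcodes by 1. A disassembler for such a layout might look like the Python sketch below. The opcode numbers, the big-endian 16-bit operands, the one-byte `argc`, and the `disassemble` signature are all assumptions inferred from the sample, not Decaf's documented format.

```python
from __future__ import annotations

import struct

# Hypothetical opcode numbers -> (mnemonic, operand kind); only a subset is shown.
OPCODES = {
    0x01: ("PUSH_CONST", "const"),  # 1 opcode byte + u16 constant index
    0x02: ("LOAD_LOCAL", "u16"),
    0x03: ("STORE_LOCAL", "u16"),
    0x04: ("CALL", "call"),         # 1 opcode byte + u16 function index + u8 argc
    0x05: ("PRINT", None),
    0x06: ("POP", None),
    0x07: ("RET", None),
    0x08: ("HALT", None),
}


def disassemble(title: str, code: bytes, lines: dict[int, int],
                constants: list[int], fn_names: list[str]) -> str:
    """Render one chunk; `lines` maps each instruction offset to its source line."""
    out = [f"== {title} =="]
    offset = 0
    while offset < len(code):
        mnemonic, kind = OPCODES[code[offset]]
        prefix = f"{offset:04d} line {lines[offset]} {mnemonic}"
        if kind is None:
            out.append(prefix)
            offset += 1
        elif kind == "call":
            (fn,) = struct.unpack_from(">H", code, offset + 1)
            argc = code[offset + 3]
            out.append(f"{prefix} {fn} {fn_names[fn]} argc={argc}")
            offset += 4
        else:
            (operand,) = struct.unpack_from(">H", code, offset + 1)
            if kind == "const":
                # Show both the pool index and the constant's value, as in the sample.
                out.append(f"{prefix} #{operand} ({constants[operand]})")
            else:
                out.append(f"{prefix} {operand}")
            offset += 3
    return "\n".join(out)


# Re-encode the <entry> chunk from the listing: CALL 0 (argc 0), POP, HALT.
entry = bytes([0x04, 0x00, 0x00, 0x00, 0x06, 0x08])
print(disassemble("fn 1 <entry>", entry, {0: 1, 4: 1, 5: 1}, [], ["main"]))
# == fn 1 <entry> ==
# 0000 line 1 CALL 0 main argc=0
# 0004 line 1 POP
# 0005 line 1 HALT
```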

### Trace mode

```
$ decaf run examples/sum_loop.decaf --trace
[trace] ip=0 fn=<entry> op=CALL stack=[<empty>]
[trace] ip=0 fn=main op=PUSH_CONST stack=[0,0]
[trace] ip=3 fn=main op=STORE_LOCAL stack=[0,0,0]
...
[trace] ip=53 fn=main op=LOAD_LOCAL stack=[...,-1,0,5,10]
[trace] ip=56 fn=main op=PRINT stack=[...,-1,0,5,10,10]
10
```

Each trace line shows the instruction pointer, function, and opcode, plus the tail of the operand stack as it stands before that instruction runs.
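
Producing lines in this shape takes only a short formatter. The sketch below is hypothetical: `format_trace` and its `tail` parameter are not part of Decaf's documented API. The sample shows both four and five trailing values, so the fixed tail width chosen here is arbitrary.

```python
from __future__ import annotations


def format_trace(ip: int, fn: str, op: str, stack: list[int], tail: int = 4) -> str:
    """Render one trace line; `stack` is the operand stack before `op` executes."""
    if not stack:
        shown = "<empty>"
    elif len(stack) <= tail:
        shown = ",".join(str(v) for v in stack)
    else:
        # Elide everything except the last `tail` values.
        shown = "...," + ",".join(str(v) for v in stack[-tail:])
    return f"[trace] ip={ip} fn={fn} op={op} stack=[{shown}]"


print(format_trace(0, "<entry>", "CALL", []))
# [trace] ip=0 fn=<entry> op=CALL stack=[<empty>]
print(format_trace(3, "main", "STORE_LOCAL", [0, 0, 0]))
# [trace] ip=3 fn=main op=STORE_LOCAL stack=[0,0,0]
print(format_trace(9, "main", "ADD", [1, 2, 3, 4, 5, 6]))
# [trace] ip=9 fn=main op=ADD stack=[...,3,4,5,6]
```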

## Examples

| Program | Description | Output |
| --- | --- | --- |
| `examples/arithmetic.decaf` | expression precedence and return | `14` |
| `examples/variables.decaf` | global mutation vs. local `let` | `13` |
| `examples/if_else.decaf` | conditional execution | `5` |
| `examples/sum_loop.decaf` | accumulation in a `while` loop | `10` |
| `examples/factorial.decaf` | recursive function + multiplication | `120` |

## Development

- **Tests**: `pytest` exercises the lexer, parser, resolver, compiler, VM, and serializer. A GitHub Actions workflow runs them on every push and pull request targeting `main`.
- **Formatting**: no formatter or linter is enforced; the code aims to read clearly on its own, with short, intentional comments (e.g. `#describes...`) only where needed.
- **Bytecode snapshots**: disassembler output doubles as golden coverage, so update the README samples if you change the instruction set.

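The Examples table can double as an end-to-end check. The pytest-collectable sketch below shells out to the CLI through `python -m decaf.cli run`. It assumes the package is installed (or `src` is on `PYTHONPATH`) and that `run` writes only the program's printed output to stdout. `EXPECTED`, `run_example`, and `test_example_outputs` are hypothetical names, not existing tests in the repository.

```python
import subprocess
import sys

# Expected stdout for each program, copied from the Examples table.
EXPECTED = {
    "examples/arithmetic.decaf": "14",
    "examples/variables.decaf": "13",
    "examples/if_else.decaf": "5",
    "examples/sum_loop.decaf": "10",
    "examples/factorial.decaf": "120",
}


def run_example(path: str) -> str:
    # Invoke the CLI in a subprocess so the check covers compile + run end to end.
    result = subprocess.run(
        [sys.executable, "-m", "decaf.cli", "run", path],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


def test_example_outputs() -> None:
    for path, expected in EXPECTED.items():
        assert run_example(path) == expected, path
```
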
### Suggested workflow

```bash
# run the unit tests
pytest

# compile to a JSON bytecode artifact for shipping
decaf compile examples/sum_loop.decaf -o out/sum_loop.bc

# execute the saved bytecode
decaf run --bytecode out/sum_loop.bc
```
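
The README does not document the artifact schema, so the following is only a plausible Python sketch of a JSON round trip for compiled modules, using `dataclasses` and `json` from the standard library. The `Module`/`Function` field names, the per-byte `lines` list, and the `version` key are hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass

FORMAT_VERSION = 1  # hypothetical: lets a loader reject artifacts from an incompatible build


@dataclass
class Function:
    name: str
    arity: int
    code: list[int]   # raw bytecode bytes as ints, so the JSON stays readable
    lines: list[int]  # source line for each code byte


@dataclass
class Module:
    constants: list[int]  # the shared constant pool
    functions: list[Function]
    entry: int            # index of the `<entry>` function


def dumps(module: Module) -> str:
    # asdict() recurses into the nested Function dataclasses.
    return json.dumps({"version": FORMAT_VERSION, **asdict(module)}, indent=2)


def loads(text: str) -> Module:
    data = json.loads(text)
    if data.get("version") != FORMAT_VERSION:
        raise ValueError(f"unsupported bytecode version: {data.get('version')!r}")
    return Module(
        constants=data["constants"],
        functions=[Function(**fn) for fn in data["functions"]],
        entry=data["entry"],
    )


module = Module(
    constants=[0, 0],
    functions=[
        Function("main", 0, [1, 0, 0, 7], [2, 2, 2, 9]),  # illustrative bytes only
        Function("<entry>", 0, [4, 0, 0, 0, 6, 8], [1] * 6),
    ],
    entry=1,
)
assert loads(dumps(module)) == module  # lossless round trip
```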

## Future ideas

- boolean literals and comparison opcodes that return 0/1
- simple CFG-aware optimizer (constant folding, dead-branch pruning)
- bytecode verifier and per-function max-stack precomputation
- structured error diagnostics (source snippets, suggestions)
- richer value types (strings) once the VM has a heap strategy
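
As a starting point for the verifier and max-stack ideas, the stack-effect column of the opcode table is enough to compute each function's peak stack depth. A worklist pass over the control-flow graph does this and can also reject underflow and join points that disagree on depth. This Python sketch assumes tuple instructions, jump targets given as instruction indices, and a `CALL` operand of `(f, argc)`. `max_stack` and `EFFECTS` are hypothetical names.

```python
from __future__ import annotations

# (pops, pushes) per opcode, matching the stack-effect column of the opcode table.
EFFECTS = {
    "PUSH_CONST": (0, 1), "LOAD_LOCAL": (0, 1), "LOAD_GLOBAL": (0, 1),
    "STORE_LOCAL": (1, 0), "STORE_GLOBAL": (1, 0),
    "ADD": (2, 1), "SUB": (2, 1), "MUL": (2, 1), "DIV": (2, 1),
    "JMP": (0, 0), "JMP_IF_FALSE": (1, 0),
    "RET": (1, 0), "PRINT": (1, 0), "POP": (1, 0), "HALT": (0, 0),
}


def max_stack(code: list[tuple[str, object]]) -> int:
    """Return the peak operand-stack depth of one function, verifying it along the way."""
    depth_at = {0: 0}  # instruction index -> stack depth on entry
    worklist = [0]
    peak = 0
    while worklist:
        ip = worklist.pop()
        op, arg = code[ip]
        # CALL f argc pops argc arguments and pushes one result (net 1 - argc).
        pops, pushes = (arg[1], 1) if op == "CALL" else EFFECTS[op]
        depth = depth_at[ip]
        if depth < pops:
            raise ValueError(f"stack underflow at {ip} ({op})")
        depth += pushes - pops
        peak = max(peak, depth)
        if op in ("RET", "HALT"):
            successors = []
        elif op == "JMP":
            successors = [arg]
        elif op == "JMP_IF_FALSE":
            successors = [ip + 1, arg]
        else:
            successors = [ip + 1]
        for nxt in successors:
            if not 0 <= nxt < len(code):
                raise ValueError(f"control leaves the code after {ip}")
            if nxt not in depth_at:
                depth_at[nxt] = depth
                worklist.append(nxt)
            elif depth_at[nxt] != depth:
                raise ValueError(f"inconsistent stack depth at {nxt}")
    return peak


# Countdown loop (pseudocode): sum = 0; i = 5; loop while i != 0 { i = i - 1; sum = sum + i }; print sum
program = [
    ("PUSH_CONST", 0), ("STORE_LOCAL", 0), ("PUSH_CONST", 1), ("STORE_LOCAL", 1),
    ("LOAD_LOCAL", 1), ("JMP_IF_FALSE", 15),
    ("LOAD_LOCAL", 1), ("PUSH_CONST", 2), ("SUB", None), ("STORE_LOCAL", 1),
    ("LOAD_LOCAL", 0), ("LOAD_LOCAL", 1), ("ADD", None), ("STORE_LOCAL", 0),
    ("JMP", 4),
    ("LOAD_LOCAL", 0), ("PRINT", None), ("HALT", None),
]
print(max_stack(program))  # 2
```

Requiring every path into an instruction to agree on depth is what lets a single number per function stand in for a full per-instruction analysis at run time.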