The Neural Network That Became a CPU - Development History
A FLYNNCOMM, LLC Production
"You keep using that transformer. I do not think it computes what you think it computes."
Yeah, it computes EXACTLY what we think it computes.
| Metric | Value |
|---|---|
| Start Date | 2024-12-12 |
| Completion Date | 2024-12-12 |
| Neural Organs | 6 |
| Total Combinations Verified | 460,928 |
| Errors | 0 |
| Final Accuracy | 100.000% |
Status: ✅ COMPLETE (100% Accuracy)
The ALU performs addition and subtraction, the foundation of all computation. The challenge: teach a neural network to compute 8-bit arithmetic with carry, producing correct results AND correct status flags (N, V, Z, C) for all 131,072 possible inputs.
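For reference, the target function the ALU organ has to reproduce can be sketched in plain Python. This is a minimal sketch assuming binary-mode (non-decimal) 6502 semantics; the names `adc_reference` and `sbc_reference` are hypothetical and are not taken from the project's `training/data.py`.

```python
def adc_reference(a: int, operand: int, carry_in: int) -> dict:
    """Binary-mode 8-bit add with carry: result plus N, V, Z, C flags."""
    total = a + operand + carry_in
    result = total & 0xFF
    # Signed overflow: both inputs differ in sign from the result.
    overflow = ((a ^ result) & (operand ^ result) & 0x80) >> 7
    return {
        "result": result,
        "N": (result >> 7) & 1,   # sign bit of the result
        "V": overflow,
        "Z": int(result == 0),
        "C": int(total > 0xFF),   # unsigned carry out
    }


def sbc_reference(a: int, operand: int, carry_in: int) -> dict:
    """In binary mode, SBC behaves like ADC with the operand's bits inverted."""
    return adc_reference(a, operand ^ 0xFF, carry_in)


# Enumerate all 256 * 256 * 2 input combinations.
all_cases = [(a, m, c) for a in range(256) for m in range(256) for c in (0, 1)]
print(len(all_cases))            # 131072
print(adc_reference(37, 26, 0))  # result 63, all flags clear
```

A function like this can serve both as the label generator for training and as the oracle for exhaustive testing.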
Input Layer:
├── A register (Soroban encoded): 32 features
├── Operand (Soroban encoded): 32 features
└── Carry In: 1 feature
Total: 65 input features
Hidden Layers:
├── Linear(65 → 512) + ReLU
└── Linear(512 → 512) + ReLU
Output Heads:
├── Result Head: Linear(512 → 128 → 32) + Sigmoid
├── N Flag Head: Linear(512 → 32 → 1) + Sigmoid
├── Z Flag Head: Linear(512 → 32 → 1) + Sigmoid [3x loss weight]
├── C Flag Head: Linear(512 → 32 → 1) + Sigmoid
└── V Flag Head: Linear(512 → 32 → 1) + Sigmoid
Traditional binary struggles with carry propagation. Soroban (thermometer) encoding makes carries spatially visible:
Decimal 37 in Soroban (4 rods × 8 beads):
Rod 0 (1s):    ●●●●●●●○ = 7
Rod 1 (10s):   ●●●○○○○○ = 3
Rod 2 (100s):  ○○○○○○○○ = 0
Rod 3 (1000s): ○○○○○○○○ = 0
When adding, carry "ripples" through adjacent rod representations,
making it learnable as a spatial pattern.
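The rod layout above can be sketched as a small encoder and decoder. This is an illustrative sketch, not the contents of `soroban.py`: the function names, the least-significant-rod-first ordering, and the `base` parameter are assumptions. With decimal rods and 8 beads a digit of 9 does not fit, and the source does not say how that case is handled, so the sketch raises an error rather than guessing.

```python
def thermometer_encode(value: int, rods: int = 4, beads: int = 8,
                       base: int = 10) -> list[int]:
    """Encode value as thermometer-coded digits, least significant rod first."""
    features = []
    for _ in range(rods):
        value, digit = divmod(value, base)
        if digit > beads:
            raise ValueError(f"digit {digit} does not fit on a rod with {beads} beads")
        # A digit d lights the first d beads: 3 -> 1 1 1 0 0 0 0 0
        features.extend([1] * digit + [0] * (beads - digit))
    if value:
        raise ValueError("value needs more rods than provided")
    return features


def thermometer_decode(features: list[float], beads: int = 8, base: int = 10) -> int:
    """Invert the encoding by counting lit beads (threshold 0.5) on each rod."""
    value = 0
    for rod_index in range(len(features) // beads):
        rod = features[rod_index * beads:(rod_index + 1) * beads]
        digit = sum(1 for bead in rod if bead > 0.5)
        value += digit * base ** rod_index
    return value


encoded = thermometer_encode(37)
print(encoded[:8])                  # rod 0 (1s):  [1, 1, 1, 1, 1, 1, 1, 0]
print(encoded[8:16])                # rod 1 (10s): [1, 1, 1, 0, 0, 0, 0, 0]
print(thermometer_decode(encoded))  # 37
```

Decoding by thresholding at 0.5 matches the sigmoid outputs of the result head, where each of the 32 features is a probability rather than a hard bit.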
- Z-Flag Trap: Networks predict ~0.0001 instead of exactly 0
  - Solution: Dedicated Z head with 3x loss weight
  - Post-processing: If Z > 0.5, force result = 0
- Carry Balance: Initial training had C_in=0 bias
  - Solution: Enforce 50% C_in=1 in training data
- Zero Oversampling: Result=0 cases are rare (512/131072 = 0.4%)
  - Solution: 10x oversample zero-result cases
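Two of these fixes, zero oversampling and the Z post-processing step, can be sketched as follows. The helper names `build_alu_training_set` and `apply_z_override` are hypothetical. The sketch applies only the 10x zero-result rule, so it does not reproduce the roughly 300,000-sample total quoted for the actual run, whose full recipe is not given.

```python
import random


def build_alu_training_set(zero_copies: int = 10,
                           seed: int = 0) -> list[tuple[int, int, int]]:
    """Enumerate every (A, operand, C_in) case, repeating zero-result cases."""
    cases = []
    for a in range(256):
        for operand in range(256):
            for carry_in in (0, 1):  # exhaustive enumeration is already 50% C_in=1
                result = (a + operand + carry_in) & 0xFF
                copies = zero_copies if result == 0 else 1
                cases.extend([(a, operand, carry_in)] * copies)
    random.Random(seed).shuffle(cases)
    return cases


def apply_z_override(result_features: list[float], z_prob: float) -> list[int]:
    """Threshold the result features; if the Z head fires, force an all-zero result."""
    if z_prob > 0.5:
        return [0] * len(result_features)
    return [int(feature > 0.5) for feature in result_features]


training_cases = build_alu_training_set()
print(len(training_cases))  # 131072 + 9 * 512 = 135680
print(apply_z_override([0.0001, 0.2, 0.6, 0.1], z_prob=0.97))  # [0, 0, 0, 0]
```

Because a thermometer-encoded zero is the all-zero feature vector, forcing every feature to 0 is the same as forcing the decoded result to 0.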
Dataset: 131,072 exhaustive combinations (256 × 256 × 2)
Oversampled: ~300,000 with zero case emphasis
Epochs: 50
Batch Size: 2048
Optimizer: Adam (lr=0.001 → 0.0005)
Loss: BCE with 3x weight on Z flag
Tested: ALL 131,072 combinations
Errors: 0
Accuracy: 100.0000%
Status: ✅ COMPLETE (100% Accuracy)
- ASL (Arithmetic Shift Left): Shift left, bit 7 → Carry
- LSR (Logical Shift Right): Shift right, bit 0 → Carry
- ROL (Rotate Left): Shift left, Carry → bit 0, bit 7 → Carry
- ROR (Rotate Right): Shift right, Carry → bit 7, bit 0 → Carry
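The four operations above can be written as a plain-Python reference, which also shows where the count of unique input combinations comes from. The name `shift_reference` is hypothetical; the flag rules follow standard 6502 behavior.

```python
def shift_reference(op: str, value: int, carry_in: int) -> dict:
    """Reference results for the four 6502 shift/rotate operations on one byte."""
    if op == "ASL":
        result, carry_out = (value << 1) & 0xFF, value >> 7
    elif op == "LSR":
        result, carry_out = value >> 1, value & 1
    elif op == "ROL":
        result, carry_out = ((value << 1) | carry_in) & 0xFF, value >> 7
    elif op == "ROR":
        result, carry_out = (value >> 1) | (carry_in << 7), value & 1
    else:
        raise ValueError(f"unknown shift operation: {op}")
    return {"result": result, "N": result >> 7, "Z": int(result == 0), "C": carry_out}


# ASL and LSR ignore C_in, so only ROL and ROR need both carry values.
unique_cases = {
    (op, value, carry_in if op in ("ROL", "ROR") else 0)
    for op in ("ASL", "LSR", "ROL", "ROR")
    for value in range(256)
    for carry_in in (0, 1)
}
print(len(unique_cases))                # 1536
print(shift_reference("ROL", 0x80, 1))  # result 0x01, C = 1
```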
Input: 13 features
├── Value (8 bits)
├── Carry In (1 bit)
└── Operation (4-bit one-hot)
Hidden: 256 → 256 → 128 (ReLU)
Output: 11 features
├── Result (8 bits)
├── N flag
├── Z flag
└── C flag
Dataset: 1,536 unique combinations (256 × 2 × 4 = 2,048, minus 512 duplicates because ASL and LSR ignore C_in)
Oversampled: 76,800 (50x)
Epochs: 20
Accuracy: 100%
Status: ✅ COMPLETE (100% Accuracy)
- AND: Bitwise AND
- ORA: Bitwise OR
- EOR: Bitwise XOR (Exclusive OR)
- BIT: Bit test (affects flags only)
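A plain-Python reference for these four operations might look like the sketch below. The name `logic_reference` is hypothetical. Two choices here are assumptions rather than details from the source: for BIT the returned `result` is simply the unchanged accumulator, and `V` is reported as `None` for the operations that leave it unaffected.

```python
def logic_reference(op: str, a: int, operand: int) -> dict:
    """Reference results for AND, ORA, EOR and BIT on 8-bit values."""
    if op == "BIT":
        # BIT leaves A untouched: Z comes from A & operand, while N and V
        # are copied from bits 7 and 6 of the operand.
        return {
            "result": a,
            "N": operand >> 7,
            "Z": int((a & operand) == 0),
            "V": (operand >> 6) & 1,
        }
    if op == "AND":
        result = a & operand
    elif op == "ORA":
        result = a | operand
    elif op == "EOR":
        result = a ^ operand
    else:
        raise ValueError(f"unknown logic operation: {op}")
    # V is not affected by AND, ORA or EOR, so it is left as None here.
    return {"result": result, "N": result >> 7, "Z": int(result == 0), "V": None}


print(logic_reference("AND", 0xFF, 0x0F))  # result 0x0F
print(logic_reference("BIT", 0x0F, 0xC0))  # Z = 1, N = 1, V = 1
```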
Input: 20 features
├── A register (8 bits)
├── Operand (8 bits)
└── Operation (4-bit one-hot)
Hidden: 256 → 256 → 128 (ReLU)
Output: 11 features
├── Result (8 bits)
├── N flag
├── Z flag
└── V flag (BIT only)
Dataset: 262,144 exhaustive combinations (256 × 256 × 4)
Epochs: 50
Accuracy: 100%
Status: ✅ COMPLETE (100% Accuracy)
- INC: Increment memory
- DEC: Decrement memory
- INX/INY: Increment X/Y register
- DEX/DEY: Decrement X/Y register
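All six instructions reduce to the same two operations on one byte, which is why a single "is decrement" bit is enough to select between them. The sketch below shows the reference function and one possible 9-feature input encoding; the function names and the least-significant-bit-first ordering are assumptions.

```python
def incdec_reference(value: int, is_decrement: bool) -> dict:
    """Increment or decrement one byte with wrap-around, returning N and Z flags."""
    result = (value - 1 if is_decrement else value + 1) & 0xFF
    return {"result": result, "N": result >> 7, "Z": int(result == 0)}


def encode_incdec_input(value: int, is_decrement: bool) -> list[int]:
    """Build the 9 input features: 8 value bits (LSB first) plus the mode bit."""
    bits = [(value >> i) & 1 for i in range(8)]
    return bits + [int(is_decrement)]


# Boundary cases where the result wraps around or changes sign.
print(incdec_reference(0xFF, False))  # wraps to 0x00, Z = 1
print(incdec_reference(0x00, True))   # wraps to 0xFF, N = 1
print(incdec_reference(0x7F, False))  # 0x80, N = 1
print(encode_incdec_input(5, True))   # [1, 0, 1, 0, 0, 0, 0, 0, 1]
```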
Input: 9 features
├── Value (8 bits)
└── Is Decrement (1 bit)
Hidden: 256 → 256 → 128 (ReLU)
Output: 10 features
├── Result (8 bits)
├── N flag
└── Z flag
Dataset: 512 unique combinations (256 × 2)
Oversampled: 75,200 (100x + boundary emphasis)
Epochs: 20
Accuracy: 100%
Status: ✅ COMPLETE (100% Accuracy)
- CMP: Compare A with memory
- CPX: Compare X with memory
- CPY: Compare Y with memory
Compare performs a subtraction without storing the result; it only affects the flags:
- N = bit 7 of (register - operand)
- Z = 1 if register == operand
- C = 1 if register >= operand
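The three flag rules above translate directly into a reference function. This is a minimal sketch; the name `compare_reference` is hypothetical.

```python
def compare_reference(register: int, operand: int) -> dict:
    """Flags set by CMP/CPX/CPY: the difference itself is discarded."""
    difference = (register - operand) & 0xFF
    return {
        "N": difference >> 7,             # bit 7 of (register - operand)
        "Z": int(register == operand),
        "C": int(register >= operand),    # unsigned comparison
    }


print(compare_reference(0x10, 0x20))  # N = 1, Z = 0, C = 0
print(compare_reference(0x42, 0x42))  # N = 0, Z = 1, C = 1
print(compare_reference(0x80, 0x01))  # N = 0, Z = 0, C = 1
```

The last case shows why N cannot be used as a signed "less than" test on its own: 0x80 minus 0x01 is 0x7F, so N is clear even though 0x80 is negative when read as a signed byte.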
Input: 16 features
├── Register value (8 bits)
└── Operand (8 bits)
Hidden: 512 → 256 → 128 (ReLU)
Output: 3 features
├── N flag
├── Z flag
└── C flag
Dataset: 65,536 exhaustive combinations (256 × 256)
Epochs: 40
Accuracy: 100%
Status: ✅ COMPLETE (100% Accuracy)
- BPL: Branch if Plus (N=0)
- BMI: Branch if Minus (N=1)
- BVC: Branch if Overflow Clear (V=0)
- BVS: Branch if Overflow Set (V=1)
- BCC: Branch if Carry Clear (C=0)
- BCS: Branch if Carry Set (C=1)
- BNE: Branch if Not Equal (Z=0)
- BEQ: Branch if Equal (Z=1)
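The eight conditions above form a small lookup table, and the full truth table the organ must learn can be enumerated from it. In this sketch the names `BRANCH_CONDITIONS`, `branch_taken`, and `encode_branch_input` are hypothetical, and the feature order (N, V, Z, C followed by a one-hot in the order listed) is an assumption.

```python
from itertools import product

# Each branch tests one flag against one required value.
BRANCH_CONDITIONS = {
    "BPL": ("N", 0), "BMI": ("N", 1),
    "BVC": ("V", 0), "BVS": ("V", 1),
    "BCC": ("C", 0), "BCS": ("C", 1),
    "BNE": ("Z", 0), "BEQ": ("Z", 1),
}


def branch_taken(branch: str, flags: dict) -> bool:
    """Return True if the given branch instruction would be taken."""
    flag_name, required = BRANCH_CONDITIONS[branch]
    return flags[flag_name] == required


def encode_branch_input(branch: str, flags: dict) -> list[int]:
    """Build 12 features: four flag bits plus an 8-way one-hot branch type."""
    one_hot = [int(name == branch) for name in BRANCH_CONDITIONS]
    return [flags["N"], flags["V"], flags["Z"], flags["C"]] + one_hot


# Enumerate all 16 flag states for each of the 8 branch types.
truth_table = []
for n, v, z, c in product((0, 1), repeat=4):
    flags = {"N": n, "V": v, "Z": z, "C": c}
    for branch in BRANCH_CONDITIONS:
        truth_table.append((encode_branch_input(branch, flags),
                            int(branch_taken(branch, flags))))

print(len(truth_table))                          # 128
print(sum(label for _, label in truth_table))    # 64 rows where the branch is taken
```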
Input: 12 features
├── N flag (1 bit)
├── V flag (1 bit)
├── Z flag (1 bit)
├── C flag (1 bit)
└── Branch type (8-bit one-hot)
Hidden: 64 → 32 (ReLU)
Output: 1 feature (take branch: yes/no)
Dataset: 128 exhaustive combinations (16 flag states × 8 branch types)
Oversampled: 12,800 (100x)
Epochs: 10
Accuracy: 100%
Status: ✅ COMPLETE
All neural organs integrated into the CPU class:
class CPU:
    def __init__(self, weights_dir):
        self.alu = ALUOrgan(hidden_dim=512)
        self.shift = ShiftOrgan()
        self.logic = LogicOrgan()
        self.incdec = IncDecOrgan()
        self.compare = CompareOrgan()
        self.branch = BranchOrgan()
        # Load pretrained weights...

| Test | Expected | Actual | Status |
|---|---|---|---|
| 37 + 26 | 63 | 63 | ✅ |
| 0xFF AND 0x0F | 0x0F | 0x0F | ✅ |
| 0xF0 ORA 0x0F | 0xFF | 0xFF | ✅ |
| 0xFF EOR 0x0F | 0xF0 | 0xF0 | ✅ |
| 0x40 ASL | 0x80 | 0x80 | ✅ |
| 0x80 LSR | 0x40 | 0x40 | ✅ |
| 0x80 ROL (C=1) | 0x01 | 0x01 | ✅ |
| 0x01 ROR (C=1) | 0x80 | 0x80 | ✅ |
| 5 INX INX | 7 | 7 | ✅ |
| 16 DEY DEY DEY | 13 | 13 | ✅ |
| Fibonacci(10) | 144 | 144 | ✅ |
| 7 × 13 | 91 | 91 | ✅ |
| Organ | Parameters | Size | Combinations | Accuracy |
|---|---|---|---|---|
| ALU | ~800K | 1.7MB | 131,072 | 100% |
| SHIFT | ~200K | 418KB | 1,536 | 100% |
| LOGIC | ~200K | 425KB | 262,144 | 100% |
| INCDEC | ~200K | 413KB | 512 | 100% |
| COMPARE | ~350K | 696KB | 65,536 | 100% |
| BRANCH | ~5K | 15KB | 128 | 100% |
| Total | ~1.75M | 3.7MB | 460,928 | 100% |
f12b781 Add spectacular Neural 6502 demo
fb7c68b Wire up ALL neural organs - Full Neural 6502 operational!
29db39a Neural 6502: ALL 6 ORGANS AT 100% ACCURACY
63fe841 Add comprehensive training infrastructure and documentation
b4b7b1d Neural 6502: TRUE 100% ACCURACY
5242092 Neural 6502: First working version
dcebbf9 Add Neural 6502 spec sheet for VGem and Vi
35d34b8 Add Pretrained Neural 6502 model with support for two-be weights
4b2e91b Add Neural 6502 demo with training example
bb5eab1 Implement Neural 6502 CPU Emulator with specialized organs
neural6502/
├── __init__.py # Package initialization
├── cpu.py # Main CPU class (700+ lines)
├── memory.py # 64KB RAM + memory-mapped I/O
├── soroban.py # Thermometer encoding utilities
├── demo.py # Interactive demonstration
├── README.md # User documentation
├── BUILD_LOG.md # This file
├── organs/
│   ├── __init__.py # Organ exports
│   ├── alu.py # Neural ALU (ADC, SBC)
│   ├── shift.py # Neural SHIFT (ASL, LSR, ROL, ROR)
│   ├── logic.py # Neural LOGIC (AND, ORA, EOR, BIT)
│   ├── incdec.py # Neural INCDEC (INC, DEC)
│   ├── compare.py # Neural COMPARE (CMP, CPX, CPY)
│   └── branch.py # Neural BRANCH (all 8 conditionals)
├── training/
│   ├── __init__.py
│   ├── data.py # Ground truth data generators
│   └── train_all.py # Master training script
└── weights/
    ├── alu.pt # Pretrained ALU (1.7MB)
    ├── shift.pt # Pretrained SHIFT (418KB)
    ├── logic.pt # Pretrained LOGIC (425KB)
    ├── incdec.pt # Pretrained INCDEC (413KB)
    ├── compare.pt # Pretrained COMPARE (696KB)
    └── branch.pt # Pretrained BRANCH (15KB)
- Exhaustive training: Training on every possible input, then testing on every possible input, leaves no unverified case
- Specialized organs: Different encodings for different operation types
- Soroban encoding: Makes carry visible for arithmetic operations
- Dedicated flag heads: Separate prediction paths for each status flag
- Heavy oversampling: Critical for rare cases (zeros, boundaries)
- Single binary encoding for ALU: Failed to learn carry propagation
- Shared flag prediction: Z-flag accuracy suffered
- Uniformly sampled training data: Zero results were underrepresented
- Small models: Too little capacity; bigger hidden dimensions were needed for complex patterns
- Neural networks CAN do exact computation with proper architecture and training
- Encoding matters: The right representation makes learning possible
- Exhaustive verification is essential: Random sampling misses edge cases
- Specialized > General: Task-specific architectures outperform general ones
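The point about exhaustive verification can be illustrated with a small harness. This is a sketch only: `verify_exhaustive` is a hypothetical name, and `almost_right_add` is a stand-in for a trained model that is wrong on exactly one input, not anything from the project.

```python
from itertools import product
from typing import Callable, Iterable


def verify_exhaustive(predict: Callable, reference: Callable,
                      cases: Iterable[tuple]) -> list[tuple]:
    """Return every input on which predict disagrees with reference."""
    return [case for case in cases if predict(*case) != reference(*case)]


def reference_add(a: int, operand: int, carry_in: int) -> int:
    """Ground-truth 8-bit add with carry in (result byte only)."""
    return (a + operand + carry_in) & 0xFF


def almost_right_add(a: int, operand: int, carry_in: int) -> int:
    """Stand-in for a model that fails on a single edge case."""
    if (a, operand, carry_in) == (255, 255, 1):
        return 0
    return reference_add(a, operand, carry_in)


cases = list(product(range(256), range(256), (0, 1)))
failures = verify_exhaustive(almost_right_add, reference_add, cases)
print(len(cases), failures)  # 131072 [(255, 255, 1)]
```

A random sample of 1,000 of these cases would include the failing input less than 1% of the time, which is why only the full sweep justifies a 100% accuracy claim.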
The Neural 6502 proves that neural networks can perform exact digital computation. Not approximately, but exactly. Every single one of the 460,928 tested input combinations produces the mathematically correct output.
This isn't emulation. This isn't simulation. The neural network IS the CPU.
The weights are the logic. The inference is the computation.
"The neural network learned to be a CPU."