An iOS app that runs a fine-tuned LLM entirely on-device to generate social media posts in your own voice and style. No cloud APIs, no data leaving your phone. Your AI avatar lives in your pocket.
MyMeBot takes your existing social media posts (from Bluesky or Twitter), fine-tunes a language model to write like you, then runs that model locally on your iPhone to generate new posts on demand. The entire pipeline — from training data preparation to on-device inference — is designed to run without any server infrastructure.
Current state: Functional iOS app running a fused Qwen3-8B model (4.7GB, Q4_K_M quantization) on iPhone 17 Pro with Metal GPU acceleration. Generates posts in the user's trained voice, with full post validation and history tracking via SwiftData.
The project follows a strict separation between pure logic and side effects:
```
MyMeBotCore (SPM Library)           App Target (iOS)
┌──────────────────────────┐        ┌──────────────────────────┐
│ Pure functions only      │        │ SwiftUI views            │
│ No UIKit/SwiftUI         │        │ URLSession networking    │
│ No file I/O              │        │ llama.cpp inference      │
│ No network calls         │        │ SwiftData persistence    │
│ 183 tests, <1s runtime   │        │ Metal GPU acceleration   │
└──────────────────────────┘        └──────────────────────────┘
```
Why this matters: Every piece of business logic — prompt formatting, post validation, training data preparation, scheduling constraints, AT Protocol request building — is a pure function that takes inputs and returns outputs. No mocks needed for 90% of the test suite. Tests run in milliseconds, not minutes.
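To give a flavor of that style, here is a minimal sketch of a functional-core prompt builder. The name `formatPostPrompt` and its exact wording are illustrative, not the library's actual API — the point is that the function is pure, so tests need no mocks:

```swift
// Hypothetical sketch of a functional-core prompt builder.
// Pure: no I/O, no globals — the output depends only on the inputs.
func formatPostPrompt(topic: String?, noThink: Bool) -> String {
    var prompt = "Write a short social media post"
    if let topic = topic, !topic.isEmpty {
        prompt += " about \(topic)"
    }
    prompt += ". Never use emoji. Never use hashtags."
    if noThink {
        // Qwen3 convention: suppress chain-of-thought for faster output.
        prompt += " /no_think"
    }
    return prompt
}
```

Because the function is deterministic, a test is just an input/output check — no engine, no device, no fixture setup.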
```
Sources/MyMeBotCore/                  # Pure logic library (zero UIKit imports)
├── Models.swift                      # Value types: InferenceConfig, ProcessedPost, UserProfile, etc.
├── LLMEngine.swift                   # LLMEngine protocol + MockLLMEngine
├── PostGeneration.swift              # Prompt formatting, validation, cleaning, AT Protocol records
├── ChatFormatting.swift              # ChatML message formatting for Qwen3
├── DataPipeline.swift                # Bluesky/Twitter parsing, ChatML export, training pairs
├── BlueskyClient.swift               # BlueskyClient protocol + AT Protocol client
├── ATProtoRequests.swift             # Pure request builders + response parsers
├── Scheduling.swift                  # Posting windows, rate limits, intent responses
└── ModelDownload.swift               # Download state machine, model info, progress formatting
App/                                  # iOS app target (imperative shell)
├── MyMeBotApp.swift                  # @main entry point, SwiftData container
├── Models/
│   └── SavedPost.swift               # SwiftData model for post history with ratings
├── Services/
│   ├── LlamaCppEngine.swift          # Real llama.cpp inference (Metal GPU, 30/37 layers)
│   ├── ModelManager.swift            # Inference coordination, serial dispatch queue
│   └── ModelDownloadService.swift    # URLSessionDownloadTask for ~5GB model
├── ViewModels/
│   ├── ChatViewModel.swift           # Conversation history, ChatML formatting
│   ├── PostGenerationViewModel.swift # Post generation with LoRA training prompt
│   └── ImportViewModel.swift         # Training data import
└── Views/
    ├── ContentView.swift             # Tab-based navigation (Chat, Post, Import, Settings)
    ├── ChatView.swift                # Message bubbles, keyboard handling
    ├── GeneratePostView.swift        # One-tap post generation with save-to-history
    ├── PostHistoryView.swift         # Saved posts with good/not-good ratings
    ├── ModelDownloadView.swift       # Download progress, retry logic
    ├── SettingsView.swift            # LoRA adapter toggle, preferences
    └── ImportDataView.swift          # JSONL/adapter file import
Tests/MyMeBotCoreTests/               # 183 tests across 51 suites
├── ModelsTests.swift
├── DataPipelineTests.swift
├── TrainingPipelineTests.swift
├── PostGenerationTests.swift
├── ChatFormattingTests.swift
├── SchedulingTests.swift
├── IntentResponseTests.swift
├── ModelDownloadTests.swift
├── ATProtoParserTests.swift
├── ATProtoTests.swift
└── IntegrationTests.swift
```
The app runs a Qwen3-8B model with LoRA fine-tuning baked directly into the weights:
| Property | Value |
|---|---|
| Architecture | Qwen3 (8.2B parameters) |
| Quantization | Q4_K_M (4-bit, mixed precision) |
| File size | 4.7 GB |
| Context window | 512 tokens (reduced from 40,960 for memory) |
| GPU layers | 30 of 37 on Metal GPU |
| CPU layers | 7 via memory-mapped I/O |
| GPU memory | ~4,789 MiB |
| CPU memory | ~1,126 MiB (mmap, not resident) |
The LoRA adapters were trained on Qwen3-8B using MLX on a Mac (rank 8, scale 20.0, 25,000 iterations). Rather than loading adapters at runtime, we fused them directly into the base model weights. This means:
- No adapter file management — one GGUF file contains everything
- No runtime overhead — fused weights are as fast as the base model
- Simpler deployment — copy one file to the device
The fused model started as MLX 4-bit quantized weights, which llama.cpp can't read directly. The conversion required three steps:
```
MLX 4-bit (fused_model_v3)
        │
        ▼  mlx_lm.convert --dequantize
bf16 safetensors (~15 GB)
        │
        ▼  convert_hf_to_gguf.py --outtype q8_0
GGUF q8_0 (8.7 GB)
        │
        ▼  llama-quantize --allow-requantize Q4_K_M
GGUF Q4_K_M (4.7 GB)  ← final model
```
The `--allow-requantize` flag is critical — `llama-quantize` refuses to quantize from q8_0 by default since it's already quantized. Each intermediate file was deleted between steps due to disk space constraints.
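Assuming the MLX and llama.cpp tools are on `PATH`, the three steps look roughly like this. Paths, output names, and some flag spellings are illustrative — check your installed tool versions before running:

```shell
# 1. Dequantize the fused MLX model to bf16 safetensors (~15 GB)
mlx_lm.convert --hf-path fused_model_v3 --mlx-path fused_bf16 --dequantize

# 2. Convert the safetensors to GGUF at q8_0 (~8.7 GB)
python convert_hf_to_gguf.py fused_bf16 --outtype q8_0 --outfile fused-q8_0.gguf

# 3. Requantize q8_0 down to Q4_K_M (~4.7 GB); --allow-requantize is required
llama-quantize --allow-requantize fused-q8_0.gguf fused-q4km.gguf Q4_K_M
```

Delete each intermediate file after its step completes if disk space is tight, as described above.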
The app uses llama.cpp via the llama.swift xcframework for inference. Key configuration:
```swift
// Model loading
modelParams.use_mmap = true      // Memory-map weights (CPU layers don't consume RAM)
modelParams.n_gpu_layers = 30    // 30 of 37 layers on Metal GPU

// Context
ctxParams.n_ctx = 512      // Context window (tokens)
ctxParams.n_batch = 512    // Batch size for prompt processing
ctxParams.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED  // Stability
```

The iPhone 17 Pro has 8 GB of RAM. iOS typically allows 4-6 GB for foreground apps depending on system load. Our memory budget:
- GPU (Metal): ~4,789 MiB for model weights + 58 MiB KV cache = ~4,847 MiB
- CPU (mmap): ~1,126 MiB memory-mapped (not resident, paged on demand)
- App overhead: ~73 MiB for Swift runtime, SwiftUI, SwiftData
The 7 CPU layers use memory-mapped I/O: they're paged in from the GGUF file on demand and don't count against the app's resident memory. This is how an 8.2B-parameter model fits within the 4-6 GB foreground budget of an 8 GB device.
For post generation (trained voice):
```swift
InferenceConfig(
    temperature: 0.95,   // High creativity for varied posts
    topP: 0.9,           // Nucleus sampling
    maxTokens: 280,      // Bluesky's ~300 char limit
    repeatPenalty: 1.1   // Avoid repetitive phrasing
)
```

For general chat:

```swift
InferenceConfig(
    temperature: 0.7,    // More focused responses
    topP: 0.9,
    maxTokens: 512,
    repeatPenalty: 1.1
)
```

Each generation starts by clearing the KV cache to prevent context leakage between independent requests:
```swift
guard let memory = llama_get_memory(context) else {
    throw LlamaCppError.generationFailed("Failed to get memory handle")
}
llama_memory_clear(memory, true)
```

The sampling seed uses `UInt32.random(in: 0...UInt32.max)` instead of a timestamp-based seed to ensure different outputs on consecutive generations.
Every generated post passes through a multi-stage processing pipeline, implemented entirely as composable pure functions:
```
User taps "Generate"
        │
        ▼
Format ChatML prompt (training prompt + /no_think + topic)
        │
        ▼
LLM inference (Qwen3-8B on Metal GPU)
        │
        ▼
Strip <think>...</think> tags (Qwen3 chain-of-thought)
        │
        ▼
Strip emoji (defense-in-depth, also instructed in prompt)
        │
        ▼
Strip hashtags (defense-in-depth, also instructed in prompt)
        │
        ▼
Trim whitespace
        │
        ▼
Validate (empty? too long? valid?)
        │
        ▼
Build AT Protocol record (ISO 8601 timestamp)
        │
        ▼
Auto-post decision (settings + validation)
        │
        ▼
Display to user with save-to-history option
```
The prompt explicitly instructs: "Never use emoji. Never use hashtags." As defense-in-depth, the pipeline also strips emoji and hashtags from the output programmatically. The emoji stripper uses Unicode scalar properties (isEmoji with ASCII exemption for basic punctuation). The hashtag stripper uses NSRegularExpression to remove #word patterns.
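A minimal sketch of the two strippers, under the assumptions just described (Unicode scalar properties with an ASCII exemption, and an `NSRegularExpression` for `#word`); the app's actual implementations may handle more edge cases:

```swift
import Foundation

// Strip emoji using Unicode scalar properties. `isEmoji` alone is too
// broad (ASCII digits like "3" report true), so ASCII is kept unconditionally.
func stripEmoji(_ text: String) -> String {
    let kept = text.unicodeScalars.filter { $0.isASCII || !$0.properties.isEmoji }
    return String(String.UnicodeScalarView(kept))
}

// Strip #word hashtags with a regular expression, then collapse the
// double spaces that removal can leave behind.
func stripHashtags(_ text: String) -> String {
    let regex = try! NSRegularExpression(pattern: "#\\w+")
    let range = NSRange(text.startIndex..., in: text)
    let stripped = regex.stringByReplacingMatches(in: text, range: range, withTemplate: "")
    return stripped
        .replacingOccurrences(of: "  ", with: " ")
        .trimmingCharacters(in: .whitespaces)
}
```

Both are pure `String → String` functions, so they slot directly into the pipeline above and are trivially testable.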
```swift
public enum PostValidation: Equatable, Sendable {
    case valid                                      // Under 300 chars, non-empty
    case tooLong(characterCount: Int, limit: Int)   // Over limit (preserved, not truncated)
    case empty                                      // Whitespace-only or blank
    case containsProhibitedContent(reason: String)  // Future: content filtering
}
```

Posts that exceed Bluesky's 300-character limit are preserved as-is with a `.tooLong` validation status, rather than silently truncated. The user sees the full text with a warning badge and can choose to edit before posting.
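The validator over this enum can be sketched as a single pure function. This is a simplified standalone version (the prohibited-content case is omitted, and the enum is redeclared so the sketch compiles on its own):

```swift
import Foundation

// Simplified redeclaration so the sketch is self-contained.
enum PostValidation: Equatable {
    case valid
    case tooLong(characterCount: Int, limit: Int)
    case empty
}

// Pure validator: text in, status out. Limit defaults to Bluesky's 300 chars.
func validatePost(_ text: String, limit: Int = 300) -> PostValidation {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty { return .empty }
    if trimmed.count > limit {
        // Preserve the text; the caller decides whether to warn or edit.
        return .tooLong(characterCount: trimmed.count, limit: limit)
    }
    return .valid
}
```

Because the over-limit case carries both the count and the limit, the UI can render an exact warning badge without re-measuring the text.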
Generated posts are persisted via SwiftData with a rating system for training data curation:
```swift
@Model
final class SavedPost {
    var text: String        // The generated post text
    var topic: String?      // Optional topic used for generation
    var generatedAt: Date   // When the LLM produced it
    var savedAt: Date       // When the user saved it
    var ratingValue: Int    // 0 = unrated, 1 = good, 2 = notGood
}
```

The rating system (thumbs up / thumbs down) serves a dual purpose:
- Immediate feedback — users can quickly curate which posts represent their voice well
- Future training data — good-rated posts can be fed back into LoRA training to improve the model over time
The pipeline converts raw social media posts into ChatML-formatted JSONL for LoRA fine-tuning:
```
Bluesky JSON export ──┐
                      ├──▶ [RawPost] ──▶ clean ──▶ filter URLs ──▶ split train/eval
Twitter JSON archive ─┘                                                  │
                                                                         ▼
                                                                ChatML JSONL output
```
Cleaning rules:
- Minimum 10 characters
- Strip reposts
- Remove URL-heavy posts (>50% URL content)
- Trim whitespace
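These rules compose into one pure filter. A sketch under assumed types — `RawPost` is simplified here to two fields, and the URL-share heuristic (fraction of characters inside `http(s)` tokens) is an assumption about how ">50% URL content" is measured:

```swift
import Foundation

struct RawPost { let text: String; let isRepost: Bool }

// Apply all cleaning rules in one pass; returns the surviving post texts.
func cleanPosts(_ posts: [RawPost]) -> [String] {
    posts.compactMap { post in
        if post.isRepost { return nil }                      // strip reposts
        let trimmed = post.text.trimmingCharacters(in: .whitespacesAndNewlines)
        if trimmed.count < 10 { return nil }                 // minimum 10 characters
        let urlChars = trimmed
            .components(separatedBy: .whitespaces)
            .filter { $0.hasPrefix("http") }
            .reduce(0) { $0 + $1.count }
        if Double(urlChars) / Double(trimmed.count) > 0.5 {  // URL-heavy post
            return nil
        }
        return trimmed
    }
}
```

Since the filter is pure, the cleaning rules can be tested exhaustively without touching a real export file.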
ChatML format (matching `mlx_lm.lora`'s expected input):

```json
{"messages": [
  {"role": "user", "content": "Write a short social media post..."},
  {"role": "assistant", "content": "The actual post text"}
]}
```

The LoRA training runs on a Mac using MLX, with these proven hyperparameters:
```swift
LoRAConfig(
    modelName: "Qwen/Qwen3-8B",
    rank: 8,                   // LoRA rank
    scale: 20.0,               // LoRA alpha/scaling
    dropout: 0.0,
    iterations: 25000,         // Long training for voice capture
    learningRate: 1e-5,        // Conservative for fine-tuning
    batchSize: 1,
    gradAccumulationSteps: 8,  // Effective batch size of 8
    maxSeqLength: 256,
    evalRatio: 0.05            // 5% held out for evaluation
)
```

The app includes a full AT Protocol client for posting to Bluesky:
All network request construction is done via pure functions that return `URLRequest` objects:

```swift
buildCreateSessionRequest(credentials:pdsURL:)  → URLRequest  // Login
buildCreatePostRequest(session:record:pdsURL:)  → URLRequest  // Create post
buildFetchPostsRequest(session:handle:...)      → URLRequest  // Fetch feed
```

Similarly, response parsing is pure `Data` → result:

```swift
parseSessionResponse(data:)     → ATProtoSession  // DID, handle, JWTs
parseCreatePostResponse(data:)  → PostRef         // URI, CID
parseFetchPostsResponse(data:)  → PostsPage       // Posts + cursor
```

```swift
public protocol BlueskyClient: Sendable {
    func fetchPosts(handle: String, cursor: String?) async throws -> PostsPage
    func createPost(text: String) async throws -> PostRef
}
```

`ATProtoBlueskyClient` is the real implementation; `MockBlueskyClient` is used in tests. The real client delegates all request construction and response parsing to the pure functions above.
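The pure-builder idea can be sketched as follows. The endpoint path and collection NSID are the real AT Protocol identifiers; the function names and value-based return types are illustrative (the app's builders return `URLRequest`):

```swift
import Foundation

// Record payload for a post (subset of app.bsky.feed.post).
struct PostRecord: Codable { let text: String; let createdAt: String }

// Request body for com.atproto.repo.createRecord.
struct CreateRecordBody: Codable {
    let repo: String          // the account DID
    let collection: String    // record collection NSID
    let record: PostRecord
}

// Pure: compute the XRPC endpoint for creating a record.
func createPostEndpoint(pdsURL: String) -> String {
    pdsURL + "/xrpc/com.atproto.repo.createRecord"
}

// Pure: encode the JSON body for a create-post call.
func createPostBody(did: String, record: PostRecord) throws -> Data {
    let body = CreateRecordBody(repo: did, collection: "app.bsky.feed.post", record: record)
    return try JSONEncoder().encode(body)
}
```

Because both functions are deterministic, request construction can be unit-tested byte-for-byte without any network.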
The scheduling system controls when posts are generated and published, designed for integration with iOS App Intents and Shortcuts:
```swift
// Time window: only post between 9 AM and 10 PM
isWithinPostingWindow(date:config:calendar:) → Bool

// Cooldown: at least 2 hours between posts
hasMinimumInterval(since:now:config:) → Bool

// Rate limit: max 3 posts per day
isUnderDailyLimit(postsToday:config:) → Bool

// Master decision: composes all constraints
shouldGenerateNow(date:lastPostDate:postsToday:config:calendar:) → Bool

// Next opportunity: when to try again
nextPostTime(after:config:calendar:) → Date?
```

```swift
ScheduleConfig(
    postsPerDay: 3,               // Daily limit
    earliestHour: 9,              // Don't post before 9 AM
    latestHour: 22,               // Don't post after 10 PM
    minimumIntervalMinutes: 120,  // 2-hour cooldown
    jitterMinutes: 30             // Randomize timing
)
```

The app has four tabs:
- Chat — Free-form conversation with the model. Messages are formatted as ChatML with a system prompt. The `/no_think` suffix disables Qwen3's chain-of-thought for faster chat responses.
- Post — One-tap post generation in the trained voice. Shows the generated text with validation status (character count, valid/too long/empty), a "Save to History" button, and a toolbar link to the post history view.
- Import — Import training data (JSONL) or pre-trained LoRA adapters (GGUF) from the Files app.
- Settings — LoRA adapter toggle, model info, preferences.
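The ChatML framing the Chat tab relies on can be sketched as a pure formatter. This is an approximation of the Qwen3 chat template using the standard ChatML `<|im_start|>`/`<|im_end|>` markers; the type and function names are illustrative:

```swift
struct ChatMessage { let role: String; let content: String }

// Format a conversation as ChatML, optionally appending /no_think to the
// last user turn to disable Qwen3's chain-of-thought.
func formatChatML(system: String, messages: [ChatMessage], noThink: Bool) -> String {
    var turns = [ChatMessage(role: "system", content: system)] + messages
    if noThink, let last = turns.indices.last, turns[last].role == "user" {
        turns[last] = ChatMessage(role: "user", content: turns[last].content + " /no_think")
    }
    let body = turns
        .map { "<|im_start|>\($0.role)\n\($0.content)<|im_end|>" }
        .joined(separator: "\n")
    // The trailing open tag cues the model to continue as the assistant.
    return body + "\n<|im_start|>assistant\n"
}
```

Keeping this in the pure core means the exact prompt bytes sent to the model are testable without loading any weights.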
On first launch, the app downloads the GGUF model (~5 GB) to the Documents directory using URLSessionDownloadTask. The download supports resumption if interrupted and persists across app updates (Documents directory is preserved).
The chat interface includes three keyboard dismissal mechanisms:
- Interactive scroll — dragging the message list dismisses the keyboard progressively
- Tap to dismiss — tapping the message area dismisses the keyboard
- Done button — a toolbar button above the keyboard for explicit dismissal
This ensures the tab bar remains accessible even with the keyboard open.
- Xcode 16+ with iOS 18 SDK
- XcodeGen (`brew install xcodegen`)
- SwiftLint (`brew install swiftlint`)
- SwiftFormat (`brew install swiftformat`)
- An iPhone with arm64 (the llama.swift xcframework has no simulator slice)
- The `llama.xcframework` in `Frameworks/` (not checked into git due to size)
```shell
# Generate Xcode project from project.yml
xcodegen generate

# Run tests (pure logic, no device needed)
swift test

# Build for device
xcodebuild -project MyMeBot.xcodeproj -scheme MyMeBotApp -destination 'generic/platform=iOS' build
```

The project enforces strict code quality via pre-commit hooks:

- SwiftLint — strict mode with 36 opt-in rules. Enforces function body length (40 lines warning, 80 error), cyclomatic complexity (10/20), no force unwrapping, identifier naming (2-60 chars), and more.
- SwiftFormat — consistent formatting with 120-char line width, alphabetized imports, before-first wrapping for arguments/collections/parameters.
- Pre-commit hooks — both tools run automatically on `git commit`, blocking commits that don't pass.
```shell
# Install to connected iPhone
xcrun devicectl device install app --device <DEVICE_UUID> path/to/MyMeBotApp.app

# Copy model file to app's Documents directory
xcrun devicectl device copy to \
  --device <DEVICE_UUID> \
  --domain-type appDataContainer \
  --domain-identifier com.mymebot.app \
  --source bobby-qwen3-8b-fused-q4km.gguf \
  --destination Documents/bobby-qwen3-8b-fused-q4km.gguf

# Launch with console output
xcrun devicectl device process launch --device <DEVICE_UUID> --console com.mymebot.app
```

The fused model (`bobby-qwen3-8b-fused-q4km.gguf`) is generated from a fine-tuned MLX model using the three-step conversion pipeline described above. You'll need:

- A fine-tuned MLX model (e.g., from `mlx_lm.lora` training)
- llama.cpp tools: `convert_hf_to_gguf.py` and `llama-quantize`
- The MLX Python package for dequantization
For a generic (non-fine-tuned) model, you can download a pre-quantized GGUF from Hugging Face:
```shell
# Example: base Qwen3-8B (no fine-tuning)
wget https://huggingface.co/Qwen/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-Q4_K_M.gguf
```

The project uses `deciduous` to track every architectural decision, implementation choice, and outcome in a queryable graph:
158 nodes, 169 edges
Types: goals, options, decisions, actions, outcomes, observations
The decision graph enforces a strict flow rule:
```
goal → options → decisions → actions → outcomes
```
Goals lead to options (not decisions). You explore alternatives first, then decide. This mirrors how good engineering works — evaluate before committing. The graph is viewable via `deciduous tui` or the web viewer at the GitHub Pages deployment.

Every git commit is linked to its corresponding action/outcome node in the graph via `--commit HEAD`, creating a bidirectional mapping between code changes and the reasoning behind them.
183 tests across 51 suites, all passing in under 1 second:
| Area | Suites | What's Tested |
|---|---|---|
| Models & Types | 5 | InferenceConfig defaults, type equality, Codable conformance |
| Data Pipeline | 8 | Bluesky/Twitter parsing, post cleaning, ChatML serialization, dataset splitting |
| Post Generation | 7 | Prompt formatting, validation, truncation, emoji/hashtag stripping, auto-post logic |
| Chat Formatting | 4 | ChatML messages, conversations, system prompts, /no_think handling |
| AT Protocol | 5 | Request builders, response parsers, session types, error handling |
| Scheduling | 8 | Posting windows, intervals, daily limits, intent responses, notifications |
| Model Download | 4 | File paths, progress formatting, file sizes, download state |
| Integration | 6 | End-to-end pipeline composition, mock engine tests |
The test suite focuses on composition over isolation. Individual function tests exist, but the valuable tests verify that functions compose correctly:
```swift
// This test verifies the FULL pipeline: strip thinking → clean → validate → build record → auto-post
@Test("Full pipeline with mock engine")
func fullPipelineWithMock() async throws {
    let engine = MockLLMEngine(response: "<think>hmm</think>Swift is awesome")
    let raw = try await engine.generate(prompt: "test", config: InferenceConfig())
    let result = processLLMOutput(raw: raw, settings: PostSettings(autoPostEnabled: true))
    #expect(result.post.text == "Swift is awesome")
    #expect(result.shouldPost == true)
    #expect(result.record.text == "Swift is awesome")
}
```

The project was built incrementally across six phases:
| Phase | Name | Status | Tests |
|---|---|---|---|
| 0 | Quality Infrastructure | Complete | 45 |
| 1 | Model Setup + Inference | Complete | 45 |
| 2 | LoRA Training Pipeline | Complete | 65 |
| 3 | Device Integration | Complete | 122 |
| 4 | Scheduling | Complete | 151 |
| 5 | iOS App | In Progress | 183 |
See `THE_STORY_SO_FAR.md` for the full narrative, including all the dead ends, bugs caught, and architectural decisions made along the way.
| Component | Technology |
|---|---|
| Language | Swift 6.0 |
| Platforms | iOS 18+, macOS 15+ |
| UI Framework | SwiftUI |
| Persistence | SwiftData |
| Inference | llama.cpp via llama.swift |
| GPU | Metal (Apple A19 Pro) |
| Model | Qwen3-8B Q4_K_M (fused LoRA) |
| Training | MLX LoRA (runs on Mac) |
| Social API | AT Protocol (Bluesky) |
| Project Gen | XcodeGen |
| Linting | SwiftLint (strict) + SwiftFormat |
| Decision Tracking | deciduous |
| Package Manager | Swift Package Manager |
This project is not currently published under an open-source license. The codebase is public for educational and reference purposes.