An iOS app that runs a fine-tuned LLM entirely on-device to generate social media posts in your own voice and style. No cloud APIs, no data leaving your phone. Your AI avatar lives in your pocket.
MyMeBot takes your existing social media posts (from Bluesky or Twitter), fine-tunes a language model to write like you, then runs that model locally on your iPhone to generate new posts on demand. The entire pipeline — from training data preparation to on-device inference — is designed to run without any server infrastructure.
Current state: Functional iOS app running a fused Qwen3-8B model (4.7GB, Q4_K_M quantization) on iPhone 17 Pro with Metal GPU acceleration. Generates posts in the user's trained voice, with full post validation and history tracking via SwiftData.
The project follows a strict separation between pure logic and side effects:
```
MyMeBotCore (SPM Library)           App Target (iOS)
┌──────────────────────────┐        ┌──────────────────────────┐
│ Pure functions only      │        │ SwiftUI views            │
│ No UIKit/SwiftUI         │        │ URLSession networking    │
│ No file I/O              │        │ llama.cpp inference      │
│ No network calls         │        │ SwiftData persistence    │
│ 183 tests, <1s runtime   │        │ Metal GPU acceleration   │
└──────────────────────────┘        └──────────────────────────┘
```
Why this matters: Every piece of business logic — prompt formatting, post validation, training data preparation, scheduling constraints, AT Protocol request building — is a pure function that takes inputs and returns outputs. No mocks needed for 90% of the test suite. Tests run in milliseconds, not minutes.
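To give a flavor of that style, here is a minimal sketch of a functional-core prompt builder. The name `formatPostPrompt` and its exact wording are illustrative, not the library's actual API — the point is that the function is pure, so tests need no mocks:

```swift
// Hypothetical sketch of a functional-core prompt builder.
// Pure: no I/O, no globals — the output depends only on the inputs.
func formatPostPrompt(topic: String?, noThink: Bool) -> String {
    var prompt = "Write a short social media post"
    if let topic = topic, !topic.isEmpty {
        prompt += " about \(topic)"
    }
    prompt += ". Never use emoji. Never use hashtags."
    if noThink {
        // Qwen3 convention: suppress chain-of-thought for faster output.
        prompt += " /no_think"
    }
    return prompt
}
```

Because the function is deterministic, a test is just an input/output check — no engine, no device, no fixture setup.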
```
Sources/MyMeBotCore/                  # Pure logic library (zero UIKit imports)
├── Models.swift                      # Value types: InferenceConfig, ProcessedPost, UserProfile, etc.
├── LLMEngine.swift                   # LLMEngine protocol + MockLLMEngine
├── PostGeneration.swift              # Prompt formatting, validation, cleaning, AT Protocol records
├── ChatFormatting.swift              # ChatML message formatting for Qwen3
├── DataPipeline.swift                # Bluesky/Twitter parsing, ChatML export, training pairs
├── BlueskyClient.swift               # BlueskyClient protocol + AT Protocol client
├── ATProtoRequests.swift             # Pure request builders + response parsers
├── Scheduling.swift                  # Posting windows, rate limits, intent responses
└── ModelDownload.swift               # Download state machine, model info, progress formatting
App/                                  # iOS app target (imperative shell)
├── MyMeBotApp.swift                  # @main entry point, SwiftData container
├── Models/
│   └── SavedPost.swift               # SwiftData model for post history with ratings
├── Services/
│   ├── LlamaCppEngine.swift          # Real llama.cpp inference (Metal GPU, 30/37 layers)
│   ├── ModelManager.swift            # Inference coordination, serial dispatch queue
│   └── ModelDownloadService.swift    # URLSessionDownloadTask for ~5GB model
├── ViewModels/
│   ├── ChatViewModel.swift           # Conversation history, ChatML formatting
│   ├── PostGenerationViewModel.swift # Post generation with LoRA training prompt
│   └── ImportViewModel.swift         # Training data import
└── Views/
    ├── ContentView.swift             # Tab-based navigation (Chat, Post, Import, Settings)
    ├── ChatView.swift                # Message bubbles, keyboard handling
    ├── GeneratePostView.swift        # One-tap post generation with save-to-history
    ├── PostHistoryView.swift         # Saved posts with good/not-good ratings
    ├── ModelDownloadView.swift       # Download progress, retry logic
    ├── SettingsView.swift            # LoRA adapter toggle, preferences
    └── ImportDataView.swift          # JSONL/adapter file import
Tests/MyMeBotCoreTests/               # 183 tests across 51 suites
├── ModelsTests.swift
├── DataPipelineTests.swift
├── TrainingPipelineTests.swift
├── PostGenerationTests.swift
├── ChatFormattingTests.swift
├── SchedulingTests.swift
├── IntentResponseTests.swift
├── ModelDownloadTests.swift
├── ATProtoParserTests.swift
├── ATProtoTests.swift
└── IntegrationTests.swift
```
The app runs a Qwen3-8B model with LoRA fine-tuning baked directly into the weights:
| Property | Value |
|---|---|
| Architecture | Qwen3 (8.2B parameters) |
| Quantization | Q4_K_M (4-bit, mixed precision) |
| File size | 4.7 GB |
| Context window | 512 tokens (reduced from 40,960 for memory) |
| GPU layers | 30 of 37 on Metal GPU |
| CPU layers | 7 via memory-mapped I/O |
| GPU memory | ~4,789 MiB |
| CPU memory | ~1,126 MiB (mmap, not resident) |
The LoRA adapters were trained on Qwen3-8B using MLX on a Mac (rank 8, scale 20.0, 25,000 iterations). Rather than loading adapters at runtime, we fused them directly into the base model weights. This means:
- No adapter file management — one GGUF file contains everything
- No runtime overhead — fused weights are as fast as the base model
- Simpler deployment — copy one file to the device
The fused model started as MLX 4-bit quantized weights, which llama.cpp can't read directly. The conversion required three steps:
```
MLX 4-bit (fused_model_v3)
        │
        ▼  mlx_lm.convert --dequantize
bf16 safetensors (~15 GB)
        │
        ▼  convert_hf_to_gguf.py --outtype q8_0
GGUF q8_0 (8.7 GB)
        │
        ▼  llama-quantize --allow-requantize Q4_K_M
GGUF Q4_K_M (4.7 GB)  ← final model
```
The `--allow-requantize` flag is critical — `llama-quantize` refuses to quantize from q8_0 by default since it's already quantized. Each intermediate file was deleted between steps due to disk space constraints.
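Assuming the MLX and llama.cpp tools are on `PATH`, the three steps look roughly like this. Paths, output names, and some flag spellings are illustrative — check your installed tool versions before running:

```shell
# 1. Dequantize the fused MLX model to bf16 safetensors (~15 GB)
mlx_lm.convert --hf-path fused_model_v3 --mlx-path fused_bf16 --dequantize

# 2. Convert the safetensors to GGUF at q8_0 (~8.7 GB)
python convert_hf_to_gguf.py fused_bf16 --outtype q8_0 --outfile fused-q8_0.gguf

# 3. Requantize q8_0 down to Q4_K_M (~4.7 GB); --allow-requantize is required
llama-quantize --allow-requantize fused-q8_0.gguf fused-q4km.gguf Q4_K_M
```

Delete each intermediate file after its step completes if disk space is tight, as described above.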
The app uses llama.cpp via the llama.swift xcframework for inference. Key configuration:
```swift
// Model loading
modelParams.use_mmap = true      // Memory-map weights (CPU layers don't consume RAM)
modelParams.n_gpu_layers = 30    // 30 of 37 layers on Metal GPU

// Context
ctxParams.n_ctx = 512      // Context window (tokens)
ctxParams.n_batch = 512    // Batch size for prompt processing
ctxParams.flash_attn_type = LLAMA_FLASH_ATTN_TYPE_DISABLED  // Stability
```

The iPhone 17 Pro has 8 GB of RAM. iOS typically allows 4-6 GB for foreground apps depending on system load. Our memory budget:
- GPU (Metal): ~4,789 MiB for model weights + 58 MiB KV cache = ~4,847 MiB
- CPU (mmap): ~1,126 MiB memory-mapped (not resident, paged on demand)
- App overhead: ~73 MiB for Swift runtime, SwiftUI, SwiftData
The 7 CPU layers use memory-mapped I/O: they're paged in from the GGUF file on demand and don't count against the app's resident memory. This is how an 8.2B-parameter model fits within the 4-6 GB foreground budget of an 8 GB device.
For post generation (trained voice):
```swift
InferenceConfig(
    temperature: 0.95,   // High creativity for varied posts
    topP: 0.9,           // Nucleus sampling
    maxTokens: 280,      // Bluesky's ~300 char limit
    repeatPenalty: 1.1   // Avoid repetitive phrasing
)
```

For general chat:

```swift
InferenceConfig(
    temperature: 0.7,    // More focused responses
    topP: 0.9,
    maxTokens: 512,
    repeatPenalty: 1.1
)
```

Each generation starts by clearing the KV cache to prevent context leakage between independent requests:
```swift
guard let memory = llama_get_memory(context) else {
    throw LlamaCppError.generationFailed("Failed to get memory handle")
}
llama_memory_clear(memory, true)
```

The sampling seed uses `UInt32.random(in: 0...UInt32.max)` instead of a timestamp-based seed to ensure different outputs on consecutive generations.
Every generated post passes through a multi-stage processing pipeline, implemented entirely as composable pure functions:
```
User taps "Generate"
        │
        ▼
Format ChatML prompt (training prompt + /no_think + topic)
        │
        ▼
LLM inference (Qwen3-8B on Metal GPU)
        │
        ▼
Strip <think>...</think> tags (Qwen3 chain-of-thought)
        │
        ▼
Strip emoji (defense-in-depth, also instructed in prompt)
        │
        ▼
Strip hashtags (defense-in-depth, also instructed in prompt)
        │
        ▼
Trim whitespace
        │
        ▼
Validate (empty? too long? valid?)
        │
        ▼
Build AT Protocol record (ISO 8601 timestamp)
        │
        ▼
Auto-post decision (settings + validation)
        │
        ▼
Display to user with save-to-history option
```
The prompt explicitly instructs: "Never use emoji. Never use hashtags." As defense-in-depth, the pipeline also strips emoji and hashtags from the output programmatically. The emoji stripper uses Unicode scalar properties (isEmoji with ASCII exemption for basic punctuation). The hashtag stripper uses NSRegularExpression to remove #word patterns.
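A minimal sketch of the two strippers, under the assumptions just described (Unicode scalar properties with an ASCII exemption, and an `NSRegularExpression` for `#word`); the app's actual implementations may handle more edge cases:

```swift
import Foundation

// Strip emoji using Unicode scalar properties. `isEmoji` alone is too
// broad (ASCII digits like "3" report true), so ASCII is kept unconditionally.
func stripEmoji(_ text: String) -> String {
    let kept = text.unicodeScalars.filter { $0.isASCII || !$0.properties.isEmoji }
    return String(String.UnicodeScalarView(kept))
}

// Strip #word hashtags with a regular expression, then collapse the
// double spaces that removal can leave behind.
func stripHashtags(_ text: String) -> String {
    let regex = try! NSRegularExpression(pattern: "#\\w+")
    let range = NSRange(text.startIndex..., in: text)
    let stripped = regex.stringByReplacingMatches(in: text, range: range, withTemplate: "")
    return stripped
        .replacingOccurrences(of: "  ", with: " ")
        .trimmingCharacters(in: .whitespaces)
}
```

Both are pure `String → String` functions, so they slot directly into the pipeline above and are trivially testable.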
```swift
public enum PostValidation: Equatable, Sendable {
    case valid                                      // Under 300 chars, non-empty
    case tooLong(characterCount: Int, limit: Int)   // Over limit (preserved, not truncated)
    case empty                                      // Whitespace-only or blank
    case containsProhibitedContent(reason: String)  // Future: content filtering
}
```

Posts that exceed Bluesky's 300-character limit are preserved as-is with a `.tooLong` validation status, rather than silently truncated. The user sees the full text with a warning badge and can choose to edit before posting.
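The validator over this enum can be sketched as a single pure function. This is a simplified standalone version (the prohibited-content case is omitted, and the enum is redeclared so the sketch compiles on its own):

```swift
import Foundation

// Simplified redeclaration so the sketch is self-contained.
enum PostValidation: Equatable {
    case valid
    case tooLong(characterCount: Int, limit: Int)
    case empty
}

// Pure validator: text in, status out. Limit defaults to Bluesky's 300 chars.
func validatePost(_ text: String, limit: Int = 300) -> PostValidation {
    let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
    if trimmed.isEmpty { return .empty }
    if trimmed.count > limit {
        // Preserve the text; the caller decides whether to warn or edit.
        return .tooLong(characterCount: trimmed.count, limit: limit)
    }
    return .valid
}
```

Because the over-limit case carries both the count and the limit, the UI can render an exact warning badge without re-measuring the text.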
Generated posts are persisted via SwiftData with a rating system for training data curation:
```swift
@Model
final class SavedPost {
    var text: String        // The generated post text
    var topic: String?      // Optional topic used for generation
    var generatedAt: Date   // When the LLM produced it
    var savedAt: Date       // When the user saved it
    var ratingValue: Int    // 0 = unrated, 1 = good, 2 = notGood
}
```

The rating system (thumbs up / thumbs down) serves a dual purpose:
- Immediate feedback — users can quickly curate which posts represent their voice well
- Future training data — good-rated posts can be fed back into LoRA training to improve the model over time
The pipeline converts raw social media posts into ChatML-formatted JSONL for LoRA fine-tuning:
```
Bluesky JSON export ──┐
                      ├──▶ [RawPost] ──▶ clean ──▶ filter URLs ──▶ split train/eval
Twitter JSON archive ─┘                                                  │
                                                                         ▼
                                                                ChatML JSONL output
```
Cleaning rules:
- Minimum 10 characters
- Strip reposts
- Remove URL-heavy posts (>50% URL content)
- Trim whitespace
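These rules compose into one pure filter. A sketch under assumed types — `RawPost` is simplified here to two fields, and the URL-share heuristic (fraction of characters inside `http(s)` tokens) is an assumption about how ">50% URL content" is measured:

```swift
import Foundation

struct RawPost { let text: String; let isRepost: Bool }

// Apply all cleaning rules in one pass; returns the surviving post texts.
func cleanPosts(_ posts: [RawPost]) -> [String] {
    posts.compactMap { post in
        if post.isRepost { return nil }                      // strip reposts
        let trimmed = post.text.trimmingCharacters(in: .whitespacesAndNewlines)
        if trimmed.count < 10 { return nil }                 // minimum 10 characters
        let urlChars = trimmed
            .components(separatedBy: .whitespaces)
            .filter { $0.hasPrefix("http") }
            .reduce(0) { $0 + $1.count }
        if Double(urlChars) / Double(trimmed.count) > 0.5 {  // URL-heavy post
            return nil
        }
        return trimmed
    }
}
```

Since the filter is pure, the cleaning rules can be tested exhaustively without touching a real export file.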
ChatML format (matching `mlx_lm.lora`'s expected input):

```json
{"messages": [
  {"role": "user", "content": "Write a short social media post..."},
  {"role": "assistant", "content": "The actual post text"}
]}
```

The LoRA training runs on a Mac using MLX, with these proven hyperparameters:
```swift
LoRAConfig(
    modelName: "Qwen/Qwen3-8B",
    rank: 8,                   // LoRA rank
    scale: 20.0,               // LoRA alpha/scaling
    dropout: 0.0,
    iterations: 25000,         // Long training for voice capture
    learningRate: 1e-5,        // Conservative for fine-tuning
    batchSize: 1,
    gradAccumulationSteps: 8,  // Effective batch size of 8
    maxSeqLength: 256,
    evalRatio: 0.05            // 5% held out for evaluation
)
```

The app includes a full AT Protocol client for posting to Bluesky:
All network request construction is done via pure functions that return `URLRequest` objects:

```swift
buildCreateSessionRequest(credentials:pdsURL:)  → URLRequest  // Login
buildCreatePostRequest(session:record:pdsURL:)  → URLRequest  // Create post
buildFetchPostsRequest(session:handle:...)      → URLRequest  // Fetch feed
```

Similarly, response parsing is pure `Data` → result:

```swift
parseSessionResponse(data:)     → ATProtoSession  // DID, handle, JWTs
parseCreatePostResponse(data:)  → PostRef         // URI, CID
parseFetchPostsResponse(data:)  → PostsPage       // Posts + cursor
```

```swift
public protocol BlueskyClient: Sendable {
    func fetchPosts(handle: String, cursor: String?) async throws -> PostsPage
    func createPost(text: String) async throws -> PostRef
}
```

`ATProtoBlueskyClient` is the real implementation; `MockBlueskyClient` is used in tests. The real client delegates all request construction and response parsing to the pure functions above.
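The pure-builder idea can be sketched as follows. The endpoint path and collection NSID are the real AT Protocol identifiers; the function names and value-based return types are illustrative (the app's builders return `URLRequest`):

```swift
import Foundation

// Record payload for a post (subset of app.bsky.feed.post).
struct PostRecord: Codable { let text: String; let createdAt: String }

// Request body for com.atproto.repo.createRecord.
struct CreateRecordBody: Codable {
    let repo: String          // the account DID
    let collection: String    // record collection NSID
    let record: PostRecord
}

// Pure: compute the XRPC endpoint for creating a record.
func createPostEndpoint(pdsURL: String) -> String {
    pdsURL + "/xrpc/com.atproto.repo.createRecord"
}

// Pure: encode the JSON body for a create-post call.
func createPostBody(did: String, record: PostRecord) throws -> Data {
    let body = CreateRecordBody(repo: did, collection: "app.bsky.feed.post", record: record)
    return try JSONEncoder().encode(body)
}
```

Because both functions are deterministic, request construction can be unit-tested byte-for-byte without any network.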
The scheduling system controls when posts are generated and published, designed for integration with iOS App Intents and Shortcuts:
```swift
// Time window: only post between 9 AM and 10 PM
isWithinPostingWindow(date:config:calendar:) → Bool

// Cooldown: at least 2 hours between posts
hasMinimumInterval(since:now:config:) → Bool

// Rate limit: max 3 posts per day
isUnderDailyLimit(postsToday:config:) → Bool

// Master decision: composes all constraints
shouldGenerateNow(date:lastPostDate:postsToday:config:calendar:) → Bool

// Next opportunity: when to try again
nextPostTime(after:config:calendar:) → Date?
```

```swift
ScheduleConfig(
    postsPerDay: 3,               // Daily limit
    earliestHour: 9,              // Don't post before 9 AM
    latestHour: 22,               // Don't post after 10 PM
    minimumIntervalMinutes: 120,  // 2-hour cooldown
    jitterMinutes: 30             // Randomize timing
)
```

The app has four tabs:
- Chat — Free-form conversation with the model. Messages are formatted as ChatML with a system prompt. The `/no_think` suffix disables Qwen3's chain-of-thought for faster chat responses.
- Post — One-tap post generation in the trained voice. Shows the generated text with validation status (character count, valid/too long/empty), a "Save to History" button, and a toolbar link to the post history view.
- Import — Import training data (JSONL) or pre-trained LoRA adapters (GGUF) from the Files app.
- Settings — LoRA adapter toggle, model info, preferences.
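The ChatML framing the Chat tab relies on can be sketched as a pure formatter. This is an approximation of the Qwen3 chat template using the standard ChatML `<|im_start|>`/`<|im_end|>` markers; the type and function names are illustrative:

```swift
struct ChatMessage { let role: String; let content: String }

// Format a conversation as ChatML, optionally appending /no_think to the
// last user turn to disable Qwen3's chain-of-thought.
func formatChatML(system: String, messages: [ChatMessage], noThink: Bool) -> String {
    var turns = [ChatMessage(role: "system", content: system)] + messages
    if noThink, let last = turns.indices.last, turns[last].role == "user" {
        turns[last] = ChatMessage(role: "user", content: turns[last].content + " /no_think")
    }
    let body = turns
        .map { "<|im_start|>\($0.role)\n\($0.content)<|im_end|>" }
        .joined(separator: "\n")
    // The trailing open tag cues the model to continue as the assistant.
    return body + "\n<|im_start|>assistant\n"
}
```

Keeping this in the pure core means the exact prompt bytes sent to the model are testable without loading any weights.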
On first launch, the app downloads the GGUF model (~5 GB) to the Documents directory using URLSessionDownloadTask. The download supports resumption if interrupted and persists across app updates (Documents directory is preserved).
The chat interface includes three keyboard dismissal mechanisms:
- Interactive scroll — dragging the message list dismisses the keyboard progressively
- Tap to dismiss — tapping the message area dismisses the keyboard
- Done button — a toolbar button above the keyboard for explicit dismissal
This ensures the tab bar remains accessible even with the keyboard open.
- Xcode 16+ with iOS 18 SDK
- XcodeGen (`brew install xcodegen`)
- SwiftLint (`brew install swiftlint`)
- SwiftFormat (`brew install swiftformat`)
- An iPhone with arm64 (the llama.swift xcframework has no simulator slice)
- The `llama.xcframework` in `Frameworks/` (not checked into git due to size)
```shell
# Generate Xcode project from project.yml
xcodegen generate

# Run tests (pure logic, no device needed)
swift test

# Build for device
xcodebuild -project MyMeBot.xcodeproj -scheme MyMeBotApp -destination 'generic/platform=iOS' build
```

The project enforces strict code quality via pre-commit hooks:

- SwiftLint — strict mode with 36 opt-in rules. Enforces function body length (40 lines warning, 80 error), cyclomatic complexity (10/20), no force unwrapping, identifier naming (2-60 chars), and more.
- SwiftFormat — consistent formatting with 120-char line width, alphabetized imports, before-first wrapping for arguments/collections/parameters.
- Pre-commit hooks — both tools run automatically on `git commit`, blocking commits that don't pass.
```shell
# Install to connected iPhone
xcrun devicectl device install app --device <DEVICE_UUID> path/to/MyMeBotApp.app

# Copy model file to app's Documents directory
xcrun devicectl device copy to \
  --device <DEVICE_UUID> \
  --domain-type appDataContainer \
  --domain-identifier com.mymebot.app \
  --source bobby-qwen3-8b-fused-q4km.gguf \
  --destination Documents/bobby-qwen3-8b-fused-q4km.gguf

# Launch with console output
xcrun devicectl device process launch --device <DEVICE_UUID> --console com.mymebot.app
```

The fused model (`bobby-qwen3-8b-fused-q4km.gguf`) is generated from a fine-tuned MLX model using the three-step conversion pipeline described above. You'll need:

- A fine-tuned MLX model (e.g., from `mlx_lm.lora` training)
- llama.cpp tools: `convert_hf_to_gguf.py` and `llama-quantize`
- The MLX Python package for dequantization
For a generic (non-fine-tuned) model, you can download a pre-quantized GGUF from Hugging Face:
```shell
# Example: base Qwen3-8B (no fine-tuning)
wget https://huggingface.co/Qwen/Qwen3-8B-GGUF/resolve/main/Qwen3-8B-Q4_K_M.gguf
```

The project uses `deciduous` to track every architectural decision, implementation choice, and outcome in a queryable graph:
158 nodes, 169 edges
Types: goals, options, decisions, actions, outcomes, observations
The decision graph enforces a strict flow rule:
```
goal → options → decisions → actions → outcomes
```
Goals lead to options (not decisions). You explore alternatives first, then decide. This mirrors how good engineering works — evaluate before committing. The graph is viewable via `deciduous tui` or the web viewer at the GitHub Pages deployment.

Every git commit is linked to its corresponding action/outcome node in the graph via `--commit HEAD`, creating a bidirectional mapping between code changes and the reasoning behind them.
183 tests across 51 suites, all passing in under 1 second:
| Area | Suites | What's Tested |
|---|---|---|
| Models & Types | 5 | InferenceConfig defaults, type equality, Codable conformance |
| Data Pipeline | 8 | Bluesky/Twitter parsing, post cleaning, ChatML serialization, dataset splitting |
| Post Generation | 7 | Prompt formatting, validation, truncation, emoji/hashtag stripping, auto-post logic |
| Chat Formatting | 4 | ChatML messages, conversations, system prompts, /no_think handling |
| AT Protocol | 5 | Request builders, response parsers, session types, error handling |
| Scheduling | 8 | Posting windows, intervals, daily limits, intent responses, notifications |
| Model Download | 4 | File paths, progress formatting, file sizes, download state |
| Integration | 6 | End-to-end pipeline composition, mock engine tests |
The test suite focuses on composition over isolation. Individual function tests exist, but the valuable tests verify that functions compose correctly:
```swift
// This test verifies the FULL pipeline: strip thinking → clean → validate → build record → auto-post
@Test("Full pipeline with mock engine")
func fullPipelineWithMock() async throws {
    let engine = MockLLMEngine(response: "<think>hmm</think>Swift is awesome")
    let raw = try await engine.generate(prompt: "test", config: InferenceConfig())
    let result = processLLMOutput(raw: raw, settings: PostSettings(autoPostEnabled: true))
    #expect(result.post.text == "Swift is awesome")
    #expect(result.shouldPost == true)
    #expect(result.record.text == "Swift is awesome")
}
```

The project was built incrementally across six phases:
| Phase | Name | Status | Tests |
|---|---|---|---|
| 0 | Quality Infrastructure | Complete | 45 |
| 1 | Model Setup + Inference | Complete | 45 |
| 2 | LoRA Training Pipeline | Complete | 65 |
| 3 | Device Integration | Complete | 122 |
| 4 | Scheduling | Complete | 151 |
| 5 | iOS App | In Progress | 183 |
See `THE_STORY_SO_FAR.md` for the full narrative, including all the dead ends, bugs caught, and architectural decisions made along the way.
| Component | Technology |
|---|---|
| Language | Swift 6.0 |
| Platforms | iOS 18+, macOS 15+ |
| UI Framework | SwiftUI |
| Persistence | SwiftData |
| Inference | llama.cpp via llama.swift |
| GPU | Metal (Apple A19 Pro) |
| Model | Qwen3-8B Q4_K_M (fused LoRA) |
| Training | MLX LoRA (runs on Mac) |
| Social API | AT Protocol (Bluesky) |
| Project Gen | XcodeGen |
| Linting | SwiftLint (strict) + SwiftFormat |
| Decision Tracking | deciduous |
| Package Manager | Swift Package Manager |
This project is not currently published under an open-source license. The codebase is public for educational and reference purposes.