| license | gemma | |||||||
|---|---|---|---|---|---|---|---|---|
| language |
|
|||||||
| tags |
|
|||||||
| pipeline_tag | image-classification | |||||||
| library_name | gguf | |||||||
| extra |
|
On-device AI that detects Indonesian online gambling (judol) across websites and apps — fully private, fully offline.
Part of the Gemma 4 Good Hackathon.
TL;DR: We benchmarked the base model, fine-tuned with 7,336 samples, and ran the same benchmark again. The base model won — 90.87% accuracy, 1.41s/image, zero training cost. The fine-tune was unnecessary.
This model must be served with --reasoning off via llama.cpp directly.
Without it, the model generates ~300 tokens of internal thinking before answering, making it 6× slower (8.68s/img vs 1.41s/img).
❌ LM Studio's
reasoning: falseAPI parameter does NOT work — it only moves the thinking tokens to areasoning_contentfield. The thinking still happens. You must use llama.cpp.
./llama-server \
-m gemma-4-E2B-it-Q4_K_M.gguf \
--mmproj mmproj-gemma-4-E2B-it-BF16.gguf \
--host 127.0.0.1 --port 1234 \
-ngl 99 --reasoning off -c 8192JudolGuard is a gambling content detector for Indonesian online gambling ("judol") — slot machine ads, betting sites, gambling links in chat apps. It runs fully on-device via llama.cpp with Gemma 4 E2B.
The unexpected finding: After fine-tuning with 7,336 multimodal samples, the base model still won on every metric. This repo documents the full journey — benchmark, fine-tune, result.
This repo contains:
- Model — Base Gemma 4 E2B Q4_K_M GGUF, recommended for production
- Benchmark — Full 1,468-image evaluation across all variants
- Fine-tuning pipeline — Unsloth LoRA scripts (archived — fine-tune didn't beat base)
- Chrome extension — Plasmo-based browser extension connecting to local llama.cpp
See BENCHMARK.md for full methodology, per-variant confusion matrices, and failure analysis.
| Model | Backend | Acc | Recall | Precision | F1 | Speed | VRAM |
|---|---|---|---|---|---|---|---|
🏆 Base + --reasoning off |
llama.cpp | 90.87% | 82.43% | 99.18% | 90.03% | 1.41s/img 🚀 | 4.4 GB |
| LoRA FT Q5_K_M | LM Studio | 89.22% | 78.55% | 99.83% | 87.92% | 3.09s/img | 3.4 GB |
| LoRA FT Q4_K_M | LM Studio | 79.63% | 59.40% | 99.77% | 74.47% | 2.76s/img | 3.2 GB |
| Base + reasoning ON* | LM Studio | ~96.11% | ~94.23% | ~99.28% | ~96.66% | 8.68s/img | 4.4 GB |
*Base + reasoning ON evaluated on 200-sample subset; full 1,468-image run had 259 timeout errors.
Confusion Matrix — Base + reasoning OFF (winner):
Pred Gambling Pred Safe
Actual Gamb 605 129
Actual Safe 5 729
- 5 false positives in 734 safe images — 99.3% safe precision
- 129 false negatives — primary area for improvement
- Zero failures — no crashes, timeouts, or incomplete inferences in 1,468 images
- 🏆 Base model wins: +1.65% accuracy, +3.88% recall, 2.2× faster than our own fine-tune
- ⚡ 6× speedup:
--reasoning offvs reasoning ON (1.41s vs 8.68s), ~5% accuracy tradeoff - 🎯 Precision: 99.18% — almost never blocks legitimate content
- ❌ Q8_0 not recommended: 5.53 GB + 942 MB mmproj + KV cache causes OOM on 8GB VRAM
We first evaluated Gemma 4 E2B with zero training on our 1,468-image benchmark.
Result: 90.87% accuracy, 82.43% recall, 1.41s/image.
We assumed fine-tuning on domain-specific data would push this higher.
We curated a dataset of 7,336 images (50:50 gambling/safe) and fine-tuned using Unsloth LoRA on a free Google Colab T4 GPU — 200 steps, 50 minutes. Then ran the same benchmark.
| Source | Gambling | Safe |
|---|---|---|
| Website screenshots | 1,834 | 1,834 |
| Slot machine / casino UIs | 834 | 834 |
| Social media screenshots | 500 | 500 |
| Chat app gambling links | 500 | 500 |
Fine-tune accuracy: 89.22% (Q5_K_M). Base model: 90.87%. The fine-tune was slower, less accurate, and harder to deploy.
Why did the fine-tune fail?
- Label noise — some "gambling" images were educational gambling-awareness content (gambar edukasi)
- Catastrophic interference — 200 steps of LoRA on a small dataset damaged base representations without replacing them
- The base model already knew — Gemma 4 E2B's pretraining data already contains strong latent representations for gambling visuals
| Variant | File Size | Speed | Status |
|---|---|---|---|
| 🏆 Base Q4_K_M (recommended) | 3.41 GB | 1.41s/img | Use this |
| Q5_K_M (LoRA FT) | 3.62 GB | 3.09s/img | Archived — reference only |
| Q8_0 (LoRA FT) | 4.95 GB | 3.6s/img | ❌ OOM on 8GB VRAM |
| F16 (source) | 7.90 GB | — | Training artifact only |
All variants require the mmproj (vision encoder): mmproj-gemma-4-E2B-it-BF16.gguf (~942 MB, cannot be quantized).
# 1. Download the model
huggingface-cli download fadiil/judol-guard-gemma4-e2b-gguf \
gemma-4-E2B-it-Q4_K_M.gguf \
mmproj-gemma-4-E2B-it-BF16.gguf
# 2. Build llama.cpp with Vulkan GPU support
git clone https://github.com/ggerganov/llama.cpp
cd llama.cpp
cmake -B build -DGGML_VULKAN=ON
cmake --build build -j --target llama-server
# 3. Run the server (CRITICAL: --reasoning off)
./build/bin/llama-server \
-m gemma-4-E2B-it-Q4_K_M.gguf \
--mmproj mmproj-gemma-4-E2B-it-BF16.gguf \
--host 127.0.0.1 --port 1234 \
-ngl 99 --reasoning off -c 8192
# 4. (Optional) Expose via Cloudflare Tunnel
cloudflared tunnel --url http://localhost:1234
# 5. Sideload the Chrome extension from the repo├── finetune/ # Fine-tuning pipeline (Python) — archived
│ ├── COLAB_NOTEBOOK.md # Step-by-step Colab notebook
│ ├── train.py # Unsloth LoRA training script
│ ├── eval.py # Evaluation on held-out test set
│ └── dataset/ # Training dataset (gitignored)
│ ├── images/gambling/ # 3,668 gambling screenshots
│ ├── images/safe/ # 3,668 safe screenshots
│ ├── train.jsonl # 5,868 multimodal samples
│ └── test.jsonl # 1,468 held-out test samples
│
├── extension/ # Chrome browser extension (Plasmo)
│ ├── popup.tsx # Extension popup UI
│ └── README.md # Extension-specific docs
│
├── judol_blocker/ # Flutter Android app (scaffolded, in development)
│ ├── pubspec.yaml # Dependencies (flutter_gemma, aho_corasick, etc.)
│ └── .fvmrc # Flutter version lock
│
├── BENCHMARK.md # Full 1,468-image evaluation results
├── WRITEUP.md # Hackathon project writeup
└── glossary.md # Plain-language glossary for all terms
The extension/ directory contains a Chrome extension (Plasmo, Manifest V3) that:
- Extracts URL, visible text, title, and the largest image from every page
- Sends to the local llama.cpp API
- Shows a full-screen Indonesian-language warning if gambling is detected
- Falls back to keyword heuristics if the local AI is offline
See extension/README.md for installation.
The judol_blocker/ Flutter app is scaffolded with the architecture for system-wide protection:
- AccessibilityService — reads text across all apps
- flutter_gemma — on-device Gemma 4 inference
- 4-layer pipeline — keyword match → image trigger → text AI → vision AI
⚠️ Not yet complete — scaffolded and waiting for further development.
| Tool | Role |
|---|---|
| llama.cpp | Production inference — only backend with --reasoning off support |
| Unsloth | 2× faster LoRA fine-tuning on free T4 GPU |
| LM Studio | Early benchmarking — reasoning: false limitation discovered here |
| Plasmo | Chrome extension framework (Manifest V3) |
| HuggingFace | Model hosting and distribution |
| Cloudflare Tunnel | Exposes local inference server for extension |
- License: Gemma Terms of Use
- Built for the Gemma 4 Good Hackathon by @Fadil3
- Fine-tuning powered by Unsloth
- Inference powered by llama.cpp
"We spent weeks fine-tuning a model to beat the base model. The base model won. That's not a failure — that's a result."