A production-grade AI chat platform with multi-model orchestration, artifacts, memory, and intelligent file processing.
Screenshots • Features • Tech Stack • Architecture • AI Models • Security
Full-featured chat with streaming responses, markdown rendering, LaTeX math, code highlighting, and file attachments.
Claude-style artifact system with file tree navigation, syntax highlighting, live HTML preview in a sandboxed iframe, version history, and one-click ZIP download.
The artifact panel supports full version history (v1, v2, v3...) with file-level navigation, as shown in the Code tab above. Switch between versions, download as ZIP, or share via token-based public links.
Sign in with Google, Microsoft, or Discord — or register with email. Guest mode available for anonymous sessions.
Choose from Llama 3.3 70B, DeepSeek R1, Qwen 3, and more — or let Smart Router auto-select the best model for your task.
Upload PDFs, images, Word docs, Excel spreadsheets, and PowerPoint files. Scanned documents are auto-OCR'd with Tesseract + EasyOCR fallback.
Fully responsive with swipe gestures, collapsible sidebar, and touch-optimized controls.
Automatically learns and remembers user preferences, project context, and profile details across conversations — with confidence thresholds and privacy safeguards.
- Streaming AI responses with real-time token rendering and abort support
- Markdown rendering via markdown-it with task lists, smart typography, and auto-linking
- LaTeX math with KaTeX (inline
$...$and display$$...$$) - Code highlighting with Prism.js and 20+ language support
- Mermaid diagrams rendered inline from chat responses
- File & image attachments with drag-and-drop, preview thumbnails, and inline display
- Message actions — edit, delete, copy, and quote any message
- Keyboard shortcuts —
Ctrl+Enterto send,Ctrl+Kto search sessions
- Claude-style side panel for generated code, projects, and documents
- File tree view with collapsible directories and syntax-highlighted editors (CodeMirror)
- Live preview — HTML renders in a sandboxed iframe, Markdown renders with styling
- Version history — browse v1, v2, v3... with full file-level diffs
- Download as ZIP — export any artifact version as a complete project
- Shareable links — token-based public access for sharing artifacts externally
- Comments — add notes and annotations to artifacts
- Smart Router — automatically selects the optimal model based on:
- Task type (coding, reasoning, creative writing, vision)
- Conversation length (switches to long-context models as needed)
- Content analysis (routes to appropriate model on refusal detection)
- Manual override — pick any model from the dropdown
- Circuit breaker pattern — prevents cascading failures when an API provider is down
- Ranked fallback chains — Groq → Cerebras for uninterrupted service
- Automatic extraction of user preferences, profile facts, and project context
- Confidence thresholds (0.60–0.90) with promotion from PROVISIONAL → ACTIVE
- Privacy safeguards — never stores passwords, API keys, health data, or financial info
- Injection detection — blocks prompt-poisoning attempts in memory content
- Soft delete with 30-day retention and supersede tracking for corrections
- Rate limits — 3 memories/message, 10/day to prevent noise
- V3 adaptive layer — A/B testing behavioral signals without persistent storage
- PDF — text extraction with OCR fallback for scanned pages
- Images (JPG, PNG) — Tesseract OCR with EasyOCR fallback, pre-processed for accuracy (greyscale, upscaling, contrast enhancement)
- Word documents (.docx) — full text + table extraction
- Excel spreadsheets (.xlsx) — sheet-aware data parsing
- PowerPoint (.pptx) — slide text and notes extraction
- RTF documents — rich text parsing
- Multi-layer validation — MIME type → file extension → magic bytes → content parsing
- 16MB size limit with per-type restrictions
- Brave Search integration with TF-IDF relevance scoring
- Domain blocklist with regex filtering
- Cryptocurrency prices — CoinGecko + CoinPaprika + Binance fallback chain (30+ coins)
- Stock news — Finnhub integration for market data
- Caching — 1-hour TTL with exponential backoff on rate limits
- Prometheus metrics for search observability
- OAuth 2.0 — Google, Microsoft, Discord with PKCE
- Email/password registration with Firebase Auth
- Guest mode — anonymous chat without account creation
- Session management — Firestore-backed with 7-day TTL
- Encrypted API key storage — users can bring their own keys
- Password reset via email
- Dark/light theme — system preference detection, manual toggle, 300ms smooth transitions, 40+ CSS design tokens
- Responsive design — optimized breakpoints for phone (≤512px), tablet, and desktop
- Mobile gestures — swipe-to-delete in chat history
- 32 modular CSS stylesheets with a design-tokens system
- Toast notifications — contextual success/error/warning messages
- Sidebar with searchable chat history, pinning, and archiving
- Voice mode — speech synthesis and dictation support
- FOUC prevention — theme pre-loaded from localStorage before DOM renders
| Technology | Purpose |
|---|---|
| Python 3.11 | Runtime |
| Flask 3.1 | Web framework with blueprint-based modular architecture |
| Gunicorn | Production WSGI server (2 workers × 4 threads) |
| Firebase Firestore | NoSQL database (multi-tenant, serverless) |
| Firebase Auth | Identity management + OAuth provider integration |
| Firebase Cloud Storage | File uploads and generated image storage |
| sentence-transformers | Embeddings (Jina v3, MiniLM) for hybrid retrieval |
| scikit-learn | TF-IDF vectorization for search relevance scoring |
| NLTK | Language-aware text chunking and tokenization |
| PyTorch (CPU) | Inference runtime for embedding models |
| Tesseract + EasyOCR | Dual OCR engines with automatic fallback |
| PyPDF2 / PIL | PDF and image processing pipeline |
| python-docx / openpyxl | Office document parsing |
| Bleach / DOMPurify | HTML sanitization (server + client) |
| Flask-Limiter | Rate limiting per user/endpoint |
| Prometheus | Metrics collection and monitoring |
| Technology | Purpose |
|---|---|
| Vanilla JavaScript | 2,200+ lines — no framework overhead |
| Jinja2 | Server-side templating |
| markdown-it | Markdown rendering with task lists plugin |
| KaTeX | LaTeX math rendering |
| Prism.js | Syntax highlighting (20+ languages) |
| CodeMirror 5 | In-browser code editor for artifacts |
| Mermaid | Diagram rendering from markdown |
| Font Awesome 6 | Icon library |
| Manrope + Roboto Mono | Typography (UI + code) |
| CSS Custom Properties | Design token system with theme switching |
| Technology | Purpose |
|---|---|
| Docker | Multi-stage containerization with pre-warmed models |
| Google Cloud Run | Serverless deployment with 300s timeout |
| Cloud Build | CI/CD pipeline |
| pytest + pytest-cov | Testing with coverage reporting |
| CSRFProtect | Cross-site request forgery prevention |
| Rotating File Logs | 10MB × 10 backup concurrent logging |
| Provider | Models | Use Case |
|---|---|---|
| Groq | Llama 3.3 70B, Qwen 3 235B, Llama Guard 3 | Primary inference, content safety |
| DeepSeek | DeepSeek R1 | Deep reasoning tasks |
| OpenRouter | Vision models, image generation | Multi-modal capabilities |
| Moonshot/Kimi | Long-context reasoning | 128K+ token conversations |
| Cerebras | Fallback models | High-availability fallback |
┌─────────────────────────────────────────────────────────────────┐
│ CLIENT (Browser) │
│ ┌──────────┐ ┌──────────┐ ┌──────────┐ ┌───────────────┐ │
│ │ Chat UI │ │ Artifacts│ │ Sidebar │ │ Theme Engine │ │
│ │ markdown │ │ CodeMirr │ │ Search │ │ 40+ tokens │ │
│ │ KaTeX │ │ Preview │ │ History │ │ localStorage │ │
│ │ Prism.js │ │ File Tree│ │ Pinning │ │ system-pref │ │
│ └────┬─────┘ └────┬─────┘ └────┬─────┘ └───────────────┘ │
│ │ │ │ │
│ └──────────────┼──────────────┘ │
│ │ SSE Streaming + REST API │
└──────────────────────┼───────────────────────────────────────────┘
│
┌──────────────────────┼───────────────────────────────────────────┐
│ FLASK APPLICATION │
│ │ │
│ ┌───────────────────┴────────────────────┐ │
│ │ Blueprint Router │ │
│ │ /chat /artifacts /upload /auth │ │
│ │ /memory /session /account /search │ │
│ └───┬──────┬──────┬──────┬──────┬────────┘ │
│ │ │ │ │ │ │
│ ┌───┴──┐ ┌─┴───┐ ┌┴────┐ ┌┴───┐ ┌┴──────┐ │
│ │Smart │ │Arti-│ │File │ │Auth│ │Memory │ │
│ │Router│ │facts│ │Proc.│ │ │ │ V2+V3 │ │
│ │ │ │Parse│ │OCR │ │OAuth│ │Extract│ │
│ │Circuit│ │Store│ │Valid.│ │CSRF│ │Guard │ │
│ │Breaker│ │Vers.│ │Magic│ │ │ │Promote│ │
│ └───┬──┘ └──┬──┘ └──┬──┘ └─┬──┘ └───┬───┘ │
│ │ │ │ │ │ │
│ ┌───┴───────┴───────┴──────┴────────┴───┐ │
│ │ Service Layer │ │
│ │ Embeddings (Jina v3 + MiniLM) │ │
│ │ TF-IDF Scoring │ TTL Caching │ │
│ │ Search Client │ Crypto API │ │
│ └───────────────────┬───────────────────┘ │
│ │ │
└──────────────────────┼───────────────────────────────────────────┘
│
┌──────────────┼──────────────┐
│ │ │
┌────┴────┐ ┌────┴────┐ ┌────┴────┐
│ Firebase │ │ AI │ │ Search │
│Firestore │ │ Models │ │ APIs │
│ Auth │ │ Groq │ │ Brave │
│ Storage │ │DeepSeek │ │CoinGecko│
│ │ │OpenRouter│ │Finnhub │
└─────────┘ └─────────┘ └─────────┘
Blueprint Modularization — 10 Flask blueprints isolate concerns (chat, artifacts, memory, auth, upload, session, account, weather, email, media), each with independent routes and logic.
Smart Model Router — Analyzes conversation context, detects task type (coding, reasoning, vision, creative), and dynamically selects the optimal model. Includes circuit breaker to prevent cascading failures.
Hybrid Retrieval — Combines dense vector search (sentence-transformer embeddings) with sparse keyword matching (TF-IDF) for memory queries and search ranking.
Multi-Layer File Validation — Every upload passes through MIME type check → extension allowlist → magic byte verification → content-specific parsing, preventing upload-based attacks.
Graceful Degradation — Crypto prices cascade through CoinGecko → CoinPaprika → Binance. OCR falls back from Tesseract → EasyOCR. Models fall back from Groq → Cerebras. The app never hard-fails.
Memory Orchestration — V2 handles production extraction with confidence scoring, quarantine, and promotion rules. V3 is an adaptive signals-only layer for A/B testing behavioral changes without persisting data.
User types message
│
▼
┌─────────┐ ┌──────────┐ ┌──────────────┐
│ Validate │───▶│ Smart │───▶│ Build Prompt │
│ CSRF │ │ Router │ │ + Memory │
│ Rate │ │ Select │ │ + Files │
│ Limit │ │ Model │ │ + Search │
└─────────┘ └──────────┘ └──────┬───────┘
│
▼
┌──────────────┐
│ LLM API │
│ (Streaming) │
└──────┬───────┘
│
┌─────────────┼──────────────┐
▼ ▼ ▼
┌───────────┐ ┌───────────┐ ┌────────────┐
│ Render │ │ Extract │ │ Parse │
│ Markdown │ │ Memory │ │ Artifacts │
│ + Math │ │ Items │ │ + Store │
└───────────┘ └───────────┘ └────────────┘
# Multi-stage build with pre-warmed embedding models
Python 3.11 Slim → System deps (Tesseract, fonts)
→ pip install (55+ packages)
→ Pre-download Jina v3 + MiniLM embeddings
→ Gunicorn (2 workers × 4 threads, port 5001)XelaSphere orchestrates 10+ AI models across multiple providers, automatically routing each request to the optimal model:
| Model | Provider | Context | Best For |
|---|---|---|---|
| Llama 3.3 70B | Groq | 128K | General chat, coding |
| Qwen 3 235B A22B | Groq | 128K | Complex reasoning |
| DeepSeek R1 | DeepSeek | 64K | Deep analysis, math |
| Llama Guard 3 8B | Groq | 8K | Content safety moderation |
| Kimi (Moonshot) | Moonshot | 262K | Ultra-long conversations |
| Vision Models | OpenRouter | Varies | Image understanding |
| Image Generation | OpenRouter | — | AI art generation |
| Cerebras Models | Cerebras | 8K | High-speed fallback |
The Smart Router uses task detection heuristics to auto-select models:
- Coding tasks → Llama 3.3 70B (fast, accurate)
- Deep reasoning → DeepSeek R1 (chain-of-thought)
- Long conversations → Kimi 262K context
- Image input → Vision models via OpenRouter
- Content safety → Llama Guard pre-screening
XelaSphere implements defense-in-depth security:
| Layer | Implementation |
|---|---|
| Authentication | Firebase Auth + OAuth 2.0 (PKCE) with session TTL |
| CSRF | Flask-CSRFProtect with token validation on all state-changing requests |
| XSS Prevention | Server-side Bleach sanitization + client-side DOMPurify |
| CSP Headers | Strict Content-Security-Policy with no unsafe-eval (except marked CDNs) |
| HTTP Headers | X-Frame-Options DENY, X-Content-Type-Options nosniff, Referrer-Policy |
| Rate Limiting | Per-user, per-endpoint limits (Flask-Limiter) |
| File Validation | MIME + extension + magic byte triple-check |
| Memory Safety | Injection pattern detection, no sensitive data storage, confidence gates |
| API Keys | Encrypted at rest in Firestore, never logged or exposed |
| Cookies | SameSite=Lax, Secure flag in production, HttpOnly |
xelasphere/
├── app/
│ ├── views.py # Core chat logic (6,500 lines)
│ ├── auth.py # OAuth + Firebase Auth (47KB)
│ ├── artifacts.py # Artifact parsing & versioning (68KB)
│ ├── config.py # Flask + Firebase initialization
│ ├── smart_router.py # Intelligent model selection
│ ├── grog_api.py # LLM API with circuit breaker
│ ├── crypto_api.py # Multi-provider crypto prices (84KB)
│ ├── search_client.py # Brave Search + TF-IDF scoring
│ ├── memory_v2/ # Production memory system
│ │ ├── orchestrator.py # Memory extraction pipeline
│ │ ├── extractor.py # Entity & preference extraction
│ │ ├── safeguards.py # Privacy & injection guards
│ │ └── storage.py # Firestore persistence
│ ├── memory_v3/ # Adaptive behavior layer
│ │ ├── signals.py # A/B testing signals
│ │ └── rollout.py # Feature rollout management
│ ├── blueprints/ # 10 modular route blueprints
│ │ ├── chat_routes.py
│ │ ├── artifacts_routes.py
│ │ ├── memory_routes.py
│ │ ├── upload_routes.py
│ │ ├── session_routes.py
│ │ ├── account_routes.py
│ │ └── ...
│ ├── services/ # External service integrations
│ ├── static/
│ │ ├── css/ # 32 modular stylesheets
│ │ ├── js/ # 16+ JavaScript modules
│ │ └── images/ # Logos, icons, OAuth provider SVGs
│ └── templates/ # 14 Jinja2 HTML templates
├── tests/ # pytest suite with coverage
├── Dockerfile # Multi-stage production build
├── cloudbuild.yaml # Google Cloud Build CI/CD
├── requirements-cpu.txt # 55+ Python dependencies
└── firestore.rules # Database security rules
- Embedding models pre-loaded in Docker image — zero cold-start latency for semantic search
- Gunicorn with 2 workers × 4 threads = 8 concurrent requests per instance
- TTL caching on search results, embeddings, and model responses
- Lazy loading — heavy ML models initialize only on first use
- Image preprocessing — greyscale conversion, upscaling, and contrast enhancement for OCR accuracy
- FOUC prevention — theme loaded from localStorage before first paint
- ThreadPoolExecutor for concurrent file processing
- SSE streaming — tokens render as they arrive, no waiting for full response
XelaSphere is containerized and deployed on Google Cloud Run with:
- Docker multi-stage build with pre-warmed embedding models
- Cloud Build for automated CI/CD on push
- Firestore for zero-ops, auto-scaling database
- Firebase Cloud Storage for file uploads
- 300s request timeout accommodating model cold starts
- Prometheus metrics endpoint for monitoring
Built by xelauvas.dev
This repository contains only documentation. Source code is proprietary.









