Getting Started with Inference-in-a-Box

📋 Navigation: 🏠 Main README • 🎯 Goals & Vision • 📖 Usage Guide • 🏗️ Architecture • 🤖 AI Assistant

This guide provides detailed step-by-step instructions to get the platform up and running.

📖 Quick Reference: For a condensed quick start, see the Quick Start section in README.md

🎯 Why This Project: To understand the goals and vision behind this platform, start with GOALS.md

Prerequisites

📋 System Requirements: For detailed system requirements and tool versions, see Prerequisites in README.md

Ensure you have the following tools installed:

  • Docker 20.10+ with Kubernetes enabled
  • kubectl 1.24+
  • Kind 0.20+
  • Helm 3.12+
  • curl and jq (for API testing)
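
Before bootstrapping, it can help to confirm that the tools above are actually on your PATH. The following is a minimal sketch, not part of the repository: the helper names `version_ge` and `missing_tools` are hypothetical, and the version comparison assumes a `sort` that supports `-V` (GNU coreutils and recent macOS).

```shell
#!/usr/bin/env bash
# Sketch: report missing prerequisite tools and compare version strings.

# version_ge ACTUAL MINIMUM: succeeds when ACTUAL >= MINIMUM.
# Relies on `sort -V` (version sort) to order the two strings.
version_ge() {
  [ "$(printf '%s\n%s\n' "$2" "$1" | sort -V | head -n 1)" = "$2" ]
}

# missing_tools TOOL...: prints every tool that is not found on PATH.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || echo "$tool"
  done
}

missing=$(missing_tools docker kubectl kind helm curl jq)
if [ -n "$missing" ]; then
  echo "Missing tools:" $missing
else
  echo "All prerequisite tools found"
fi

# Example version check (Helm prints something like v3.12.0+g...):
# version_ge "$(helm version --short | sed 's/^v//; s/+.*//')" 3.12 \
#   || echo "Helm is older than 3.12"
```

The script only reports what is missing; it does not install anything or stop the shell, so it is safe to run repeatedly.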

Step-by-Step Setup

1. Clone and Bootstrap

# Clone the repository
git clone https://github.com/smarunich/inference-in-a-box.git
cd inference-in-a-box

# One-command bootstrap (takes 10-15 minutes)
./scripts/bootstrap.sh

🔧 What Bootstrap Does: For detailed information about what the bootstrap script installs, see Technology Stack

2. Verify Installation

# Check cluster is ready
kubectl get nodes

# Verify core components
kubectl get pods -A | grep -E "(istio|envoy|kserve|knative)"

# Check sample models are deployed
kubectl get inferenceservice -A
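
The commands above show a snapshot; right after bootstrap some resources may still be starting. As a hedged sketch, `kubectl wait` can block until nodes and InferenceServices report a `Ready` condition. The function name `wait_for_platform` and the default timeout are assumptions for illustration, not something the repository provides.

```shell
#!/usr/bin/env bash
# Sketch: block until cluster nodes and all InferenceServices are Ready.

# wait_for_platform [TIMEOUT]: TIMEOUT defaults to 300s (an arbitrary choice).
# Returns non-zero if either wait times out or finds no matching resources.
wait_for_platform() {
  local timeout="${1:-300s}"
  kubectl wait --for=condition=Ready nodes --all --timeout="$timeout" &&
    kubectl wait --for=condition=Ready inferenceservice --all \
      --all-namespaces --timeout="$timeout"
}

# Usage:
#   wait_for_platform 600s && echo "Platform is ready"
```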

3. Get Authentication Tokens

# Get JWT tokens for different tenants
./scripts/get-jwt-tokens.sh

# This creates tokens for tenant-a, tenant-b, and tenant-c
export TENANT_A_TOKEN="<token-from-script>"
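
To see what a token contains, you can decode its payload locally. This is a minimal sketch that assumes the tokens are standard three-part JWTs (`header.payload.signature`); the helper name `jwt_payload` is hypothetical, the claims you will see depend on how the platform issues tokens, and the signature is not verified.

```shell
#!/usr/bin/env bash
# Sketch: print the JSON payload of a JWT without verifying its signature.

# jwt_payload TOKEN: decodes the second dot-separated segment.
jwt_payload() {
  local seg
  # Take the payload segment and convert base64url to standard base64.
  seg=$(printf '%s' "$1" | cut -d. -f2 | tr '_-' '/+')
  # Restore the '=' padding that JWTs strip.
  while [ $(( ${#seg} % 4 )) -ne 0 ]; do
    seg="${seg}="
  done
  # `base64 -d` is the GNU spelling; some older macOS versions use -D.
  printf '%s' "$seg" | base64 -d
}

# Usage:
#   jwt_payload "$TENANT_A_TOKEN" | jq .
```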

4. Access Services

🌐 Service Access: For complete service access information and port forwarding commands, see Usage Guide

# Access management UI
kubectl port-forward svc/management-service 8085:80
# Open browser: http://localhost:8085

5. Run Interactive Demo

🎭 Complete Demo Guide: For comprehensive demo scenarios and explanations, see demo.md

# Interactive demo with multiple scenarios
./scripts/demo.sh

Making Your First Inference Request

📝 Complete API Guide: For detailed API usage and examples, see Usage Guide

Test the sklearn-iris model:

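# Get your JWT token first
# (assumes the script prints the tenant name and token separated by a space;
# adjust the filter if the output differs)
export JWT_TOKEN=$(./scripts/get-jwt-tokens.sh | grep "tenant-a" | cut -d' ' -f2)

# Make inference request (the gateway must be reachable on localhost:8080)
curl -H "Authorization: Bearer $JWT_TOKEN" \
     -H "x-ai-eg-model: sklearn-iris" \
     -H "Content-Type: application/json" \
     http://localhost:8080/v1/models/sklearn-iris:predict \
     -d '{"instances": [[5.1, 3.5, 1.4, 0.2]]}'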
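
If you send several requests, wrapping the call in small helpers keeps the payload and URL in one place. The sketch below is illustrative only: `iris_payload`, `predict_url`, and `predict` are hypothetical names, and it assumes `JWT_TOKEN` is exported and the gateway is reachable on `localhost:8080` as in the request above.

```shell
#!/usr/bin/env bash
# Sketch: helpers for building and sending a v1 predict request.

# iris_payload F1 F2 F3 F4: builds the request body for one iris sample.
iris_payload() {
  printf '{"instances": [[%s, %s, %s, %s]]}' "$1" "$2" "$3" "$4"
}

# predict_url HOST MODEL: builds the predict endpoint URL.
predict_url() {
  printf 'http://%s/v1/models/%s:predict' "$1" "$2"
}

# predict MODEL BODY: sends the request; --fail makes curl exit non-zero
# on HTTP errors such as 401 or 404.
predict() {
  curl --silent --show-error --fail \
    -H "Authorization: Bearer ${JWT_TOKEN}" \
    -H "x-ai-eg-model: $1" \
    -H "Content-Type: application/json" \
    -d "$2" \
    "$(predict_url localhost:8080 "$1")"
}

# Usage:
#   predict sklearn-iris "$(iris_payload 5.1 3.5 1.4 0.2)" | jq .
```

With the KServe v1 protocol, a successful response is typically a JSON object with a `predictions` array, so `jq '.predictions'` is a convenient way to extract the result.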

Next Steps

After successful setup, explore these key areas:

Model Publishing

📘 Complete Guide: See Model Publishing Guide

Use the Management Service to publish models for external access with rate limiting and authentication.

Architecture Understanding

🏗️ Technical Details: See Architecture Documentation

Learn about the dual-gateway design and multi-tenant security.

Advanced Usage

⚡ API Reference: See Management Service API

Explore the full REST API for programmatic model management.

Cleanup

# Complete platform teardown
./scripts/cleanup.sh

Troubleshooting

🔧 Complete Troubleshooting: For detailed troubleshooting steps, see README.md - Troubleshooting

Common verification commands:

# List pods that are not in the Running state
# (the header row and Completed pods also appear in the output)
kubectl get pods --all-namespaces | grep -v Running

# Verify AI Gateway
kubectl get pods -n envoy-gateway-system
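
For a tighter view than `grep -v Running`, the pod listing can be filtered by its STATUS column. This is a small sketch with a hypothetical helper name, `unhealthy_pods`; it assumes the default column layout of `kubectl get pods --all-namespaces` (NAMESPACE, NAME, READY, STATUS, ...).

```shell
#!/usr/bin/env bash
# Sketch: list pods whose STATUS is neither Running nor Completed.

# unhealthy_pods: reads `kubectl get pods --all-namespaces` output on stdin
# and prints "namespace/name (status)" for each problem pod.
unhealthy_pods() {
  awk 'NR > 1 && $4 != "Running" && $4 != "Completed" {
    print $1 "/" $2 " (" $4 ")"
  }'
}

# Usage:
#   kubectl get pods --all-namespaces | unhealthy_pods
```

An empty result means every pod is either Running or Completed. A Running pod can still have unready containers, so check the READY column if a service misbehaves.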