Retail Multi-Agent Solution - Infrastructure Template

Chapter 5: Production Deployment Package

⚠️ INFRASTRUCTURE TEMPLATE ONLY
This ARM template deploys Azure resources for a multi-agent system.

What gets deployed (15-25 minutes):

  • ✅ Microsoft Foundry Models (gpt-4.1, gpt-4.1-mini, and embeddings across 3-4 regions)
  • ✅ AI Search service (empty, ready for index creation)
  • ✅ Container Apps (placeholder images, ready for your code)
  • ✅ Storage, Cosmos DB, Key Vault, Application Insights

What's NOT included (requires development):

  • ❌ Agent implementation code (Customer Agent, Inventory Agent)
  • ❌ Routing logic and API endpoints
  • ❌ Frontend chat UI
  • ❌ Search index schemas and data pipelines
  • Estimated development effort: 80-120 hours

Use this template if:

  • ✅ You want to provision Azure infrastructure for a multi-agent project
  • ✅ You plan to develop agent implementation separately
  • ✅ You need a production-ready infrastructure baseline

Don't use if:

  • ❌ You expect a working multi-agent demo immediately
  • ❌ You're looking for complete application code examples

Overview

This directory contains an Azure Resource Manager (ARM) template that deploys the infrastructure for a multi-agent customer support system. The template provisions and connects the Azure services listed below; application code is not included and must be developed separately.

After deployment, you'll have: Production-ready Azure infrastructure
To complete the system, you need: Agent code, frontend UI, and data configuration (see Architecture Guide)

🎯 What Gets Deployed

Core Infrastructure (Status After Deployment)

Microsoft Foundry Models Services (Ready for API calls)

  • Primary region: gpt-4.1 deployment (20K TPM capacity)
  • Secondary region: gpt-4.1-mini deployment (10K TPM capacity)
  • Tertiary region: Text embeddings model (30K TPM capacity)
  • Evaluation region: gpt-4.1 grader model (15K TPM capacity)
  • Status: Fully functional - can make API calls immediately

Azure AI Search (Empty - ready for configuration)

  • Vector search capabilities enabled
  • Standard tier with 1 partition, 1 replica
  • Status: Service running, but requires index creation
  • Action needed: Create search index with your schema

Azure Storage Account (Empty - ready for uploads)

  • Blob containers: documents, uploads
  • Secure configuration (HTTPS-only, no public access)
  • Status: Ready to receive files
  • Action needed: Upload your product data and documents

⚠️ Container Apps Environment (Placeholder images deployed)

  • Agent router app (nginx default image)
  • Frontend app (nginx default image)
  • Auto-scaling configured (0-10 instances)
  • Status: Running placeholder containers
  • Action needed: Build and deploy your agent applications

Azure Cosmos DB (Empty - ready for data)

  • Database and container pre-configured
  • Optimized for low-latency operations
  • TTL enabled for automatic cleanup
  • Status: Ready to store chat history

Azure Key Vault (Optional - ready for secrets)

  • Soft delete enabled
  • RBAC configured for managed identities
  • Status: Ready to store API keys and connection strings

Application Insights (Optional - monitoring active)

  • Connected to Log Analytics workspace
  • Custom metrics and alerts configured
  • Status: Ready to receive telemetry from your apps

Document Intelligence (Ready for API calls)

  • S0 tier for production workloads
  • Status: Ready to process uploaded documents

Bing Search API (Ready for API calls)

  • S1 tier for real-time searches
  • Status: Ready for web search queries

Deployment Modes

| Mode | OpenAI Capacity | Container Instances | Search Tier | Storage Redundancy | Best For |
|------|-----------------|---------------------|-------------|--------------------|----------|
| Minimal | 10K-20K TPM | 0-2 replicas | Basic | LRS (Local) | Dev/test, learning, proof-of-concept |
| Standard | 30K-60K TPM | 2-5 replicas | Standard | ZRS (Zone) | Production, moderate traffic (<10K users) |
| Premium | 80K-150K TPM | 5-10 replicas, zone-redundant | Premium | GRS (Geo) | Enterprise, high traffic (>10K users), 99.99% SLA |

Cost Impact:

  • Minimal → Standard: ~4x cost increase ($100-370/mo → $420-1,450/mo)
  • Standard → Premium: ~2.5x cost increase ($420-1,450/mo → $1,150-3,500/mo)
  • Choose based on: Expected load, SLA requirements, budget constraints

Capacity Planning:

  • TPM (Tokens Per Minute): Total across all model deployments
  • Container Instances: Auto-scaling range (min-max replicas)
  • Search Tier: Affects query performance and index size limits
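
As a quick worked example of the TPM figure, the per-deployment capacities listed under "What Gets Deployed" can be added up in the shell. This is only an arithmetic sketch: `total_tpm_k` is a hypothetical helper name, and the values are the capacities quoted in this README, in thousands of tokens per minute.

```shell
# Sum per-deployment TPM capacities (in thousands) into one total.
total_tpm_k() {
  local total=0 capacity
  for capacity in "$@"; do
    total=$(( total + capacity ))
  done
  echo "$total"
}

# gpt-4.1 (20K) + gpt-4.1-mini (10K) + embeddings (30K)
total_tpm_k 20 10 30      # prints 60

# Same set plus the 15K evaluation (grader) deployment
total_tpm_k 20 10 30 15   # prints 75
```

The first total matches the upper end of the Standard row (60K TPM); the evaluation deployment adds to it when enabled.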

📋 Prerequisites

Required Tools

  1. Azure CLI (version 2.50.0 or higher)

    az --version  # Check version
    az login      # Authenticate
  2. Active Azure subscription with Owner or Contributor access

    az account show  # Verify subscription
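
The version requirement above can also be checked in a script rather than by eye. The following is a minimal sketch, assuming GNU `sort -V` is available; `version_at_least` and `check_az_cli` are hypothetical helper names, not part of deploy.sh.

```shell
# Succeeds when version $1 is greater than or equal to minimum version $2.
# Uses version sort: the minimum must sort first (or be equal).
version_at_least() {
  local current="$1" minimum="$2"
  [ "$(printf '%s\n%s\n' "$minimum" "$current" | sort -V | head -n1)" = "$minimum" ]
}

# Reads the installed Azure CLI version and compares it with the documented minimum.
check_az_cli() {
  local installed
  installed="$(az version --query '"azure-cli"' --output tsv)" || return 1
  if version_at_least "$installed" "2.50.0"; then
    echo "Azure CLI $installed is supported"
  else
    echo "Azure CLI $installed is older than 2.50.0" >&2
    return 1
  fi
}
```

Calling `check_az_cli` before a deployment returns a non-zero status when the CLI is too old, so it can gate the rest of a script.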

Required Azure Quotas

Before deployment, verify sufficient quotas in your target regions:

# Check Microsoft Foundry Models availability in your region
az cognitiveservices account list-skus \
  --kind OpenAI \
  --location eastus2

# Verify OpenAI quota (example for gpt-4.1)
az cognitiveservices usage list \
  --location eastus2 \
  --query "[?name.value=='OpenAI.Standard.gpt-4.1']"

# List regions where Container Apps managed environments are available (availability, not a quota figure)
az provider show \
  --namespace Microsoft.App \
  --query "resourceTypes[?resourceType=='managedEnvironments'].locations"

Minimum Required Quotas:

  • Microsoft Foundry Models: 3-4 model deployments across regions
    • gpt-4.1: 20K TPM (Tokens Per Minute)
    • gpt-4.1-mini: 10K TPM
    • text-embedding-ada-002: 30K TPM
    • Note: gpt-4.1 may have waitlist in some regions - check model availability
  • Container Apps: Managed environment + 2-10 container instances
  • AI Search: Standard tier for Standard mode (Minimal mode uses Basic, which supports vector search but with lower storage and index limits)
  • Cosmos DB: Standard provisioned throughput
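
To compare the minimums above with what a region still has free, the usage query can be reduced to a single number. This is a sketch under assumptions: it assumes each entry returned by `az cognitiveservices usage list` exposes `limit` and `currentValue`, and `quota_headroom` is a hypothetical helper name. The units are whatever the service reports for that usage name.

```shell
# Print remaining quota (limit - currentValue) for one usage name in one region.
quota_headroom() {
  local location="$1" usage_name="$2" values limit used
  values="$(az cognitiveservices usage list \
    --location "$location" \
    --query "[?name.value=='${usage_name}'] | [0].[limit, currentValue]" \
    --output tsv | tr '\t\n' '  ')"
  # Split the two numbers regardless of tab or newline separation.
  # shellcheck disable=SC2086
  set -- $values
  limit="${1:-}"
  used="${2:-}"
  if [ -z "$limit" ] || [ -z "$used" ]; then
    echo "No usage entry named ${usage_name} in ${location}" >&2
    return 1
  fi
  # Values may be printed as floats (for example 100.0); keep the integer part.
  echo $(( ${limit%.*} - ${used%.*} ))
}
```

For example, `quota_headroom eastus2 OpenAI.Standard.gpt-4.1` prints the unused capacity for the usage name queried earlier in this section.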

If quota insufficient:

  1. Go to Azure Portal → Quotas → Request increase
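  2. Or use the Azure CLI (needs the support extension; the command also expects contact and problem-classification arguments, listed by az support tickets create --help):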
    az support tickets create \
      --ticket-name "OpenAI-Quota-Increase" \
      --severity "minimal" \
      --description "Request quota increase for Microsoft Foundry Models gpt-4.1 in eastus2"
  3. Consider alternative regions with availability

🚀 Quick Deployment

Option 1: Using Azure CLI

# Clone or download the template files
git clone <repository-url>
cd examples/retail-multiagent-arm-template

# Make the deployment script executable
chmod +x deploy.sh

# Deploy with default settings
./deploy.sh -g myResourceGroup

# Deploy for production with premium features
./deploy.sh -g myProdRG -e prod -m premium -l eastus2

Option 2: Using Azure Portal

Deploy to Azure

Option 3: Using Azure CLI directly

# Create resource group
az group create --name myResourceGroup --location eastus2

# Deploy template
az deployment group create \
  --resource-group myResourceGroup \
  --template-file azuredeploy.json \
  --parameters azuredeploy.parameters.json
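
Before running the direct CLI deployment, it can help to validate the template and preview the changes. The sketch below wraps `az deployment group validate` and `az deployment group what-if`; `preview_deployment` is a hypothetical helper name, and the file names are the ones used in Option 3.

```shell
# Validate the template, then show a what-if preview without deploying anything.
preview_deployment() {
  local resource_group="$1"
  az deployment group validate \
    --resource-group "$resource_group" \
    --template-file azuredeploy.json \
    --parameters azuredeploy.parameters.json \
    --output none || return 1
  az deployment group what-if \
    --resource-group "$resource_group" \
    --template-file azuredeploy.json \
    --parameters azuredeploy.parameters.json
}
```

Running `preview_deployment myResourceGroup` stops at the validation step if the template or parameters are rejected, so the what-if output is only shown for a template that passes validation.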

⏱️ Deployment Timeline

What to Expect

| Phase | Duration | What Happens |
|-------|----------|--------------|
| Template Validation | 30-60 seconds | Azure validates ARM template syntax and parameters |
| Resource Group Setup | 10-20 seconds | Creates resource group (if needed) |
| OpenAI Provisioning | 5-8 minutes | Creates 3-4 OpenAI accounts and deploys models |
| Container Apps | 3-5 minutes | Creates environment and deploys placeholder containers |
| Search & Storage | 2-4 minutes | Provisions AI Search service and storage accounts |
| Cosmos DB | 2-3 minutes | Creates database and configures containers |
| Monitoring Setup | 2-3 minutes | Sets up Application Insights and Log Analytics |
| RBAC Configuration | 1-2 minutes | Configures managed identities and permissions |
| Total Deployment | 15-25 minutes | Complete infrastructure ready |

After Deployment:

  • Infrastructure Ready: All Azure services provisioned and running
  • ⏱️ Application Development: 80-120 hours (your responsibility)
  • ⏱️ Index Configuration: 15-30 minutes (requires your schema)
  • ⏱️ Data Upload: Varies by dataset size
  • ⏱️ Testing & Validation: 2-4 hours

✅ Verify Deployment Success

Step 1: Check Resource Provisioning (2 minutes)

# Verify all resources deployed successfully
az resource list \
  --resource-group myResourceGroup \
  --query "[?provisioningState!='Succeeded'].{Name:name, Status:provisioningState, Type:type}" \
  --output table

Expected: Empty table (all resources show "Succeeded" status)
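
For use in a pipeline, the same check can return an exit status instead of a table. This is a minimal sketch; `assert_all_succeeded` is a hypothetical helper name, and it reuses the `provisioningState` filter from the command above.

```shell
# Return non-zero when any resource in the group is not in the Succeeded state.
assert_all_succeeded() {
  local resource_group="$1" pending
  pending="$(az resource list \
    --resource-group "$resource_group" \
    --query "length([?provisioningState!='Succeeded'])" \
    --output tsv)" || return 1
  if [ "$pending" -eq 0 ]; then
    echo "All resources provisioned"
  else
    echo "${pending} resource(s) not in Succeeded state" >&2
    return 1
  fi
}
```

`assert_all_succeeded myResourceGroup` can then be chained with `&&` before the later verification steps.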

Step 2: Verify Microsoft Foundry Models Deployments (3 minutes)

# List all OpenAI accounts
az cognitiveservices account list \
  --resource-group myResourceGroup \
  --query "[?kind=='OpenAI'].{Name:name, Location:location, Status:properties.provisioningState}" \
  --output table

# Check model deployments for primary region
OPENAI_NAME=$(az cognitiveservices account list \
  --resource-group myResourceGroup \
  --query "[?kind=='OpenAI'] | [0].name" -o tsv)

az cognitiveservices account deployment list \
  --name $OPENAI_NAME \
  --resource-group myResourceGroup \
  --output table

Expected:

  • 3-4 OpenAI accounts (primary, secondary, tertiary, evaluation regions)
  • 1-2 model deployments per account (gpt-4.1, gpt-4.1-mini, text-embedding-ada-002)

Step 3: Test Infrastructure Endpoints (5 minutes)

# Get Container App URLs
az containerapp list \
  --resource-group myResourceGroup \
  --query "[].{Name:name, URL:properties.configuration.ingress.fqdn, Status:properties.runningStatus}" \
  --output table

# Test router endpoint (placeholder image will respond)
ROUTER_URL=$(az containerapp show \
  --name retail-router \
  --resource-group myResourceGroup \
  --query "properties.configuration.ingress.fqdn" -o tsv)

echo "Testing: https://$ROUTER_URL"
curl -I "https://$ROUTER_URL" || echo "Request failed: check the app's ingress configuration and revision status"

Expected:

  • Container Apps show "Running" status
  • Placeholder nginx responds with HTTP 200 or 404 (no application code yet)
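
Because auto-scaling is configured from 0 instances, the first request after an idle period may arrive before a replica is running. A small retry loop is one way to handle that; the sketch below is an assumption about how you might test, not part of the template, and `wait_for_http` is a hypothetical helper name.

```shell
# Poll a URL until it answers with an HTTP status from 200 to 499, or attempts run out.
# Arguments: url, attempts (default 10), delay in seconds between attempts (default 5).
wait_for_http() {
  local url="$1" attempts="${2:-10}" delay="${3:-5}" status i
  i=1
  while [ "$i" -le "$attempts" ]; do
    # curl prints 000 when no HTTP response was received.
    status="$(curl --silent --output /dev/null --write-out '%{http_code}' --max-time 10 "$url")"
    if [ -n "$status" ] && [ "$status" -ge 200 ] && [ "$status" -lt 500 ]; then
      echo "$status"
      return 0
    fi
    sleep "$delay"
    i=$(( i + 1 ))
  done
  echo "No response from ${url} after ${attempts} attempts" >&2
  return 1
}
```

For example, `wait_for_http "https://$ROUTER_URL" 12 5` waits up to roughly a minute and prints the first status code it receives.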

Step 4: Verify Microsoft Foundry Models API Access (3 minutes)

# Get OpenAI endpoint and key
OPENAI_ENDPOINT=$(az cognitiveservices account show \
  --name $OPENAI_NAME \
  --resource-group myResourceGroup \
  --query "properties.endpoint" -o tsv)

OPENAI_KEY=$(az cognitiveservices account keys list \
  --name $OPENAI_NAME \
  --resource-group myResourceGroup \
  --query "key1" -o tsv)

# Test gpt-4.1 deployment
curl "${OPENAI_ENDPOINT}openai/deployments/gpt-4.1/chat/completions?api-version=2024-08-01-preview" \
  -H "Content-Type: application/json" \
  -H "api-key: $OPENAI_KEY" \
  -d '{
    "messages": [{"role": "user", "content": "Say hello"}],
    "max_tokens": 10
  }'

Expected: JSON response with chat completion (confirms OpenAI is functional)

What's Working vs. What's Not

✅ Working After Deployment:

  • Microsoft Foundry Models deployments live and accepting API calls
  • AI Search service running (empty, no indexes yet)
  • Container Apps running (placeholder nginx images)
  • Storage accounts accessible and ready for uploads
  • Cosmos DB ready for data operations
  • Application Insights collecting infrastructure telemetry
  • Key Vault ready for secret storage

❌ Not Working Yet (Requires Development):

  • Agent endpoints (no application code deployed)
  • Chat functionality (requires frontend + backend implementation)
  • Search queries (no search index created yet)
  • Document processing pipeline (no data uploaded)
  • Custom telemetry (requires application instrumentation)

Next Steps: See Post-Deployment Configuration to develop and deploy your application


⚙️ Configuration Options

Template Parameters

| Parameter | Type | Default | Description |
|-----------|------|---------|-------------|
| projectName | string | "retail" | Prefix for all resource names |
| location | string | Resource group location | Primary deployment region |
| secondaryLocation | string | "westus2" | Secondary region for multi-region deployment |
| tertiaryLocation | string | "francecentral" | Region for embeddings model |
| environmentName | string | "dev" | Environment designation (dev/staging/prod) |
| deploymentMode | string | "standard" | Deployment configuration (minimal/standard/premium) |
| enableMultiRegion | bool | true | Enable multi-region deployment |
| enableMonitoring | bool | true | Enable Application Insights and logging |
| enableSecurity | bool | true | Enable Key Vault and enhanced security |

Customizing Parameters

Edit azuredeploy.parameters.json:

{
  "$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {
    "projectName": {
      "value": "mycompany"
    },
    "environmentName": {
      "value": "prod"
    },
    "deploymentMode": {
      "value": "premium"
    },
    "location": {
      "value": "eastus2"
    }
  }
}
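
A typo in `environmentName` or `deploymentMode` is cheaper to catch locally than during template validation. The following sketch checks values against the sets listed in the parameter table; `validate_choice` and `validate_parameters` are hypothetical helper names, and whether the template itself enforces these values is not stated in this README.

```shell
# Succeeds when value ($2) is one of the allowed values ($3...); $1 is the parameter name.
validate_choice() {
  local name="$1" value="$2" allowed
  shift 2
  for allowed in "$@"; do
    [ "$value" = "$allowed" ] && return 0
  done
  echo "Invalid ${name}: '${value}' (expected one of: $*)" >&2
  return 1
}

# Checks the two enumerated parameters from the Template Parameters table.
validate_parameters() {
  local environment_name="$1" deployment_mode="$2"
  validate_choice environmentName "$environment_name" dev staging prod &&
    validate_choice deploymentMode "$deployment_mode" minimal standard premium
}
```

`validate_parameters prod premium` succeeds for the example file above, while `validate_parameters prod large` fails with a message naming the allowed modes.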

🏗️ Architecture Overview

graph TD
    Frontend[Frontend<br/>Container App] --> Router[Agent Router<br/>Container App] --> Agents[Agents<br/>Customer + Inv]
    Router --> Search[AI Search<br/>Vector DB]
    Router --> Models[Microsoft Foundry Models<br/>Multi-region]
    Agents --> Storage[Storage<br/>Documents]
    Search --> CosmosDB[Cosmos DB<br/>Chat History]
    Models --> AppInsights[App Insights<br/>Monitoring]
    Storage --> KeyVault[Key Vault<br/>Secrets]

📖 Deployment Script Usage

The deploy.sh script provides an interactive deployment experience:

# Show help
./deploy.sh --help

# Basic deployment
./deploy.sh -g myResourceGroup

# Advanced deployment with custom settings
./deploy.sh \
  -g myProductionRG \
  -p companyname \
  -e prod \
  -m premium \
  -l eastus2

# Development deployment without multi-region
./deploy.sh \
  -g myDevRG \
  -e dev \
  -m minimal \
  --no-multi-region \
  --no-security

Script Features

  • Prerequisites validation (Azure CLI, login status, template files)
  • Resource group management (creates the group if it doesn't exist)
  • Template validation before deployment
  • Progress monitoring with colored output
  • Deployment outputs display
  • Post-deployment guidance

📊 Monitoring Deployment

Check Deployment Status

# List deployments
az deployment group list --resource-group myResourceGroup --output table

# Get deployment details
az deployment group show \
  --resource-group myResourceGroup \
  --name retail-deployment-YYYYMMDD-HHMMSS

# Start (or re-run) the deployment with verbose output to follow its progress
az deployment group create \
  --resource-group myResourceGroup \
  --template-file azuredeploy.json \
  --parameters azuredeploy.parameters.json \
  --verbose
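
When a deployment fails, the per-resource operations usually say why. The sketch below generates a name in the `retail-deployment-YYYYMMDD-HHMMSS` pattern shown above and lists failed operations for a named deployment. Both function names are hypothetical, and whether deploy.sh produces names in exactly this way is an assumption.

```shell
# Build a deployment name in the retail-deployment-YYYYMMDD-HHMMSS pattern.
deployment_name() {
  echo "retail-deployment-$(date +%Y%m%d-%H%M%S)"
}

# List the failed operations of one deployment with the affected resource and error message.
failed_operations() {
  local resource_group="$1" name="$2"
  az deployment operation group list \
    --resource-group "$resource_group" \
    --name "$name" \
    --query "[?properties.provisioningState=='Failed'].{Resource:properties.targetResource.resourceName, Message:properties.statusMessage.error.message}" \
    --output table
}
```

Pass the name printed by `az deployment group list` to `failed_operations myResourceGroup <deployment-name>` to see only the resources that did not provision.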

Deployment Outputs

After successful deployment, the following outputs are available:

  • Frontend URL: Public endpoint for the web interface
  • Router URL: API endpoint for the agent router
  • OpenAI Endpoints: Primary and secondary OpenAI service endpoints
  • Search Service: Azure AI Search service endpoint
  • Storage Account: Name of the storage account for documents
  • Key Vault: Name of the Key Vault (if enabled)
  • Application Insights: Name of the monitoring service (if enabled)
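
Outputs can be read back from a finished deployment with a JMESPath query. This is a sketch: `get_deployment_output` is a hypothetical helper, and the output key you pass (for example `frontendUrl`) is an assumption, since the exact key names defined in azuredeploy.json are not listed here.

```shell
# Print the value of one named output from a resource-group deployment.
get_deployment_output() {
  local resource_group="$1" deployment="$2" output_name="$3"
  az deployment group show \
    --resource-group "$resource_group" \
    --name "$deployment" \
    --query "properties.outputs.${output_name}.value" \
    --output tsv
}
```

To see which keys actually exist, run the same `az deployment group show` command with `--query properties.outputs` first.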

🔧 Post-Deployment: Next Steps

📝 Important: Infrastructure is deployed, but you need to develop and deploy application code.

Phase 1: Develop Agent Applications (Your Responsibility)

The ARM template creates empty Container Apps with placeholder nginx images. You must:

Required Development:

  1. Agent Implementation (30-40 hours)

    • Customer service agent with gpt-4.1 integration
    • Inventory agent with gpt-4.1-mini integration
    • Agent routing logic
  2. Frontend Development (20-30 hours)

    • Chat interface UI (React/Vue/Angular)
    • File upload functionality
    • Response rendering and formatting
  3. Backend Services (12-16 hours)

    • FastAPI or Express router
    • Authentication middleware
    • Telemetry integration

See: Architecture Guide for detailed implementation patterns and code examples

Phase 2: Configure AI Search Index (15-30 minutes)

Create a search index matching your data model:

# Get search service details
SEARCH_NAME=$(az search service list \
  --resource-group myResourceGroup \
  --query "[0].name" -o tsv)

SEARCH_KEY=$(az search admin-key show \
  --service-name $SEARCH_NAME \
  --resource-group myResourceGroup \
  --query "primaryKey" -o tsv)

# Create index with your schema (example)
curl -X POST "https://${SEARCH_NAME}.search.windows.net/indexes?api-version=2023-11-01" \
  -H "Content-Type: application/json" \
  -H "api-key: ${SEARCH_KEY}" \
  -d '{
    "name": "products",
    "fields": [
      {"name": "id", "type": "Edm.String", "key": true},
      {"name": "title", "type": "Edm.String", "searchable": true},
      {"name": "content", "type": "Edm.String", "searchable": true},
      {"name": "category", "type": "Edm.String", "filterable": true},
      {"name": "content_vector", "type": "Collection(Edm.Single)", 
       "searchable": true, "dimensions": 1536, "vectorSearchProfile": "default"}
    ],
    "vectorSearch": {
      "algorithms": [{"name": "default", "kind": "hnsw"}],
      "profiles": [{"name": "default", "algorithm": "default"}]
    }
  }'
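
After creating the index, a quick existence check confirms the request was accepted. The sketch below calls the Get Index endpoint with the same API version as above; `search_index_exists` is a hypothetical helper name.

```shell
# Succeed when the named index exists (HTTP 200 from GET /indexes/<name>).
search_index_exists() {
  local service="$1" admin_key="$2" index="$3" status
  status="$(curl --silent --output /dev/null --write-out '%{http_code}' \
    -H "api-key: ${admin_key}" \
    "https://${service}.search.windows.net/indexes/${index}?api-version=2023-11-01")"
  [ "$status" = "200" ]
}
```

For example, `search_index_exists "$SEARCH_NAME" "$SEARCH_KEY" products && echo "index ready"` prints the message only once the `products` index is in place.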

Phase 3: Upload Your Data (Time varies)

Once you have product data and documents:

# Get storage account details
STORAGE_NAME=$(az storage account list \
  --resource-group myResourceGroup \
  --query "[0].name" -o tsv)

STORAGE_KEY=$(az storage account keys list \
  --account-name $STORAGE_NAME \
  --resource-group myResourceGroup \
  --query "[0].value" -o tsv)

# Upload your documents
az storage blob upload-batch \
  --destination documents \
  --source /path/to/your/product/docs \
  --account-name $STORAGE_NAME \
  --account-key $STORAGE_KEY

# Example: Upload single file
az storage blob upload \
  --container-name documents \
  --name "product-manual.pdf" \
  --file /path/to/product-manual.pdf \
  --account-name $STORAGE_NAME \
  --account-key $STORAGE_KEY

Phase 4: Build and Deploy Your Applications (8-12 hours)

Once you've developed your agent code:

# 1. Create Azure Container Registry (if needed)
az acr create \
  --name myregistry \
  --resource-group myResourceGroup \
  --sku Basic

# 2. Build and push agent router image
docker build -t myregistry.azurecr.io/agent-router:v1 /path/to/your/router/code
az acr login --name myregistry
docker push myregistry.azurecr.io/agent-router:v1

# 3. Build and push frontend image
docker build -t myregistry.azurecr.io/frontend:v1 /path/to/your/frontend/code
docker push myregistry.azurecr.io/frontend:v1

# 4. Update Container Apps with your images
az containerapp update \
  --name retail-router \
  --resource-group myResourceGroup \
  --image myregistry.azurecr.io/agent-router:v1

az containerapp update \
  --name retail-frontend \
  --resource-group myResourceGroup \
  --image myregistry.azurecr.io/frontend:v1

# 5. Configure environment variables (each secretref: name must already exist as a secret on the Container App)
az containerapp update \
  --name retail-router \
  --resource-group myResourceGroup \
  --set-env-vars \
    OPENAI_ENDPOINT=secretref:openai-endpoint \
    OPENAI_KEY=secretref:openai-key \
    SEARCH_ENDPOINT=secretref:search-endpoint \
    SEARCH_KEY=secretref:search-key
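
The `secretref:` values in step 5 resolve only if secrets with those names are defined on the Container App. One way to create them is `az containerapp secret set`, sketched below. `set_app_secrets` is a hypothetical wrapper, and the secret values are assumed to come from the variables gathered in earlier steps.

```shell
# Create or update secrets on a Container App.
# Arguments: resource group, app name, then one or more name=value pairs.
set_app_secrets() {
  local resource_group="$1" app="$2"
  if [ "$#" -lt 3 ]; then
    echo "Usage: set_app_secrets <resource-group> <app> name=value..." >&2
    return 1
  fi
  shift 2
  az containerapp secret set \
    --name "$app" \
    --resource-group "$resource_group" \
    --secrets "$@"
}
```

For example, `set_app_secrets myResourceGroup retail-router "openai-endpoint=$OPENAI_ENDPOINT" "openai-key=$OPENAI_KEY"` defines two of the four names referenced in step 5; the search endpoint and key follow the same pattern.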

Phase 5: Test Your Application (2-4 hours)

# Get your application URL
ROUTER_URL=$(az containerapp show \
  --name retail-router \
  --resource-group myResourceGroup \
  --query "properties.configuration.ingress.fqdn" -o tsv)

# Test agent endpoint (once your code is deployed)
curl -X POST "https://${ROUTER_URL}/chat" \
  -H "Content-Type: application/json" \
  -d '{
    "message": "Hello, I need help with my order",
    "agent": "customer"
  }'

# Check application logs
az containerapp logs show \
  --name retail-router \
  --resource-group myResourceGroup \
  --follow

Implementation Resources

Estimated Total Effort:

  • Infrastructure deployment: 15-25 minutes (✅ Complete)
  • Application development: 80-120 hours (🔨 Your work)
  • Testing and optimization: 15-25 hours (🔨 Your work)

🛠️ Troubleshooting

Common Issues

1. Microsoft Foundry Models Quota Exceeded

# Check current quota usage
az cognitiveservices usage list --location eastus2

# Request quota increase
az support tickets create \
  --ticket-name "OpenAI-Quota-Increase" \
  --severity "minimal" \
  --description "Request quota increase for Microsoft Foundry Models in region X"

2. Container Apps Deployment Failed

# Check container app logs
az containerapp logs show \
  --name retail-router \
  --resource-group myResourceGroup \
  --follow

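# Restart a revision (find revision names with: az containerapp revision list --name retail-router --resource-group myResourceGroup --output table)
az containerapp revision restart \
  --name retail-router \
  --resource-group myResourceGroup \
  --revision <revision-name>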

3. Search Service Initialization

# Verify search service status
az search service show \
  --name <search-service-name> \
  --resource-group myResourceGroup

# Test search service connectivity
curl -X GET "https://<search-service-name>.search.windows.net/indexes?api-version=2023-11-01" \
  -H "api-key: <search-admin-key>"

Deployment Validation

# Validate all resources are created
az resource list \
  --resource-group myResourceGroup \
  --output table

# Check resource health
az resource list \
  --resource-group myResourceGroup \
  --query "[?provisioningState!='Succeeded'].{Name:name, Status:provisioningState, Type:type}" \
  --output table

🔐 Security Considerations

Key Management

  • All secrets are stored in Azure Key Vault (when enabled)
  • Container apps use managed identity for authentication
  • Storage accounts have secure defaults (HTTPS only, no public blob access)

Network Security

  • Container apps use internal networking where possible
  • Search service configured with private endpoints option
  • Cosmos DB configured with minimal necessary permissions

RBAC Configuration

# Assign necessary roles for managed identity
az role assignment create \
  --assignee <container-app-managed-identity> \
  --role "Cognitive Services OpenAI User" \
  --scope <openai-resource-id>
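
With the "Cognitive Services OpenAI User" role assigned, a caller can authenticate with a Microsoft Entra ID token instead of an API key. The sketch below shows that call from a signed-in Azure CLI session. It assumes your own identity holds the role, that the deployment is named after the model (as in the verification step), and both function names are hypothetical.

```shell
# Build the chat completions URL for a deployment; tolerates a trailing slash on the endpoint.
chat_completions_url() {
  local endpoint="${1%/}" deployment="$2" api_version="${3:-2024-08-01-preview}"
  echo "${endpoint}/openai/deployments/${deployment}/chat/completions?api-version=${api_version}"
}

# Send a short chat request using a bearer token from the current az login session.
chat_with_token() {
  local endpoint="$1" deployment="$2" token
  token="$(az account get-access-token \
    --resource https://cognitiveservices.azure.com \
    --query accessToken --output tsv)" || return 1
  curl --silent "$(chat_completions_url "$endpoint" "$deployment")" \
    -H "Content-Type: application/json" \
    -H "Authorization: Bearer ${token}" \
    -d '{"messages": [{"role": "user", "content": "Say hello"}], "max_tokens": 10}'
}
```

`chat_with_token "$OPENAI_ENDPOINT" gpt-4.1` should return the same kind of JSON response as the key-based test, without the key ever being read.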

💰 Cost Optimization

Cost Estimates (Monthly, USD)

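| Mode | OpenAI | Container Apps | Search | Storage | Total Est. |
|------|--------|----------------|--------|---------|------------|
| Minimal | $50-200 | $20-50 | $25-100 | $5-20 | $100-370 |
| Standard | $200-800 | $100-300 | $100-300 | $20-50 | $420-1,450 |
| Premium | $500-2,000 | $300-800 | $300-600 | $50-100 | $1,150-3,500 |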

Cost Monitoring

# Create a monthly cost budget in the current subscription (adjust the dates to your billing period)
az consumption budget create \
  --budget-name "retail-budget" \
  --amount 500 \
  --category cost \
  --time-grain monthly \
  --start-date 2024-01-01 \
  --end-date 2024-12-31

🔄 Updates and Maintenance

Template Updates

  • Version control the ARM template files
  • Test changes in development environment first
  • Use incremental deployment mode for updates

Resource Updates

# Update with new parameters
az deployment group create \
  --resource-group myResourceGroup \
  --template-file azuredeploy.json \
  --parameters azuredeploy.parameters.json \
  --mode Incremental

Backup and Recovery

  • Cosmos DB automatic backup enabled
  • Key Vault soft delete enabled
  • Container app revisions maintained for rollback
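
A rollback with the retained revisions could look like the sketch below. It assumes the app runs in multiple-revision mode so traffic weights can be set per revision; `rollback_to_revision` is a hypothetical helper name, and revision names come from `az containerapp revision list`.

```shell
# Reactivate an earlier revision and route all ingress traffic to it.
rollback_to_revision() {
  local resource_group="$1" app="$2" revision="$3"
  az containerapp revision activate \
    --resource-group "$resource_group" \
    --name "$app" \
    --revision "$revision" || return 1
  az containerapp ingress traffic set \
    --resource-group "$resource_group" \
    --name "$app" \
    --revision-weight "${revision}=100"
}
```

For example, `rollback_to_revision myResourceGroup retail-router <revision-name>` sends 100% of traffic to the named revision once it is active.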


⚡ Ready to deploy your multi-agent solution?

Start with: ./deploy.sh -g myResourceGroup