Chapter 5: Production Deployment Package
⚠️ INFRASTRUCTURE TEMPLATE ONLY
This ARM template deploys Azure resources for a multi-agent system.
What gets deployed (15-25 minutes):
- ✅ Microsoft Foundry Models (gpt-4.1, gpt-4.1-mini, embeddings across 3 regions)
- ✅ AI Search service (empty, ready for index creation)
- ✅ Container Apps (placeholder images, ready for your code)
- ✅ Storage, Cosmos DB, Key Vault, Application Insights
What's NOT included (requires development):
- ❌ Agent implementation code (Customer Agent, Inventory Agent)
- ❌ Routing logic and API endpoints
- ❌ Frontend chat UI
- ❌ Search index schemas and data pipelines
- ❌ Estimated development effort: 80-120 hours
Use this template if:
- ✅ You want to provision Azure infrastructure for a multi-agent project
- ✅ You plan to develop agent implementation separately
- ✅ You need a production-ready infrastructure baseline
Don't use if:
- ❌ You expect a working multi-agent demo immediately
- ❌ You're looking for complete application code examples
This directory contains a comprehensive Azure Resource Manager (ARM) template for deploying the infrastructure foundation of a multi-agent customer support system. The template provisions all necessary Azure services, properly configured and interconnected, ready for your application development.
After deployment, you'll have: Production-ready Azure infrastructure
To complete the system, you need: Agent code, frontend UI, and data configuration (see Architecture Guide)
✅ Microsoft Foundry Models Services (Ready for API calls)
- Primary region: gpt-4.1 deployment (20K TPM capacity)
- Secondary region: gpt-4.1-mini deployment (10K TPM capacity)
- Tertiary region: Text embeddings model (30K TPM capacity)
- Evaluation region: gpt-4.1 grader model (15K TPM capacity)
- Status: Fully functional - can make API calls immediately
✅ Azure AI Search (Empty - ready for configuration)
- Vector search capabilities enabled
- Standard tier with 1 partition, 1 replica
- Status: Service running, but requires index creation
- Action needed: Create search index with your schema
✅ Azure Storage Account (Empty - ready for uploads)
- Blob containers: documents, uploads
- Secure configuration (HTTPS-only, no public access)
- Status: Ready to receive files
- Action needed: Upload your product data and documents
✅ Azure Container Apps (Placeholder images - ready for your code)
- Agent router app (nginx default image)
- Frontend app (nginx default image)
- Auto-scaling configured (0-10 instances)
- Status: Running placeholder containers
- Action needed: Build and deploy your agent applications
✅ Azure Cosmos DB (Empty - ready for data)
- Database and container pre-configured
- Optimized for low-latency operations
- TTL enabled for automatic cleanup
- Status: Ready to store chat history
✅ Azure Key Vault (Optional - ready for secrets)
- Soft delete enabled
- RBAC configured for managed identities
- Status: Ready to store API keys and connection strings
✅ Application Insights (Optional - monitoring active)
- Connected to Log Analytics workspace
- Custom metrics and alerts configured
- Status: Ready to receive telemetry from your apps
✅ Document Intelligence (Ready for API calls)
- S0 tier for production workloads
- Status: Ready to process uploaded documents
✅ Bing Search API (Ready for API calls)
- S1 tier for real-time searches
- Status: Ready for web search queries
| Mode | OpenAI Capacity | Container Instances | Search Tier | Storage Redundancy | Best For |
|---|---|---|---|---|---|
| Minimal | 10K-20K TPM | 0-2 replicas | Basic | LRS (Local) | Dev/test, learning, proof-of-concept |
| Standard | 30K-60K TPM | 2-5 replicas | Standard | ZRS (Zone) | Production, moderate traffic (<10K users) |
| Premium | 80K-150K TPM | 5-10 replicas, zone-redundant | Premium | GRS (Geo) | Enterprise, high traffic (>10K users), 99.99% SLA |
Cost Impact:
- Minimal → Standard: ~4x cost increase ($100-370/mo → $420-1,450/mo)
- Standard → Premium: ~2.5x cost increase ($420-1,450/mo → $1,150-3,500/mo)
- Choose based on: Expected load, SLA requirements, budget constraints
Capacity Planning:
- TPM (Tokens Per Minute): Total across all model deployments
- Container Instances: Auto-scaling range (min-max replicas)
- Search Tier: Affects query performance and index size limits
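Because TPM is counted across all model deployments, it can help to add up the per-deployment capacities before picking a mode. The sketch below does that arithmetic for the capacities listed earlier (values in thousands of TPM); `total_tpm_k` is a hypothetical helper name, not part of the template or deploy.sh.

```shell
# Sum per-deployment TPM capacities (in thousands of tokens per minute).
total_tpm_k() {
  local total=0 cap
  for cap in "$@"; do
    total=$((total + cap))
  done
  echo "$total"
}

# Capacities from the service list: gpt-4.1 (20K), gpt-4.1-mini (10K), embeddings (30K).
total_tpm_k 20 10 30   # prints 60
```

Adding the 15K grader deployment gives 75, so whether the evaluation region is enabled affects which mode's range the total falls into.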
- Azure CLI (version 2.50.0 or higher)
az --version # Check version
az login # Authenticate
- Active Azure subscription with Owner or Contributor access
az account show # Verify subscription
Before deployment, verify sufficient quotas in your target regions:
# Check Microsoft Foundry Models availability in your region
az cognitiveservices account list-skus \
--kind OpenAI \
--location eastus2
# Verify OpenAI quota (example for gpt-4.1)
az cognitiveservices usage list \
--location eastus2 \
--query "[?name.value=='OpenAI.Standard.gpt-4.1']"
# Check Container Apps quota
az provider show \
--namespace Microsoft.App \
--query "resourceTypes[?resourceType=='managedEnvironments'].locations"
Minimum Required Quotas:
- Microsoft Foundry Models: 3-4 model deployments across regions
- gpt-4.1: 20K TPM (Tokens Per Minute)
- gpt-4.1-mini: 10K TPM
- text-embedding-ada-002: 30K TPM
- Note: gpt-4.1 may have a waitlist in some regions; check model availability before deploying
- Container Apps: Managed environment + 2-10 container instances
- AI Search: Standard tier (minimal mode uses Basic, which has lower storage and index limits)
- Cosmos DB: Standard provisioned throughput
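A quota check comes down to comparing the remaining headroom against what the deployment needs. The sketch below assumes the usage entries returned by `az cognitiveservices usage list` expose `currentValue` and `limit` fields, and that all three numbers are in the same units; `has_quota` is a hypothetical helper name.

```shell
has_quota() {
  # $1 = current usage, $2 = quota limit, $3 = amount the deployment needs.
  # awk handles values printed as decimals (for example 450.0).
  awk -v cur="$1" -v lim="$2" -v need="$3" \
    'BEGIN { exit ((lim - cur) >= need) ? 0 : 1 }'
}

# Example: 10 of 450 units in use, 20 needed.
if has_quota 10 450 20; then
  echo "Enough quota"
else
  echo "Request an increase"
fi
```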
If quota is insufficient:
- Go to Azure Portal → Quotas → Request increase
- Or use Azure CLI:
az support tickets create \
--ticket-name "OpenAI-Quota-Increase" \
--severity "minimal" \
--description "Request quota increase for Microsoft Foundry Models gpt-4.1 in eastus2"
- Consider alternative regions with availability
# Clone or download the template files
git clone <repository-url>
cd examples/retail-multiagent-arm-template
# Make the deployment script executable
chmod +x deploy.sh
# Deploy with default settings
./deploy.sh -g myResourceGroup
# Deploy for production with premium features
./deploy.sh -g myProdRG -e prod -m premium -l eastus2
# Create resource group
az group create --name myResourceGroup --location eastus2
# Deploy template
az deployment group create \
--resource-group myResourceGroup \
--template-file azuredeploy.json \
--parameters azuredeploy.parameters.json
| Phase | Duration | What Happens |
|---|---|---|
| Template Validation | 30-60 seconds | Azure validates ARM template syntax and parameters |
| Resource Group Setup | 10-20 seconds | Creates resource group (if needed) |
| OpenAI Provisioning | 5-8 minutes | Creates 3-4 OpenAI accounts and deploys models |
| Container Apps | 3-5 minutes | Creates environment and deploys placeholder containers |
| Search & Storage | 2-4 minutes | Provisions AI Search service and storage accounts |
| Cosmos DB | 2-3 minutes | Creates database and configures containers |
| Monitoring Setup | 2-3 minutes | Sets up Application Insights and Log Analytics |
| RBAC Configuration | 1-2 minutes | Configures managed identities and permissions |
| Total Deployment | 15-25 minutes | Complete infrastructure ready |
After Deployment:
- ✅ Infrastructure Ready: All Azure services provisioned and running
- ⏱️ Application Development: 80-120 hours (your responsibility)
- ⏱️ Index Configuration: 15-30 minutes (requires your schema)
- ⏱️ Data Upload: Varies by dataset size
- ⏱️ Testing & Validation: 2-4 hours
# Verify all resources deployed successfully
az resource list \
--resource-group myResourceGroup \
--query "[?provisioningState!='Succeeded'].{Name:name, Status:provisioningState, Type:type}" \
--output table
Expected: Empty table (all resources show "Succeeded" status)
# List all OpenAI accounts
az cognitiveservices account list \
--resource-group myResourceGroup \
--query "[?kind=='OpenAI'].{Name:name, Location:location, Status:properties.provisioningState}" \
--output table
# Check model deployments for primary region
OPENAI_NAME=$(az cognitiveservices account list \
--resource-group myResourceGroup \
--query "[?kind=='OpenAI'] | [0].name" -o tsv)
az cognitiveservices account deployment list \
--name $OPENAI_NAME \
--resource-group myResourceGroup \
--output table
Expected:
- 3-4 OpenAI accounts (primary, secondary, tertiary, evaluation regions)
- 1-2 model deployments per account (gpt-4.1, gpt-4.1-mini, text-embedding-ada-002)
# Get Container App URLs
az containerapp list \
--resource-group myResourceGroup \
--query "[].{Name:name, URL:properties.configuration.ingress.fqdn, Status:properties.runningStatus}" \
--output table
# Test router endpoint (placeholder image will respond)
ROUTER_URL=$(az containerapp show \
--name retail-router \
--resource-group myResourceGroup \
--query "properties.configuration.ingress.fqdn" -o tsv)
echo "Testing: https://$ROUTER_URL"
curl -I "https://$ROUTER_URL" || echo "No response from the router endpoint (the app may still be starting)"
Expected:
- Container Apps show "Running" status
- Placeholder nginx responds with HTTP 200 or 404 (no application code yet)
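The expected result above can also be checked in a script by looking only at the HTTP status code. This is a minimal sketch; `placeholder_status_ok` is a hypothetical helper, and treating only 200 and 404 as healthy is taken from the expectation stated above.

```shell
placeholder_status_ok() {
  # $1 = HTTP status code returned by the container app
  case "$1" in
    200|404) return 0 ;;  # placeholder nginx answered; no application code yet
    *)       return 1 ;;  # anything else, including 000 (no response), needs a look
  esac
}

# Usage, with ROUTER_URL set as shown earlier:
#   code=$(curl -s -o /dev/null -w '%{http_code}' "https://$ROUTER_URL")
#   if placeholder_status_ok "$code"; then echo "Placeholder responding"; else echo "Unexpected status: $code"; fi
```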
# Get OpenAI endpoint and key
OPENAI_ENDPOINT=$(az cognitiveservices account show \
--name $OPENAI_NAME \
--resource-group myResourceGroup \
--query "properties.endpoint" -o tsv)
OPENAI_KEY=$(az cognitiveservices account keys list \
--name $OPENAI_NAME \
--resource-group myResourceGroup \
--query "key1" -o tsv)
# Test gpt-4.1 deployment
curl "${OPENAI_ENDPOINT}openai/deployments/gpt-4.1/chat/completions?api-version=2024-08-01-preview" \
-H "Content-Type: application/json" \
-H "api-key: $OPENAI_KEY" \
-d '{
"messages": [{"role": "user", "content": "Say hello"}],
"max_tokens": 10
}'
Expected: JSON response with chat completion (confirms OpenAI is functional)
✅ Working After Deployment:
- Microsoft Foundry Models deployed and accepting API calls
- AI Search service running (empty, no indexes yet)
- Container Apps running (placeholder nginx images)
- Storage accounts accessible and ready for uploads
- Cosmos DB ready for data operations
- Application Insights collecting infrastructure telemetry
- Key Vault ready for secret storage
❌ Not Working Yet (Requires Development):
- Agent endpoints (no application code deployed)
- Chat functionality (requires frontend + backend implementation)
- Search queries (no search index created yet)
- Document processing pipeline (no data uploaded)
- Custom telemetry (requires application instrumentation)
Next Steps: See Post-Deployment Configuration to develop and deploy your application
| Parameter | Type | Default | Description |
|---|---|---|---|
| projectName | string | "retail" | Prefix for all resource names |
| location | string | Resource group location | Primary deployment region |
| secondaryLocation | string | "westus2" | Secondary region for multi-region deployment |
| tertiaryLocation | string | "francecentral" | Region for embeddings model |
| environmentName | string | "dev" | Environment designation (dev/staging/prod) |
| deploymentMode | string | "standard" | Deployment configuration (minimal/standard/premium) |
| enableMultiRegion | bool | true | Enable multi-region deployment |
| enableMonitoring | bool | true | Enable Application Insights and logging |
| enableSecurity | bool | true | Enable Key Vault and enhanced security |
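The values listed for environmentName and deploymentMode can be checked locally before a deployment is submitted. The sketch below assumes the values shown in the table are the only valid ones; `validate_params` is a hypothetical helper and is not part of deploy.sh.

```shell
validate_params() {
  # $1 = environmentName, $2 = deploymentMode
  case "$1" in
    dev|staging|prod) ;;
    *) echo "Invalid environmentName: $1 (expected dev, staging, or prod)" >&2; return 1 ;;
  esac
  case "$2" in
    minimal|standard|premium) ;;
    *) echo "Invalid deploymentMode: $2 (expected minimal, standard, or premium)" >&2; return 1 ;;
  esac
}

# Usage:
#   validate_params prod premium && echo "Parameters look valid"
```

Failing fast on a typo here is cheaper than waiting for template validation to reject the deployment.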
Edit azuredeploy.parameters.json:
{
"$schema": "https://schema.management.azure.com/schemas/2019-04-01/deploymentParameters.json#",
"contentVersion": "1.0.0.0",
"parameters": {
"projectName": {
"value": "mycompany"
},
"environmentName": {
"value": "prod"
},
"deploymentMode": {
"value": "premium"
},
"location": {
"value": "eastus2"
}
}
}
graph TD
Frontend[Frontend<br/>Container App] --> Router[Agent Router<br/>Container App] --> Agents[Agents<br/>Customer + Inv]
Router --> Search[AI Search<br/>Vector DB]
Router --> Models[Microsoft Foundry Models<br/>Multi-region]
Agents --> Storage[Storage<br/>Documents]
Search --> CosmosDB[Cosmos DB<br/>Chat History]
Models --> AppInsights[App Insights<br/>Monitoring]
Storage --> KeyVault[Key Vault<br/>Secrets]
The deploy.sh script provides an interactive deployment experience:
# Show help
./deploy.sh --help
# Basic deployment
./deploy.sh -g myResourceGroup
# Advanced deployment with custom settings
./deploy.sh \
-g myProductionRG \
-p companyname \
-e prod \
-m premium \
-l eastus2
# Development deployment without multi-region
./deploy.sh \
-g myDevRG \
-e dev \
-m minimal \
--no-multi-region \
--no-security
- ✅ Prerequisites validation (Azure CLI, login status, template files)
- ✅ Resource group management (creates it if it doesn't exist)
- ✅ Template validation before deployment
- ✅ Progress monitoring with colored output
- ✅ Deployment outputs display
- ✅ Post-deployment guidance
# List deployments
az deployment group list --resource-group myResourceGroup --output table
# Get deployment details
az deployment group show \
--resource-group myResourceGroup \
--name retail-deployment-YYYYMMDD-HHMMSS
# Watch deployment progress
az deployment group create \
--resource-group myResourceGroup \
--template-file azuredeploy.json \
--parameters azuredeploy.parameters.json \
--verbose
After successful deployment, the following outputs are available:
- Frontend URL: Public endpoint for the web interface
- Router URL: API endpoint for the agent router
- OpenAI Endpoints: Primary and secondary OpenAI service endpoints
- Search Service: Azure AI Search service endpoint
- Storage Account: Name of the storage account for documents
- Key Vault: Name of the Key Vault (if enabled)
- Application Insights: Name of the monitoring service (if enabled)
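Individual outputs can be read back from a finished deployment with `az deployment group show`. The helper below is a sketch: `get_deployment_output` is a hypothetical name, and the output key in the usage line is an assumption, so check the outputs section of azuredeploy.json for the real key names.

```shell
get_deployment_output() {
  # $1 = resource group, $2 = deployment name, $3 = output key
  az deployment group show \
    --resource-group "$1" \
    --name "$2" \
    --query "properties.outputs.$3.value" \
    --output tsv
}

# Usage ("frontendUrl" is a hypothetical key):
#   get_deployment_output myResourceGroup retail-deployment-YYYYMMDD-HHMMSS frontendUrl
```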
📝 Important: Infrastructure is deployed, but you need to develop and deploy application code.
The ARM template creates empty Container Apps with placeholder nginx images. You must:
Required Development:
- Agent Implementation (30-40 hours)
  - Customer service agent with gpt-4.1 integration
  - Inventory agent with gpt-4.1-mini integration
  - Agent routing logic
- Frontend Development (20-30 hours)
  - Chat interface UI (React/Vue/Angular)
  - File upload functionality
  - Response rendering and formatting
- Backend Services (12-16 hours)
  - FastAPI or Express router
  - Authentication middleware
  - Telemetry integration
See: Architecture Guide for detailed implementation patterns and code examples
Create a search index matching your data model:
# Get search service details
SEARCH_NAME=$(az search service list \
--resource-group myResourceGroup \
--query "[0].name" -o tsv)
SEARCH_KEY=$(az search admin-key show \
--service-name $SEARCH_NAME \
--resource-group myResourceGroup \
--query "primaryKey" -o tsv)
# Create index with your schema (example)
curl -X POST "https://${SEARCH_NAME}.search.windows.net/indexes?api-version=2023-11-01" \
-H "Content-Type: application/json" \
-H "api-key: ${SEARCH_KEY}" \
-d '{
"name": "products",
"fields": [
{"name": "id", "type": "Edm.String", "key": true},
{"name": "title", "type": "Edm.String", "searchable": true},
{"name": "content", "type": "Edm.String", "searchable": true},
{"name": "category", "type": "Edm.String", "filterable": true},
{"name": "content_vector", "type": "Collection(Edm.Single)",
"searchable": true, "dimensions": 1536, "vectorSearchProfile": "default"}
],
"vectorSearch": {
"algorithms": [{"name": "default", "kind": "hnsw"}],
"profiles": [{"name": "default", "algorithm": "default"}]
}
}'
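The `dimensions` value of the vector field has to match the output size of the embedding model that produces the vectors; 1536 corresponds to text-embedding-ada-002, the embeddings model named in the quota list. The lookup below is a small sketch covering the default sizes of common Azure OpenAI embedding models; `embedding_dimensions` is a hypothetical helper name.

```shell
embedding_dimensions() {
  # Default output vector size for common embedding models.
  case "$1" in
    text-embedding-ada-002|text-embedding-3-small) echo 1536 ;;
    text-embedding-3-large)                        echo 3072 ;;
    *) echo "Unknown embedding model: $1" >&2; return 1 ;;
  esac
}

# Usage:
#   embedding_dimensions text-embedding-ada-002
```

If a different embedding model is deployed later, the index needs a vector field with the matching size.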
Once you have product data and documents:
# Get storage account details
STORAGE_NAME=$(az storage account list \
--resource-group myResourceGroup \
--query "[0].name" -o tsv)
STORAGE_KEY=$(az storage account keys list \
--account-name $STORAGE_NAME \
--resource-group myResourceGroup \
--query "[0].value" -o tsv)
# Upload your documents
az storage blob upload-batch \
--destination documents \
--source /path/to/your/product/docs \
--account-name $STORAGE_NAME \
--account-key $STORAGE_KEY
# Example: Upload single file
az storage blob upload \
--container-name documents \
--name "product-manual.pdf" \
--file /path/to/product-manual.pdf \
--account-name $STORAGE_NAME \
--account-key $STORAGE_KEY
Once you've developed your agent code:
# 1. Create Azure Container Registry (if needed)
az acr create \
--name myregistry \
--resource-group myResourceGroup \
--sku Basic
# 2. Build and push agent router image
docker build -t myregistry.azurecr.io/agent-router:v1 /path/to/your/router/code
az acr login --name myregistry
docker push myregistry.azurecr.io/agent-router:v1
# 3. Build and push frontend image
docker build -t myregistry.azurecr.io/frontend:v1 /path/to/your/frontend/code
docker push myregistry.azurecr.io/frontend:v1
# 4. Update Container Apps with your images
az containerapp update \
--name retail-router \
--resource-group myResourceGroup \
--image myregistry.azurecr.io/agent-router:v1
az containerapp update \
--name retail-frontend \
--resource-group myResourceGroup \
--image myregistry.azurecr.io/frontend:v1
# 5. Configure environment variables
az containerapp update \
--name retail-router \
--resource-group myResourceGroup \
--set-env-vars \
OPENAI_ENDPOINT=secretref:openai-endpoint \
OPENAI_KEY=secretref:openai-key \
SEARCH_ENDPOINT=secretref:search-endpoint \
SEARCH_KEY=secretref:search-key
# Get your application URL
ROUTER_URL=$(az containerapp show \
--name retail-router \
--resource-group myResourceGroup \
--query "properties.configuration.ingress.fqdn" -o tsv)
# Test agent endpoint (once your code is deployed)
curl -X POST "https://${ROUTER_URL}/chat" \
-H "Content-Type: application/json" \
-d '{
"message": "Hello, I need help with my order",
"agent": "customer"
}'
# Check application logs
az containerapp logs show \
--name retail-router \
--resource-group myResourceGroup \
--follow
Architecture & Design:
- 📖 Complete Architecture Guide - Detailed implementation patterns
- 📖 Multi-Agent Design Patterns
Code Examples:
- 🔗 Microsoft Foundry Models Chat Sample - RAG pattern
- 🔗 Semantic Kernel - Agent framework (C#)
- 🔗 LangChain Azure - Agent orchestration (Python)
- 🔗 AutoGen - Multi-agent conversations
Estimated Total Effort:
- Infrastructure deployment: 15-25 minutes (✅ Complete)
- Application development: 80-120 hours (🔨 Your work)
- Testing and optimization: 15-25 hours (🔨 Your work)
# Check current quota usage
az cognitiveservices usage list --location eastus2
# Request quota increase
az support tickets create \
--ticket-name "OpenAI-Quota-Increase" \
--severity "minimal" \
--description "Request quota increase for Microsoft Foundry Models in region X"
# Check container app logs
az containerapp logs show \
--name retail-router \
--resource-group myResourceGroup \
--follow
# Restart container app (list revision names with: az containerapp revision list)
az containerapp revision restart \
--name retail-router \
--resource-group myResourceGroup \
--revision <revision-name>
# Verify search service status
az search service show \
--name <search-service-name> \
--resource-group myResourceGroup
# Test search service connectivity
curl -X GET "https://<search-service-name>.search.windows.net/indexes?api-version=2023-11-01" \
-H "api-key: <search-admin-key>"
# Validate all resources are created
az resource list \
--resource-group myResourceGroup \
--output table
# Check resource health
az resource list \
--resource-group myResourceGroup \
--query "[?provisioningState!='Succeeded'].{Name:name, Status:provisioningState, Type:type}" \
--output table
- All secrets are stored in Azure Key Vault (when enabled)
- Container apps use managed identity for authentication
- Storage accounts have secure defaults (HTTPS only, no public blob access)
- Container apps use internal networking where possible
- Search service configured with private endpoints option
- Cosmos DB configured with minimal necessary permissions
# Assign necessary roles for managed identity
az role assignment create \
--assignee <container-app-managed-identity> \
--role "Cognitive Services OpenAI User" \
--scope <openai-resource-id>
| Mode | OpenAI | Container Apps | Search | Storage | Total Est. |
|---|---|---|---|---|---|
| Minimal | $50-200 | $20-50 | $25-100 | $5-20 | $100-370 |
| Standard | $200-800 | $100-300 | $100-300 | $20-50 | $420-1450 |
| Premium | $500-2000 | $300-800 | $300-600 | $50-100 | $1150-3500 |
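The totals in the last column are the sums of the per-service ranges, which is easy to recompute when one component estimate changes. A minimal sketch of that arithmetic follows; `sum_ranges` is a hypothetical helper, and the figures are the monthly USD estimates from the table.

```shell
sum_ranges() {
  # Each argument is a "low-high" monthly cost range in USD, e.g. "50-200".
  local low=0 high=0 r
  for r in "$@"; do
    low=$((low + ${r%-*}))    # part before the dash
    high=$((high + ${r#*-}))  # part after the dash
  done
  echo "${low}-${high}"
}

# Minimal mode: OpenAI, Container Apps, Search, Storage
sum_ranges 50-200 20-50 25-100 5-20   # prints 100-370
```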
# Set up budget alerts
az consumption budget create \
--budget-name "retail-budget" \
--amount 500 \
--category cost \
--time-grain monthly \
--start-date 2024-01-01 \
--end-date 2024-12-31
- Version control the ARM template files
- Test changes in development environment first
- Use incremental deployment mode for updates
# Update with new parameters
az deployment group create \
--resource-group myResourceGroup \
--template-file azuredeploy.json \
--parameters azuredeploy.parameters.json \
--mode Incremental
- Cosmos DB automatic backup enabled
- Key Vault soft delete enabled
- Container app revisions maintained for rollback
- Template Issues: GitHub Issues
- Azure Support: Azure Support Portal
- Community: Azure AI Discord
⚡ Ready to deploy your multi-agent solution?
Start with: ./deploy.sh -g myResourceGroup