
Production AI Workload Best Practices with AZD

Chapter Navigation:

Overview

This guide provides comprehensive best practices for deploying production-ready AI workloads using Azure Developer CLI (AZD). Based on feedback from the Microsoft Foundry Discord community and real-world customer deployments, these practices address the most common challenges in production AI systems.

Key Challenges Addressed

Based on our community poll results, these are the top challenges developers face:

  • 45% struggle with multi-service AI deployments
  • 38% have issues with credential and secret management
  • 35% find production readiness and scaling difficult
  • 32% need better cost optimization strategies
  • 29% require improved monitoring and troubleshooting

Architecture Patterns for Production AI

Pattern 1: Microservices AI Architecture

When to use: Complex AI applications with multiple capabilities

graph TD
    Frontend[Web Frontend] --- Gateway[API Gateway] --- LB[Load Balancer]
    Gateway --> Chat[Chat Service]
    Gateway --> Image[Image Service]
    Gateway --> Text[Text Service]
    Chat --> OpenAI[Microsoft Foundry Models]
    Image --> Vision[Computer Vision]
    Text --> DocIntel[Document Intelligence]

AZD Implementation:

# azure.yaml
name: enterprise-ai-platform
services:
  web:
    project: ./web
    host: staticwebapp
  api-gateway:
    project: ./api-gateway
    host: containerapp
  chat-service:
    project: ./services/chat
    host: containerapp
  vision-service:
    project: ./services/vision
    host: containerapp
  text-service:
    project: ./services/text
    host: containerapp

Pattern 2: Event-Driven AI Processing

When to use: Batch processing, document analysis, async workflows

// Event Hub for AI processing pipeline
resource eventHub 'Microsoft.EventHub/namespaces@2023-01-01-preview' = {
  name: eventHubNamespaceName
  location: location
  sku: {
    name: 'Standard'
    tier: 'Standard'
    capacity: 1
  }
}

// Service Bus for reliable message processing
resource serviceBus 'Microsoft.ServiceBus/namespaces@2022-10-01-preview' = {
  name: serviceBusNamespaceName
  location: location
  sku: {
    name: 'Premium'
    tier: 'Premium'
    capacity: 1
  }
}

// Function App for processing
resource functionApp 'Microsoft.Web/sites@2023-01-01' = {
  name: functionAppName
  location: location
  kind: 'functionapp,linux'
  properties: {
    siteConfig: {
      appSettings: [
        {
          name: 'FUNCTIONS_EXTENSION_VERSION'
          value: '~4'
        }
        {
          name: 'AZURE_OPENAI_ENDPOINT'
          value: '@Microsoft.KeyVault(VaultName=${keyVault.name};SecretName=openai-endpoint)'
        }
      ]
    }
  }
}

Thinking About AI Agent Health

When a traditional web app breaks, the symptoms are familiar: a page doesn't load, an API returns an error, or a deployment fails. AI-powered applications can break in all those same ways—but they can also misbehave in subtler ways that don't produce obvious error messages.

This section helps you build a mental model for monitoring AI workloads so you know where to look when things don't seem right.

How Agent Health Differs from Traditional App Health

A traditional app either works or it doesn't. An AI agent can appear to work but produce poor results. Think of agent health in two layers:

| Layer | What to Watch | Where to Look |
| --- | --- | --- |
| Infrastructure health | Is the service running? Are resources provisioned? Are endpoints reachable? | azd monitor, Azure Portal resource health, container/app logs |
| Behavior health | Is the agent responding accurately? Are responses timely? Is the model being called correctly? | Application Insights traces, model call latency metrics, response quality logs |

Infrastructure health is familiar—it's the same for any azd app. Behavior health is the new layer that AI workloads introduce.

Where to Look When AI Apps Don't Behave as Expected

If your AI application isn't producing the results you expect, here's a conceptual checklist:

  1. Start with the basics. Is the app running? Can it reach its dependencies? Check azd monitor and resource health just as you would for any app.
  2. Check the model connection. Is your application successfully calling the AI model? Failed or timed-out model calls are the most common cause of AI app issues and will show up in your application logs.
  3. Look at what the model received. AI responses depend on the input (the prompt and any retrieved context). If the output is wrong, the input is usually wrong. Check whether your application is sending the right data to the model.
  4. Review response latency. AI model calls are slower than typical API calls. If your app feels sluggish, check whether model response times have increased—this can indicate throttling, capacity limits, or region-level congestion.
  5. Watch for cost signals. Unexpected spikes in token usage or API calls can indicate a loop, a misconfigured prompt, or excessive retries.

You don't need to master observability tooling right away. The key takeaway is that AI applications have an extra layer of behavior to monitor, and azd's built-in monitoring (azd monitor) gives you a starting point for investigating both layers.
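The two health layers can be instrumented with a thin wrapper around every model call. A minimal Python sketch, assuming a generic `call_model` function you supply; the class and metric names here are illustrative, not part of azd or any Azure SDK:

```python
import time
from dataclasses import dataclass, field

@dataclass
class ModelCallMetrics:
    """Rolling record of model-call behavior for 'behavior health' checks."""
    latencies: list = field(default_factory=list)
    tokens_used: int = 0
    failures: int = 0

    def record(self, latency_s: float, tokens: int) -> None:
        self.latencies.append(latency_s)
        self.tokens_used += tokens

    @property
    def avg_latency(self) -> float:
        return sum(self.latencies) / len(self.latencies) if self.latencies else 0.0

def timed_model_call(call_model, prompt: str, metrics: ModelCallMetrics):
    """Invoke the model, timing the call and recording token usage."""
    start = time.perf_counter()
    try:
        response = call_model(prompt)      # e.g. your Azure OpenAI chat call
    except Exception:
        metrics.failures += 1              # infrastructure-layer signal
        raise
    latency = time.perf_counter() - start
    # Rough estimate (~4 characters per token) when usage isn't returned
    tokens = (len(prompt) + len(response)) // 4
    metrics.record(latency, tokens)        # behavior-layer signal
    return response
```

Feeding `avg_latency`, `tokens_used`, and `failures` into Application Insights custom metrics gives you the behavior-health signals described in steps 2, 4, and 5 above.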


Security Best Practices

1. Zero-Trust Security Model

Implementation Strategy:

  • No service-to-service communication without authentication
  • All API calls use managed identities
  • Network isolation with private endpoints
  • Least privilege access controls
// Managed Identity for each service
resource chatServiceIdentity 'Microsoft.ManagedIdentity/userAssignedIdentities@2023-01-31' = {
  name: 'chat-service-identity'
  location: location
}

// Role assignments with minimal permissions
resource openAIUserRole 'Microsoft.Authorization/roleAssignments@2022-04-01' = {
  scope: openAIAccount
  name: guid(openAIAccount.id, chatServiceIdentity.id, openAIUserRoleDefinitionId)
  properties: {
    roleDefinitionId: subscriptionResourceId('Microsoft.Authorization/roleDefinitions', '5e0bd9bd-7b93-4f28-af87-19fc36ad61bd')
    principalId: chatServiceIdentity.properties.principalId
    principalType: 'ServicePrincipal'
  }
}

2. Secure Secret Management

Key Vault Integration Pattern:

// Key Vault with proper access policies
resource keyVault 'Microsoft.KeyVault/vaults@2023-02-01' = {
  name: keyVaultName
  location: location
  properties: {
    tenantId: tenant().tenantId
    sku: {
      family: 'A'
      name: 'premium'  // Use premium for production
    }
    enableRbacAuthorization: true  // Use RBAC instead of access policies
    enablePurgeProtection: true    // Prevent accidental deletion
    enableSoftDelete: true
    softDeleteRetentionInDays: 90
  }
}

// Store all AI service credentials
resource openAIKeySecret 'Microsoft.KeyVault/vaults/secrets@2023-02-01' = {
  parent: keyVault
  name: 'openai-api-key'
  properties: {
    value: openAIAccount.listKeys().key1
    attributes: {
      enabled: true
    }
  }
}

3. Network Security

Private Endpoint Configuration:

// Virtual Network for AI services
resource virtualNetwork 'Microsoft.Network/virtualNetworks@2023-04-01' = {
  name: vnetName
  location: location
  properties: {
    addressSpace: {
      addressPrefixes: ['10.0.0.0/16']
    }
    subnets: [
      {
        name: 'ai-services-subnet'
        properties: {
          addressPrefix: '10.0.1.0/24'
          privateEndpointNetworkPolicies: 'Disabled'
        }
      }
      {
        name: 'app-services-subnet'
        properties: {
          addressPrefix: '10.0.2.0/24'
          delegations: [
            {
              name: 'Microsoft.Web/serverFarms'
              properties: {
                serviceName: 'Microsoft.Web/serverFarms'
              }
            }
          ]
        }
      }
    ]
  }
}

// Private endpoints for all AI services
resource openAIPrivateEndpoint 'Microsoft.Network/privateEndpoints@2023-04-01' = {
  name: '${openAIAccountName}-pe'
  location: location
  properties: {
    subnet: {
      id: virtualNetwork.properties.subnets[0].id
    }
    privateLinkServiceConnections: [
      {
        name: 'openai-connection'
        properties: {
          privateLinkServiceId: openAIAccount.id
          groupIds: ['account']
        }
      }
    ]
  }
}

Performance and Scaling

1. Auto-Scaling Strategies

Container Apps Auto-scaling:

resource containerApp 'Microsoft.App/containerApps@2023-05-01' = {
  name: containerAppName
  location: location
  properties: {
    configuration: {
      ingress: {
        external: true
        targetPort: 8000
        transport: 'http'
      }
    }
    template: {
      scale: {
        minReplicas: 2  // Always have 2 instances minimum
        maxReplicas: 50 // Scale up to 50 for high load
        rules: [
          {
            name: 'http-scaling'
            http: {
              metadata: {
                concurrentRequests: '20'  // Scale when >20 concurrent requests
              }
            }
          }
          {
            name: 'cpu-scaling'
            custom: {
              type: 'cpu'
              metadata: {
                type: 'Utilization'
                value: '70'  // Scale when CPU >70%
              }
            }
          }
        ]
      }
    }
  }
}

2. Caching Strategies

Redis Cache for AI Responses:

// Redis Premium for production workloads
resource redisCache 'Microsoft.Cache/redis@2023-04-01' = {
  name: redisCacheName
  location: location
  properties: {
    sku: {
      name: 'Premium'
      family: 'P'
      capacity: 1
    }
    enableNonSslPort: false
    minimumTlsVersion: '1.2'
    redisConfiguration: {
      'maxmemory-policy': 'allkeys-lru'
    }
    // Enable clustering for high availability
    redisVersion: '6.0'
    shardCount: 2
  }
}

// Cache configuration in application
var cacheConnectionString = '${redisCache.properties.hostName}:6380,password=${redisCache.listKeys().primaryKey},ssl=True,abortConnect=False'
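The application-side pattern is to key cached responses on a hash of the normalized prompt with a TTL. A minimal in-process sketch; in production the backing store would be the Redis instance above (SETEX on the same key, with the allkeys-lru policy handling eviction), and the TTL value here is illustrative:

```python
import hashlib
import time

class PromptCache:
    """TTL cache keyed by a hash of the normalized prompt."""
    def __init__(self, ttl_seconds: float = 300.0):
        self.ttl = ttl_seconds
        self._store = {}  # key -> (expires_at, response)

    @staticmethod
    def _key(prompt: str) -> str:
        # Normalize so trivially different prompts share a cache entry
        normalized = " ".join(prompt.lower().split())
        return hashlib.sha256(normalized.encode()).hexdigest()

    def get_or_call(self, prompt: str, call_model):
        key = self._key(prompt)
        hit = self._store.get(key)
        if hit and hit[0] > time.monotonic():
            return hit[1]                  # cache hit: no model call, no tokens
        response = call_model(prompt)
        self._store[key] = (time.monotonic() + self.ttl, response)
        return response
```

Because every cache hit avoids a model call entirely, this pattern reduces both latency and token spend for repeated questions.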

3. Load Balancing and Traffic Management

Application Gateway with WAF:

// Application Gateway with Web Application Firewall
resource applicationGateway 'Microsoft.Network/applicationGateways@2023-04-01' = {
  name: appGatewayName
  location: location
  properties: {
    sku: {
      name: 'WAF_v2'
      tier: 'WAF_v2'
      capacity: 2
    }
    webApplicationFirewallConfiguration: {
      enabled: true
      firewallMode: 'Prevention'
      ruleSetType: 'OWASP'
      ruleSetVersion: '3.2'
    }
    // Backend pools for AI services
    backendAddressPools: [
      {
        name: 'ai-services-pool'
        properties: {
          backendAddresses: [
            {
              fqdn: '${containerApp.properties.configuration.ingress.fqdn}'
            }
          ]
        }
      }
    ]
  }
}

💰 Cost Optimization

1. Resource Right-Sizing

Environment-Specific Configurations:

# Development environment
azd env new development
azd env set AZURE_OPENAI_SKU "S0"
azd env set AZURE_OPENAI_CAPACITY 10
azd env set AZURE_SEARCH_SKU "basic"
azd env set CONTAINER_CPU 0.5
azd env set CONTAINER_MEMORY 1.0

# Production environment  
azd env new production
azd env set AZURE_OPENAI_SKU "S0"
azd env set AZURE_OPENAI_CAPACITY 100
azd env set AZURE_SEARCH_SKU "standard"
azd env set CONTAINER_CPU 2.0
azd env set CONTAINER_MEMORY 4.0

2. Cost Monitoring and Budgets

// Cost management and budgets
resource budget 'Microsoft.Consumption/budgets@2023-05-01' = {
  name: 'ai-workload-budget'
  properties: {
    timePeriod: {
      startDate: '2024-01-01'
      endDate: '2024-12-31'
    }
    timeGrain: 'Monthly'
    amount: 2000  // $2000 monthly budget
    category: 'Cost'
    notifications: {
      warning: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 80
        contactEmails: [
          'finance@company.com'
          'engineering@company.com'
        ]
        contactRoles: [
          'Owner'
          'Contributor'
        ]
      }
      critical: {
        enabled: true
        operator: 'GreaterThan'
        threshold: 95
        contactEmails: [
          'cto@company.com'
        ]
      }
    }
  }
}

3. Token Usage Optimization

OpenAI Cost Management:

// Application-level token optimization
class TokenOptimizer {
  private readonly maxTokens = 4000;
  private readonly reserveTokens = 500;
  
  optimizePrompt(userInput: string, context: string): string {
    const availableTokens = this.maxTokens - this.reserveTokens;
    const estimatedTokens = this.estimateTokens(userInput + context);
    
    if (estimatedTokens > availableTokens) {
      // Truncate context, not user input
      context = this.truncateContext(context, availableTokens - this.estimateTokens(userInput));
    }
    
    return `${context}\n\nUser: ${userInput}`;
  }
  
  private estimateTokens(text: string): number {
    // Rough estimation: 1 token ≈ 4 characters
    return Math.ceil(text.length / 4);
  }
  
  private truncateContext(context: string, maxTokens: number): string {
    // Keep the most recent context within the budget (1 token ≈ 4 characters)
    const maxChars = Math.max(maxTokens, 0) * 4;
    return context.length > maxChars ? context.slice(context.length - maxChars) : context;
  }
}

Monitoring and Observability

1. Comprehensive Application Insights

// Application Insights with advanced features
resource applicationInsights 'Microsoft.Insights/components@2020-02-02' = {
  name: applicationInsightsName
  location: location
  kind: 'web'
  properties: {
    Application_Type: 'web'
    WorkspaceResourceId: logAnalyticsWorkspace.id
    SamplingPercentage: 100  // Full sampling for AI apps
    DisableIpMasking: false  // Enable for security
  }
}

// Custom metrics for AI operations
resource aiMetricAlerts 'Microsoft.Insights/metricAlerts@2018-03-01' = {
  name: 'ai-high-error-rate'
  location: 'global'
  properties: {
    description: 'Alert when AI service error rate is high'
    severity: 2
    enabled: true
    scopes: [
      applicationInsights.id
    ]
    evaluationFrequency: 'PT1M'
    windowSize: 'PT5M'
    criteria: {
      'odata.type': 'Microsoft.Azure.Monitor.SingleResourceMultipleMetricCriteria'
      allOf: [
        {
          name: 'high-error-rate'
          metricName: 'requests/failed'
          operator: 'GreaterThan'
          threshold: 10
          timeAggregation: 'Count'
        }
      ]
    }
  }
}

2. AI-Specific Monitoring

Custom Dashboards for AI Metrics:

// Dashboard configuration for AI workloads
{
  "dashboard": {
    "name": "AI Application Monitoring",
    "tiles": [
      {
        "name": "OpenAI Request Volume",
        "query": "requests | where name contains 'openai' | summarize count() by bin(timestamp, 5m)"
      },
      {
        "name": "AI Response Latency",
        "query": "requests | where name contains 'openai' | summarize avg(duration) by bin(timestamp, 5m)"
      },
      {
        "name": "Token Usage",
        "query": "customMetrics | where name == 'openai_tokens_used' | summarize sum(value) by bin(timestamp, 1h)"
      },
      {
        "name": "Cost per Hour",
        "query": "customMetrics | where name == 'openai_cost' | summarize sum(value) by bin(timestamp, 1h)"
      }
    ]
  }
}

3. Health Checks and Uptime Monitoring

// Application Insights availability tests
resource availabilityTest 'Microsoft.Insights/webtests@2022-06-15' = {
  name: 'ai-app-availability-test'
  location: location
  tags: {
    'hidden-link:${applicationInsights.id}': 'Resource'
  }
  properties: {
    SyntheticMonitorId: 'ai-app-availability-test'
    Name: 'AI Application Availability Test'
    Description: 'Tests AI application endpoints'
    Enabled: true
    Frequency: 300  // 5 minutes
    Timeout: 120    // 2 minutes
    Kind: 'ping'
    Locations: [
      {
        Id: 'us-east-2-azr'
      }
      {
        Id: 'us-west-2-azr'
      }
    ]
    Configuration: {
      WebTest: '''
        <WebTest Name="AI Health Check" 
                 Id="8d2de8d2-a2b0-4c2e-9a0d-8f9c9a0b8c8d" 
                 Enabled="True" 
                 CssProjectStructure="" 
                 CssIteration="" 
                 Timeout="120" 
                 WorkItemIds="" 
                 xmlns="http://microsoft.com/schemas/VisualStudio/TeamTest/2010" 
                 Description="" 
                 CredentialUserName="" 
                 CredentialPassword="" 
                 PreAuthenticate="True" 
                 Proxy="default" 
                 StopOnError="False" 
                 RecordedResultFile="" 
                 ResultsLocale="">
          <Items>
            <Request Method="GET" 
                     Guid="a5f10126-e4cd-570d-961c-cea43999a200" 
                     Version="1.1" 
                     Url="${webApp.properties.defaultHostName}/health" 
                     ThinkTime="0" 
                     Timeout="120" 
                     ParseDependentRequests="True" 
                     FollowRedirects="True" 
                     RecordResult="True" 
                     Cache="False" 
                     ResponseTimeGoal="0" 
                     Encoding="utf-8" 
                     ExpectedHttpStatusCode="200" 
                     ExpectedResponseUrl="" 
                     ReportingName="" 
                     IgnoreHttpStatusCode="False" />
          </Items>
        </WebTest>
      '''
    }
  }
}

Disaster Recovery and High Availability

1. Multi-Region Deployment

# azure.yaml - Multi-region configuration
name: ai-app-multiregion
services:
  api-primary:
    project: ./api
    host: containerapp
    env:
      - AZURE_REGION=eastus
  api-secondary:
    project: ./api
    host: containerapp
    env:
      - AZURE_REGION=westus2

// Traffic Manager for global load balancing
resource trafficManager 'Microsoft.Network/trafficManagerProfiles@2022-04-01' = {
  name: trafficManagerProfileName
  location: 'global'
  properties: {
    profileStatus: 'Enabled'
    trafficRoutingMethod: 'Priority'
    dnsConfig: {
      relativeName: trafficManagerProfileName
      ttl: 30
    }
    monitorConfig: {
      protocol: 'HTTPS'
      port: 443
      path: '/health'
      intervalInSeconds: 30
      toleratedNumberOfFailures: 3
      timeoutInSeconds: 10
    }
    endpoints: [
      {
        name: 'primary-endpoint'
        type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
        properties: {
          targetResourceId: primaryAppService.id
          endpointStatus: 'Enabled'
          priority: 1
        }
      }
      {
        name: 'secondary-endpoint'
        type: 'Microsoft.Network/trafficManagerProfiles/azureEndpoints'
        properties: {
          targetResourceId: secondaryAppService.id
          endpointStatus: 'Enabled'
          priority: 2
        }
      }
    ]
  }
}

2. Data Backup and Recovery

// Backup configuration for critical data
resource backupVault 'Microsoft.DataProtection/backupVaults@2023-05-01' = {
  name: backupVaultName
  location: location
  identity: {
    type: 'SystemAssigned'
  }
  properties: {
    storageSettings: [
      {
        datastoreType: 'VaultStore'
        type: 'LocallyRedundant'
      }
    ]
  }
}

// Backup policy for AI models and data
resource backupPolicy 'Microsoft.DataProtection/backupVaults/backupPolicies@2023-05-01' = {
  parent: backupVault
  name: 'ai-data-backup-policy'
  properties: {
    policyRules: [
      {
        backupParameters: {
          backupType: 'Full'
          objectType: 'AzureBackupParams'
        }
        trigger: {
          schedule: {
            repeatingTimeIntervals: [
              'R/2024-01-01T02:00:00+00:00/P1D'  // Daily at 2 AM
            ]
          }
          objectType: 'ScheduleBasedTriggerContext'
        }
        dataStore: {
          datastoreType: 'VaultStore'
          objectType: 'DataStoreInfoBase'
        }
        name: 'BackupDaily'
        objectType: 'AzureBackupRule'
      }
    ]
  }
}

DevOps and CI/CD Integration

1. GitHub Actions Workflow

# .github/workflows/deploy-ai-app.yml
name: Deploy AI Application

on:
  push:
    branches: [main]
  pull_request:
    branches: [main]

jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup Python
        uses: actions/setup-python@v4
        with:
          python-version: '3.11'
          
      - name: Install dependencies
        run: |
          pip install -r requirements.txt
          pip install pytest
          
      - name: Run tests
        run: pytest tests/
        
      - name: AI Safety Tests
        run: |
          python scripts/test_ai_safety.py
          python scripts/validate_prompts.py

  deploy-staging:
    needs: test
    if: github.event_name == 'pull_request'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup AZD
        uses: Azure/setup-azd@v2
        
      - name: Login to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          
      - name: Deploy to Staging
        run: |
          azd env select staging
          azd deploy

  deploy-production:
    needs: test
    if: github.ref == 'refs/heads/main'
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      
      - name: Setup AZD
        uses: Azure/setup-azd@v2
        
      - name: Login to Azure
        uses: azure/login@v1
        with:
          creds: ${{ secrets.AZURE_CREDENTIALS }}
          
      - name: Deploy to Production
        run: |
          azd env select production
          azd deploy
          
      - name: Run Production Health Checks
        run: |
          python scripts/health_check.py --env production

2. Infrastructure Validation

#!/bin/bash
# scripts/validate_infrastructure.sh

echo "Validating AI infrastructure deployment..."

# Check that each required service has at least one matching resource
services=("openai" "search" "storage" "keyvault")
for service in "${services[@]}"; do
    echo "Checking $service..."
    found=$(az resource list --query "[?contains(name, '$service')].name" -o tsv)
    if [[ -z "$found" ]]; then
        echo "ERROR: $service not found"
        exit 1
    fi
done

# Validate OpenAI model deployments
echo "Validating OpenAI model deployments..."
models=$(az cognitiveservices account deployment list --name $AZURE_OPENAI_NAME --resource-group $AZURE_RESOURCE_GROUP --query "[].name" -o tsv)
if [[ ! $models == *"gpt-4.1-mini"* ]]; then
    echo "ERROR: Required model gpt-4.1-mini not deployed"
    exit 1
fi

# Test AI service connectivity
echo "Testing AI service connectivity..."
python scripts/test_connectivity.py

echo "Infrastructure validation completed successfully!"

Production Readiness Checklist

Security ✅

  • All services use managed identities
  • Secrets stored in Key Vault
  • Private endpoints configured
  • Network security groups implemented
  • RBAC with least privilege
  • WAF enabled on public endpoints

Performance ✅

  • Auto-scaling configured
  • Caching implemented
  • Load balancing setup
  • CDN for static content
  • Database connection pooling
  • Token usage optimization

Monitoring ✅

  • Application Insights configured
  • Custom metrics defined
  • Alerting rules setup
  • Dashboard created
  • Health checks implemented
  • Log retention policies

Reliability ✅

  • Multi-region deployment
  • Backup and recovery plan
  • Circuit breakers implemented
  • Retry policies configured
  • Graceful degradation
  • Health check endpoints
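"Retry policies configured" for AI workloads usually means exponential backoff with a capped attempt count, since 429 throttling is the most common transient failure on model endpoints. A minimal sketch; the exception type, delays, and attempt cap are illustrative choices, not a specific SDK's policy:

```python
import time

class TransientModelError(Exception):
    """Stand-in for a throttling or timeout error from the model endpoint."""

def call_with_retry(call_model, prompt, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Retry transient failures with exponential backoff: 0.5s, 1s, 2s, ..."""
    for attempt in range(max_attempts):
        try:
            return call_model(prompt)
        except TransientModelError:
            if attempt == max_attempts - 1:
                raise                      # retry budget exhausted: surface the error
            sleep(base_delay * (2 ** attempt))
```

Injecting the `sleep` function keeps the policy unit-testable, and capping attempts prevents retries from masking a real outage (or silently burning tokens).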

Cost Management ✅

  • Budget alerts configured
  • Resource right-sizing
  • Dev/test discounts applied
  • Reserved instances purchased
  • Cost monitoring dashboard
  • Regular cost reviews

Compliance ✅

  • Data residency requirements met
  • Audit logging enabled
  • Compliance policies applied
  • Security baselines implemented
  • Regular security assessments
  • Incident response plan

Performance Benchmarks

Typical Production Metrics

| Metric | Target | Monitoring |
| --- | --- | --- |
| Response Time | < 2 seconds | Application Insights |
| Availability | 99.9% | Uptime monitoring |
| Error Rate | < 0.1% | Application logs |
| Token Usage | < $500/month | Cost management |
| Concurrent Users | 1000+ | Load testing |
| Recovery Time | < 1 hour | Disaster recovery tests |
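The availability target translates directly into a downtime budget, which is worth checking against the recovery-time target in the same table. A quick calculation:

```python
def monthly_downtime_minutes(availability: float, days: int = 30) -> float:
    """Allowed downtime per month for a given availability target."""
    return days * 24 * 60 * (1 - availability)

# 99.9% allows about 43 minutes of downtime in a 30-day month, so a
# single incident at the < 1 hour recovery target can already exceed
# the budget; 99.99% tightens it to roughly 4 minutes.
budget = monthly_downtime_minutes(0.999)
```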

Load Testing

# Load testing script for AI applications
python scripts/load_test.py \
  --endpoint https://your-ai-app.azurewebsites.net \
  --concurrent-users 100 \
  --duration 300 \
  --ramp-up 60
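A script like the one invoked above can be sketched as a thread pool whose workers ramp up over the warm-up window. The contents of scripts/load_test.py are not shown in this guide, so treat this as an illustrative shape rather than its actual implementation; the request function is injected so the same harness can target any endpoint:

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_load_test(send_request, concurrent_users, duration_s, ramp_up_s=0.0):
    """Drive send_request() from concurrent_users workers for duration_s seconds."""
    deadline = time.monotonic() + ramp_up_s + duration_s

    def worker(user_index):
        # Stagger worker start times across the ramp-up window
        time.sleep(ramp_up_s * user_index / max(concurrent_users, 1))
        ok = errors = 0
        while time.monotonic() < deadline:
            try:
                send_request()
                ok += 1
            except Exception:
                errors += 1
        return ok, errors

    with ThreadPoolExecutor(max_workers=concurrent_users) as pool:
        totals = list(pool.map(worker, range(concurrent_users)))
    return {"ok": sum(t[0] for t in totals), "errors": sum(t[1] for t in totals)}
```

Each worker tallies its own counts and the totals are summed at the end, which avoids sharing mutable counters across threads.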

🤝 Community Best Practices

Based on Microsoft Foundry Discord community feedback:

Top Recommendations from the Community:

  1. Start Small, Scale Gradually: Begin with basic SKUs and scale up based on actual usage
  2. Monitor Everything: Set up comprehensive monitoring from day one
  3. Automate Security: Use infrastructure as code for consistent security
  4. Test Thoroughly: Include AI-specific testing in your pipeline
  5. Plan for Costs: Monitor token usage and set budget alerts early

Common Pitfalls to Avoid:

  • ❌ Hardcoding API keys in code
  • ❌ Not setting up proper monitoring
  • ❌ Ignoring cost optimization
  • ❌ Not testing failure scenarios
  • ❌ Deploying without health checks

AZD AI CLI Commands and Extensions

AZD includes a growing set of AI-specific commands and extensions that streamline production AI workflows. These tools bridge the gap between local development and production deployment for AI workloads.

AZD Extensions for AI

AZD uses an extension system to add AI-specific capabilities. Install and manage extensions with:

# List all available extensions (including AI)
azd extension list

# Inspect installed extension details
azd extension show azure.ai.agents

# Install the Foundry agents extension
azd extension install azure.ai.agents

# Install the fine-tuning extension
azd extension install azure.ai.finetune

# Install the custom models extension
azd extension install azure.ai.models

# Upgrade all installed extensions
azd extension upgrade --all

Available AI extensions:

| Extension | Purpose | Status |
| --- | --- | --- |
| azure.ai.agents | Foundry Agent Service management | Preview |
| azure.ai.finetune | Foundry model fine-tuning | Preview |
| azure.ai.models | Foundry custom models | Preview |
| azure.coding-agent | Coding agent configuration | Available |

Initializing Agent Projects with azd ai agent init

The azd ai agent init command scaffolds a production-ready AI agent project integrated with Microsoft Foundry Agent Service:

# Initialize a new agent project from an agent manifest
azd ai agent init -m <manifest-path-or-uri>

# Initialize and target a specific Foundry project
azd ai agent init -m agent-manifest.yaml --project-id <foundry-project-id>

# Initialize with a custom source directory
azd ai agent init -m agent-manifest.yaml --src ./agents/my-agent

# Target Container Apps as the host
azd ai agent init -m agent-manifest.yaml --host containerapp

Key flags:

| Flag | Description |
| --- | --- |
| -m, --manifest | Path or URI to an agent manifest to add to your project |
| -p, --project-id | Existing Microsoft Foundry Project ID for your azd environment |
| -s, --src | Directory to download the agent definition (defaults to src/<agent-id>) |
| --host | Override the default host (e.g., containerapp) |
| -e, --environment | The azd environment to use |

Production tip: Use --project-id to connect directly to an existing Foundry project, keeping your agent code and cloud resources linked from the start.

Model Context Protocol (MCP) with azd mcp

AZD includes built-in MCP server support (Alpha), enabling AI agents and tools to interact with your Azure resources through a standardized protocol:

# Start the MCP server for your project
azd mcp start

# Review current Copilot consent rules for tool execution
azd copilot consent list

The MCP server exposes your azd project context—environments, services, and Azure resources—to AI-powered development tools. This enables:

  • AI-assisted deployment: Let coding agents query your project state and trigger deployments
  • Resource discovery: AI tools can discover what Azure resources your project uses
  • Environment management: Agents can switch between dev/staging/production environments

Infrastructure Generation with azd infra generate

For production AI workloads, you can generate and customize Infrastructure as Code rather than relying on automatic provisioning:

# Generate Bicep/Terraform files from your project definition
azd infra generate

This writes IaC to disk so you can:

  • Review and audit infrastructure before deploying
  • Add custom security policies (network rules, private endpoints)
  • Integrate with existing IaC review processes
  • Version control infrastructure changes separately from application code

Production Lifecycle Hooks

AZD hooks let you inject custom logic at every stage of the deployment lifecycle—critical for production AI workflows:

# azure.yaml - Production hooks example
name: ai-production-app
hooks:
  preprovision:
    shell: sh
    run: scripts/validate-quotas.sh    # Check AI model quota before provisioning
  postprovision:
    shell: sh
    run: scripts/configure-networking.sh  # Set up private endpoints
  predeploy:
    shell: sh
    run: scripts/run-ai-safety-tests.sh  # Run prompt safety checks
  postdeploy:
    shell: sh
    run: scripts/smoke-test.sh           # Verify agent responses post-deploy
services:
  agent-api:
    project: ./src/agent
    host: containerapp
    hooks:
      predeploy:
        shell: sh
        run: scripts/validate-model-access.sh  # Per-service hook

# Run a specific hook manually during development
azd hooks run predeploy

Recommended production hooks for AI workloads:

| Hook | Use Case |
| --- | --- |
| preprovision | Validate subscription quotas for AI model capacity |
| postprovision | Configure private endpoints, deploy model weights |
| predeploy | Run AI safety tests, validate prompt templates |
| postdeploy | Smoke test agent responses, verify model connectivity |
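The predeploy safety check referenced in the hooks example can be as simple as linting prompt templates before they ship. A minimal sketch of such a check; the forbidden patterns and required placeholders are illustrative policy choices, not a standard rule set:

```python
import re

# Patterns that should never appear in a production prompt template
# (illustrative rules; extend with your own policy)
FORBIDDEN_PATTERNS = [
    r"(?i)ignore (all )?previous instructions",  # injection text baked into a template
    r"sk-[A-Za-z0-9]{20,}",                      # an API-key-shaped string
]
PLACEHOLDER = re.compile(r"\{[a-zA-Z_]+\}")

def lint_prompt_template(template: str, required_placeholders=("user_input",)):
    """Return a list of problems found in a prompt template (empty list = pass)."""
    problems = []
    for pattern in FORBIDDEN_PATTERNS:
        if re.search(pattern, template):
            problems.append(f"forbidden pattern: {pattern}")
    found = {m.group(0).strip("{}") for m in PLACEHOLDER.finditer(template)}
    for name in required_placeholders:
        if name not in found:
            problems.append(f"missing placeholder: {{{name}}}")
    return problems
```

Running this over every template in the repository and failing the hook on any non-empty result gives the pipeline a cheap, deterministic safety gate before any model is called.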

CI/CD Pipeline Configuration

Use azd pipeline config to connect your project to GitHub Actions or Azure Pipelines with secure Azure authentication:

# Configure CI/CD pipeline (interactive)
azd pipeline config

# Configure with a specific provider
azd pipeline config --provider github

This command:

  • Creates a service principal with least-privilege access
  • Configures federated credentials (no stored secrets)
  • Generates or updates your pipeline definition file
  • Sets required environment variables in your CI/CD system

Production workflow with pipeline config:

# 1. Set up production environment
azd env new production
azd env set AZURE_OPENAI_CAPACITY 100

# 2. Configure the pipeline
azd pipeline config --provider github

# 3. Pipeline runs azd deploy on every push to main

Adding Components with azd add

Incrementally add Azure services to an existing project:

# Add a new service component interactively
azd add

This is particularly useful for expanding production AI applications—for example, adding a vector search service, a new agent endpoint, or a monitoring component to an existing deployment.

Additional Resources


Chapter Navigation:

Remember: Production AI workloads require careful planning, monitoring, and continuous optimization. Start with these patterns and adapt them to your specific requirements.