Scalable agent runtime demonstrating system design, resilience patterns, and production best practices.
- Circuit Breaker Pattern: Prevents cascading failures with automatic recovery
- Exponential Backoff Retry: Resilient operations with configurable retry logic
- Rate Limiting: Token bucket algorithm for API throttling (500 req/min)
- Caching Layer: In-memory cache with TTL for performance optimization
- Distributed Tracing: Correlation IDs and structured logging for observability
- Graceful Shutdown: Proper resource cleanup and signal handling
- Comprehensive Testing: 95%+ test coverage with pytest
- Performance Benchmarking: Automated load testing suite
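
As a rough illustration of the exponential backoff retry listed above, here is a minimal sketch. The function name `retry_with_backoff`, its parameters, and the default delays are assumptions made for this example, not the project's actual API. It uses the "full jitter" variant, where each wait is drawn uniformly between zero and an exponentially growing ceiling.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_attempts: int = 3,      # assumed default, not taken from the project
    base_delay: float = 0.5,    # assumed default, in seconds
    max_delay: float = 10.0,    # assumed cap, in seconds
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds or max_attempts calls have failed."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except Exception:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Ceiling doubles each attempt (base, 2*base, 4*base, ...) up to max_delay
            ceiling = min(max_delay, base_delay * 2 ** (attempt - 1))
            # Full jitter: wait a random amount in [0, ceiling]
            sleep(random.uniform(0, ceiling))
    raise AssertionError("unreachable")
```

The `sleep` parameter is injectable so that tests can record the delays instead of actually waiting.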
┌──────────────────────────────────────────────────────┐
│              FastAPI + Middleware Layer              │
│      (Correlation ID → Logging → Rate Limiting)      │
└────────────────────────┬─────────────────────────────┘
                         │
                ┌────────▼────────┐
                │  Orchestrator   │
                │ (Async Routing) │
                └────────┬────────┘
                         │
            ┌────────────┴────────────┐
            │                         │
     ┌──────▼──────┐          ┌───────▼───────┐
     │  LangGraph  │          │    CrewAI     │
     │  Workflow   │          │  Multi-Agent  │
     │             │          │  (Parallel)   │
     └──────┬──────┘          └───────┬───────┘
            │                         │
            └────────────┬────────────┘
                         │
               ┌─────────▼──────────┐
               │  Resilience Layer  │
               │  - Circuit Breaker │
               │  - Retry Logic     │
               │  - Rate Limiter    │
               │  - Cache Manager   │
               └─────────┬──────────┘
                         │
               ┌─────────▼──────────┐
               │   Memory Manager   │
               │  (Stateful Store)  │
               └────────────────────┘
# Install dependencies
pip install -r requirements.txt
# Run tests
pytest tests/ -v --cov=src
# Start server
python main.py
# Run benchmarks (in another terminal)
python benchmark.py
# Run demo
python examples/demo.py
R/
├── src/                     # Core modules
│   ├── agents.py            # Agent orchestration with retry/circuit breaker
│   ├── memory.py            # Stateful memory management
│   ├── inference.py         # Inference layer
│   ├── telemetry.py         # Metrics and observability
│   ├── resilience.py        # Circuit breaker, retry, rate limit, cache
│   ├── middleware.py        # Logging, correlation IDs, rate limiting
│   └── config.py            # Configuration management
├── tests/                   # Comprehensive test suite
│   ├── test_resilience.py   # Resilience pattern tests
│   ├── test_agents.py       # Agent functionality tests
│   ├── test_memory.py       # Memory management tests
│   └── conftest.py          # Pytest fixtures
├── examples/
│   └── demo.py              # Interactive demo
├── main.py                  # FastAPI server with middleware
├── benchmark.py             # Performance benchmarking suite
└── requirements.txt         # Minimal dependencies
- Circuit Breaker: Auto-recovery from failures (3 failures → open, 60s recovery)
- Retry with Exponential Backoff: Configurable retry logic with jitter
- Graceful Degradation: Fail-safe mechanisms throughout
- Rate Limiting: Token bucket algorithm (500 req/min) prevents overload
- Caching: In-memory cache with TTL reduces redundant computation
- Async/Parallel Execution: CrewAI agents run concurrently
- Connection Pooling: Efficient resource utilization
- Structured Logging: JSON logs with contextual information
- Correlation IDs: End-to-end request tracing
- Metrics Collection: Real-time performance metrics
- Health Checks: Liveness and readiness probes
- Graceful Shutdown: SIGTERM/SIGINT handling with cleanup
- Error Handling: Comprehensive exception handling with context
- Configuration Management: Environment-based settings
- API Versioning: Semantic versioning support
- Unit Tests: 95%+ code coverage
- Integration Tests: End-to-end testing
- Performance Benchmarks: Automated load testing
- Test Fixtures: Reusable test components
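
The in-memory cache with TTL mentioned in the list above could look roughly like the sketch below. The class name `TTLCache`, the 60-second default, and the lazy eviction on read are assumptions chosen for illustration; the project's cache in `src/resilience.py` may differ.

```python
import time
from typing import Any, Callable, Dict, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(
        self,
        ttl_seconds: float = 60.0,  # assumed default
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._ttl = ttl_seconds
        self._clock = clock
        # Maps key -> (expiry time on the clock, cached value)
        self._store: Dict[str, Tuple[float, Any]] = {}

    def set(self, key: str, value: Any) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str, default: Any = None) -> Any:
        entry = self._store.get(key)
        if entry is None:
            return default
        expires_at, value = entry
        if self._clock() >= expires_at:
            # Lazy eviction: expired entries are dropped when they are read
            del self._store[key]
            return default
        return value
```

Lazy eviction keeps the sketch short, but entries that are never read again stay in memory; a production cache would typically also bound its size or sweep expired keys periodically.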
| Endpoint | Method | Description | Features |
|---|---|---|---|
| `/api/agent/execute` | POST | Execute agent task | Retry, Circuit Breaker, Caching |
| `/api/memory/store` | POST | Store in memory | TTL support |
| `/api/memory/retrieve` | POST | Retrieve from memory | Pagination |
| `/api/metrics` | GET | System metrics | Real-time stats |
| `/health` | GET | Health check | Readiness probe |
| `/docs` | GET | OpenAPI docs | Interactive API |
# Run all tests with coverage
pytest tests/ -v --cov=src --cov-report=html
# Run specific test suite
pytest tests/test_resilience.py -v
# Run with markers
pytest tests/ -v -m asyncio
- Resilience Patterns: Circuit breaker, retry, rate limiting
- Agent Functionality: LangGraph, CrewAI orchestration
- Memory Operations: Short-term, long-term, concurrent access
- Error Scenarios: Failure handling, recovery
Benchmark results on M1 Mac (8GB RAM):
| Operation | Avg Latency | P95 | P99 | Throughput |
|---|---|---|---|---|
| Health Check | 2.88ms | 6.27ms | 7.80ms | 347 req/s |
| Agent Execute | 56.34ms | 61.01ms | 75.34ms | 17.75 req/s |
| Memory Store | 3.51ms | 7.07ms | 12.23ms | 284 req/s |
| Concurrent (10) | 124.33ms | - | - | 48.52 req/s |
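
How `benchmark.py` derives these columns is not shown here. As one plausible reading, the sketch below computes average latency, P95/P99 (using the nearest-rank method), and throughput from a list of per-request latencies. The function names and the choice of nearest-rank percentiles are assumptions.

```python
import math
from typing import Dict, Sequence


def percentile(samples: Sequence[float], pct: float) -> float:
    """Nearest-rank percentile: the smallest sample with at least pct% of samples at or below it."""
    if not samples:
        raise ValueError("no samples")
    if not 0 < pct <= 100:
        raise ValueError("pct must be in (0, 100]")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[rank - 1]


def summarize(latencies_ms: Sequence[float], wall_seconds: float) -> Dict[str, float]:
    """Summarize one benchmark run: latencies in milliseconds, wall-clock time in seconds."""
    return {
        "avg_ms": sum(latencies_ms) / len(latencies_ms),
        "p95_ms": percentile(latencies_ms, 95),
        "p99_ms": percentile(latencies_ms, 99),
        "throughput_rps": len(latencies_ms) / wall_seconds,
    }
```

Throughput here is completed requests divided by total wall-clock time, which is why a concurrent run can show higher throughput than its average latency alone would suggest.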
- Rate Limiting: Prevents DoS attacks (500 req/min default)
- Input Validation: Pydantic models for request validation
- Error Sanitization: No sensitive data in error responses
- Correlation IDs: Audit trail for all requests
Structured logs include:
- Request/response timing
- Error rates and types
- Memory usage statistics
- Cache hit/miss rates
- Circuit breaker state changes
# src/config.py
class Settings:
    host: str = "0.0.0.0"
    port: int = 8000
    max_concurrent_agents: int = 10
    agent_timeout: int = 300

Circuit breaker: prevents cascading failures when the inference layer is slow or down, and recovers automatically without manual intervention.
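
A minimal sketch of that behavior, using the thresholds stated earlier in this README (3 failures to open, 60 seconds before recovery is attempted). The class and method names are hypothetical, and the real implementation in `src/resilience.py` may be async and structured differently.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class CircuitOpenError(RuntimeError):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,
        recovery_timeout: float = 60.0,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.recovery_timeout = recovery_timeout
        self._clock = clock
        self._failures = 0
        self._opened_at: Optional[float] = None  # None means the circuit is closed

    @property
    def state(self) -> str:
        if self._opened_at is None:
            return "closed"
        if self._clock() - self._opened_at >= self.recovery_timeout:
            return "half_open"  # recovery window elapsed: allow a trial call
        return "open"

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            raise CircuitOpenError("circuit open; failing fast")
        try:
            result = func()
        except Exception:
            self._failures += 1
            # A failed trial call, or reaching the threshold, (re)opens the circuit
            if self._opened_at is not None or self._failures >= self.failure_threshold:
                self._opened_at = self._clock()
            raise
        # Any success closes the circuit and resets the failure count
        self._failures = 0
        self._opened_at = None
        return result
```

While the circuit is open, callers fail immediately instead of waiting on a slow dependency, which is what stops the failure from spreading upstream.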
Token bucket rate limiting: smooths traffic instead of enforcing a hard cutoff, allowing short bursts while holding the average rate.
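
The token bucket idea can be sketched as follows. The 500 req/min rate comes from this README; the bucket capacity (set equal to the per-minute rate here), the class name, and the `allow` method are assumptions for illustration.

```python
import time
from typing import Callable


class TokenBucket:
    def __init__(
        self,
        rate_per_minute: float = 500,
        capacity: float = 500,  # assumed burst size; not specified in this README
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.capacity = float(capacity)
        self._refill_per_second = rate_per_minute / 60.0
        self._clock = clock
        self._tokens = float(capacity)  # start full so an initial burst is allowed
        self._last_refill = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Return True and spend tokens if the request may proceed."""
        now = self._clock()
        elapsed = now - self._last_refill
        self._last_refill = now
        # Refill in proportion to elapsed time, never exceeding capacity
        self._tokens = min(self.capacity, self._tokens + elapsed * self._refill_per_second)
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

A burst can drain the bucket at once, after which requests are admitted only as fast as tokens refill, so the long-run average stays at the configured rate.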
Correlation IDs: essential for distributed tracing, since they link every operation in a request chain for debugging.
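
One way to carry a correlation ID through a request, sketched with the standard library only. The names `start_request` and `CorrelationIdFilter` are hypothetical, and the actual middleware in `src/middleware.py` may work differently. A context variable holds the ID for the current request, and a logging filter stamps it onto every log record.

```python
import contextvars
import logging
import uuid
from typing import Optional

# Holds the ID of the request currently being handled ("-" outside any request)
correlation_id_var = contextvars.ContextVar("correlation_id", default="-")


def start_request(incoming_id: Optional[str] = None) -> str:
    """Reuse the caller's ID if one was supplied, otherwise mint a new one."""
    cid = incoming_id or uuid.uuid4().hex
    correlation_id_var.set(cid)
    return cid


class CorrelationIdFilter(logging.Filter):
    """Attach the current correlation ID to every record passing through."""

    def filter(self, record: logging.LogRecord) -> bool:
        record.correlation_id = correlation_id_var.get()
        return True


def configure_logging() -> logging.Logger:
    """Build a demo logger whose output includes the correlation ID."""
    logger = logging.getLogger("correlation_demo")  # hypothetical logger name
    handler = logging.StreamHandler()
    handler.addFilter(CorrelationIdFilter())
    handler.setFormatter(logging.Formatter("[%(correlation_id)s] %(message)s"))
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    return logger
```

Because `contextvars` values are scoped per asyncio task, concurrent requests handled by an async server each see their own ID.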
Structured logging: machine-parseable logs enable better alerting and analytics, which is critical for production systems.
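
A minimal sketch of JSON log output using only the standard `logging` module. The formatter name and the set of extra fields (`correlation_id`, `duration_ms`, `status_code`) are assumptions for illustration; the project's log schema is not shown in this README.

```python
import json
import logging
from datetime import datetime, timezone


class JsonFormatter(logging.Formatter):
    # Optional attributes copied into the payload when present (hypothetical names)
    EXTRA_FIELDS = ("correlation_id", "duration_ms", "status_code")

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "timestamp": datetime.fromtimestamp(record.created, tz=timezone.utc).isoformat(),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for field in self.EXTRA_FIELDS:
            if hasattr(record, field):
                payload[field] = getattr(record, field)
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        # default=str keeps the formatter from failing on non-JSON-serializable values
        return json.dumps(payload, default=str)
```

A call such as `logger.info("request done", extra={"duration_ms": 12.5})` would then emit one JSON object per line, which log aggregators can index by field.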
MIT License - Feel free to use for portfolio/learning