Building production-grade distributed systems with automated AWS deployments, achieving sub-1ms response times at 1K+ concurrent users
Actively seeking: Remote backend engineering positions
Specialization: Python backend + DevOps automation + Distributed systems
Location: Dhaka, Bangladesh (Open to worldwide remote)
Ask me about: FastAPI, System Design, AWS Infrastructure
I don't just write backend code; I architect complete production systems with full automation from infrastructure to deployment:
- Infrastructure as Code Expert - Automated AWS deployments managing 11+ EC2 instances with Pulumi & Ansible
- Performance Engineering - Optimized systems achieving sub-1ms response times with 1K+ concurrent users
- Polyglot Architecture - Go for performance-critical paths, Python for business logic
- DevOps Automation - Zero-touch deployments with CI/CD, containerization, and orchestration
- Distributed Systems - Built fault-tolerant architectures with auto-scaling, load balancing, and high availability
- Technical Writing - Published articles explaining complex architectures in simple words
6+ Production-Ready Applications Built
Sub-1ms API Response Times Achieved
11+ AWS EC2 Instances Automated
1K+ Concurrent Users Supported
Container Orchestration Systems Designed
500+ DSA Problems Solved
200K+ Technical Blog Readers
40+ Educational Videos Created
Scalable URL Shortener Microservice
High Complexity - Polyglot Microservices with Full Observability
Production-grade URL shortener with Go redirect service achieving sub-1ms latency, complete OpenTelemetry observability stack, and intelligent autoscaling
- Architected polyglot microservices with Python FastAPI for create_service, high-performance Go for redirect_service, and Celery worker_service, each independently scalable via docker-compose-decoupled.yml.
- Implemented Go redirect service with Chi router achieving sub-1ms redirect latency (vs 5-7ms Python), clean architecture using internal/ package structure, and single-binary deployment for minimal resource footprint.
- Built circuit breaker pattern in Go with three-state fault tolerance (Closed → Open → Half-Open), preventing cascade failures and enabling fast-fail for read operations without retry delays.
- Implemented cache-aside pattern with Redis (30-minute TTL) + MongoDB fallback, optimizing for 95%+ cache hit rate and automatic expiration handling.
- Deployed complete observability stack with OpenTelemetry collector, Tempo (distributed tracing), Loki (log aggregation via Promtail), and Grafana dashboards for end-to-end service visibility.
- Engineered production-grade resilience with PgBouncer connection pooling (53% reduction in overhead), atomic PostgreSQL key acquisition using SELECT FOR UPDATE SKIP LOCKED, and exponential backoff retries.
- Implemented intelligent key pre-population using Celery workers maintaining a pool of unused keys for instant URL creation without database latency, with hybrid strategy auto-selecting optimal insertion method.
- Built comprehensive testing infrastructure with multi-database mocking (SQLite, mongomock, fakeredis), async pytest framework, httpx API client testing, and isolated test environments.
- Deployed production K3s cluster on AWS using Pulumi IaC and Ansible with path-based Nginx routing, per-service rate limiting, and CI/CD pipeline via GitHub Actions.
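The cache-aside read path mentioned above can be sketched roughly as follows. This is an illustration in Python, not the service's actual code: `TTLCache` and `resolve` are hypothetical names, an in-memory class stands in for Redis, and a plain dict stands in for MongoDB. Only the 30-minute TTL comes from the project description.

```python
import time
from typing import Callable, Optional

CACHE_TTL_SECONDS = 30 * 60  # 30-minute TTL, as stated in the project description


class TTLCache:
    """In-memory stand-in for Redis (assumption): values expire after a TTL."""

    def __init__(self, clock: Callable[[], float] = time.monotonic) -> None:
        self._clock = clock
        self._data: dict[str, tuple[str, float]] = {}

    def get(self, key: str) -> Optional[str]:
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._data[key]  # expired entries are dropped lazily on read
            return None
        return value

    def set(self, key: str, value: str, ttl: float) -> None:
        self._data[key] = (value, self._clock() + ttl)


def resolve(short_key: str, cache: TTLCache, store: dict[str, str]) -> Optional[str]:
    """Cache-aside lookup: try the cache, fall back to the store, then fill the cache."""
    cached = cache.get(short_key)
    if cached is not None:
        return cached  # cache hit: the backing store is never touched
    long_url = store.get(short_key)  # cache miss: fall back to the database
    if long_url is not None:
        cache.set(short_key, long_url, CACHE_TTL_SECONDS)
    return long_url
```

Injecting the clock keeps expiry testable without sleeping; a real deployment would rely on Redis's own key expiration instead.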
Technical Deep Dive: Read my Medium articles
Tech Stack: Go FastAPI Chi Router Redis PostgreSQL MongoDB Celery Nginx Docker K3s Pulumi Ansible AWS OpenTelemetry Tempo Loki Grafana Promtail PgBouncer Circuit Breaker pytest httpx GitHub Actions
Key Learnings:
- Polyglot microservices: Go for performance-critical paths, Python for business logic
- Clean architecture with internal/ package structure in Go
- Circuit breaker pattern for fault tolerance in distributed systems
- End-to-end observability with OpenTelemetry + Tempo + Loki
- Multi-database testing strategies with mocking frameworks
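As a rough illustration of the three-state circuit breaker (Closed → Open → Half-Open), here is a minimal Python sketch. The project's breaker is written in Go, so this is not that implementation; the class name, the failure threshold, and the reset timeout are all assumptions chosen for the example.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CircuitOpenError(Exception):
    """Raised when a call is rejected because the circuit is open."""


class CircuitBreaker:
    def __init__(
        self,
        failure_threshold: int = 3,      # assumed value, for illustration only
        reset_timeout: float = 10.0,     # assumed value, for illustration only
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self._clock = clock
        self.state = "closed"
        self._failures = 0
        self._opened_at = 0.0

    def call(self, func: Callable[[], T]) -> T:
        if self.state == "open":
            if self._clock() - self._opened_at < self.reset_timeout:
                # Fail fast: no call is made and no retry delay is incurred.
                raise CircuitOpenError("circuit is open")
            self.state = "half-open"  # timeout elapsed: allow one trial call
        try:
            result = func()
        except Exception:
            self._record_failure()
            raise
        # Any success closes the circuit and resets the failure count.
        self._failures = 0
        self.state = "closed"
        return result

    def _record_failure(self) -> None:
        self._failures += 1
        # A failed trial call in half-open reopens the circuit immediately.
        if self.state == "half-open" or self._failures >= self.failure_threshold:
            self.state = "open"
            self._opened_at = self._clock()
```

While the circuit is open, callers get an immediate `CircuitOpenError` rather than waiting on a failing dependency, which is what keeps one slow service from stalling the ones upstream of it.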
ElastiKube: Production K3s Autoscaler
Most Complex Infrastructure Project - ML-Enhanced Event-Driven Architecture
Production-grade autoscaling system for K3s clusters with 4-layer intelligent scaling architecture, ML-based predictive scaling, and multi-AZ high availability
- Architected 4-layer autoscaling system: (1) Data Collection for ML training, (2) Time-Aware Scaling with peak/off-peak thresholds (85%/60% vs 60%/40%), (3) Flash Sale Detection with emergency response to CPU spikes >30% in 2 minutes, (4) Predictive Scaling using Prophet models forecasting CPU 15 minutes ahead.
- Implemented ML training pipeline with Kubernetes CronJob for automated weekly model retraining, feature engineering (temporal cyclical encoding, lag features, rolling statistics), time-series cross-validation, and backtesting with MAE/RMSE metrics tracking.
- Built event-driven Lambda architecture with four specialized functions (Decision, Scale-Up, Scale-Down, Cleanup) orchestrated through EventBridge for fault tolerance, crash recovery via Write-Ahead Log (WAL), and distributed locking with 200s timeout.
- Designed multi-AZ high availability with round-robin worker distribution across 3 availability zones (ap-southeast-1a/b/c), single NAT Gateway optimization, and LIFO scale-down maintaining natural distribution balance.
- Implemented multi-layer idempotency including bootstrap verification, cooldown checks (scale-up: 300s, scale-down: 900s), pending instance detection, and automatic stale flag cleanup to prevent duplicate scaling operations.
- Integrated comprehensive observability with 17 CloudWatch alarms (CRITICAL/WARNING severity), Prometheus health graceful degradation (conservative defaults when unavailable), and fixed LogGroups for stable dashboard references.
- Engineered spot instance support with automatic On-Demand fallback when spot capacity unavailable (InsufficientInstanceCapacity, SpotInstanceCapacityNotAvailable, MaxSpotInstanceCountExceeded), graceful 2-minute interruption handling, and proper node cleanup.
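Two of the guards listed above, flash-sale spike detection and cooldown checks, can be sketched in a few lines of Python. The function names are hypothetical, and the sketch assumes that "CPU spikes >30% in 2 minutes" means a rise of more than 30 percentage points inside a 120-second window; the actual Lambda logic may define it differently. The cooldown values (300s and 900s) are the ones stated above.

```python
from typing import Optional, Sequence

SCALE_UP_COOLDOWN_S = 300    # from the description above
SCALE_DOWN_COOLDOWN_S = 900  # from the description above
SPIKE_DELTA_PCT = 30.0       # assumed to mean percentage points of CPU
SPIKE_WINDOW_S = 120         # 2 minutes


def is_flash_spike(samples: Sequence[tuple[float, float]], now: float) -> bool:
    """Return True if CPU rose by more than SPIKE_DELTA_PCT within the window.

    `samples` are (timestamp_seconds, cpu_percent) pairs in chronological order.
    """
    recent = [cpu for ts, cpu in samples if now - SPIKE_WINDOW_S <= ts <= now]
    if len(recent) < 2:
        return False  # not enough data to measure a rise
    return recent[-1] - min(recent) > SPIKE_DELTA_PCT


def cooldown_elapsed(action: str, last_action_at: Optional[float], now: float) -> bool:
    """Return True if enough time has passed since the last action of this kind."""
    cooldown = SCALE_UP_COOLDOWN_S if action == "scale_up" else SCALE_DOWN_COOLDOWN_S
    return last_action_at is None or now - last_action_at >= cooldown
```

A decision function would typically require both checks to pass before triggering a scale-up, so a single spike cannot launch duplicate instances while an earlier scale-up is still inside its cooldown.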
Tech Stack: AWS Lambda EventBridge DynamoDB EC2 K3s Prometheus CloudWatch Prophet Kubernetes CronJob SSM Secrets Manager S3 Python 3.11 Pulumi Ansible kubectl Node Exporter
Key Learnings:
- Layered autoscaling architecture combining reactive (time-aware, flash sale) and proactive (ML predictive) scaling
- Event-driven architecture patterns with Lambda chaining via EventBridge
- Distributed systems state management with DynamoDB and WAL patterns
- ML pipeline deployment with automated retraining and model versioning
- Multi-AZ infrastructure design with cost optimization (single NAT, spot instances)
- Kubernetes cluster operations including node lifecycle, pod draining, and CronJob scheduling
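The temporal cyclical encoding used in the ML pipeline's feature engineering can be illustrated with a small Python sketch. The helper names are hypothetical and the project's real feature set is not shown here; the point is only that mapping the hour onto a circle keeps 23:00 and 00:00 adjacent, which a raw 0-23 integer does not.

```python
import math


def encode_hour(hour: float) -> tuple[float, float]:
    """Map an hour of day (0-24) to a point on the unit circle as (sin, cos)."""
    angle = 2 * math.pi * (hour % 24) / 24
    return math.sin(angle), math.cos(angle)


def lag_features(series: list[float], lags: tuple[int, ...] = (1, 2, 3)) -> list[list[float]]:
    """Build rows of lagged values; rows start once every lag is available."""
    start = max(lags)
    return [[series[i - lag] for lag in lags] for i in range(start, len(series))]


def hour_distance(a: float, b: float) -> float:
    """Euclidean distance between two encoded hours."""
    return math.dist(encode_hour(a), encode_hour(b))
```

With this encoding, `hour_distance(23, 0)` equals `hour_distance(0, 1)`, so the model treats the step across midnight like any other one-hour step.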
High Complexity - Media Processing Pipeline
Full-Stack advanced video streaming solution with adaptive bitrate technology
- Engineered a secure and scalable video platform with a Django REST API and a React/TypeScript frontend, architected for high-performance adaptive streaming.
- Implemented a robust security model, using dj-rest-auth for token-based authentication and a protected media workflow (via Nginx X-Accel-Redirect) to ensure only authorized users can access streaming content.
- Built an asynchronous video processing pipeline using Celery, Redis, and FFMPEG to transcode videos for DASH playback, ensuring a smooth, low-latency user experience.
- Automated the entire cloud workflow, from provisioning AWS S3 infrastructure with Pulumi and configuring servers with Ansible, to deploying the Docker-containerized application via GitHub Actions.
Tech Stack: Django React Celery Redis PostgreSQL FFMPEG DASH AWS S3 Nginx Docker Pulumi Ansible
Distributed Job Queue System
Medium-High Complexity - Worker Orchestration
Scalable job processing system with advanced features
- Developed a distributed job queue system using FastAPI and Redis to manage asynchronous tasks with priority-based queuing and automatic worker scaling.
- Implemented a real-time monitoring dashboard with Jinja2 templates to provide visibility into job status, queue metrics, and worker activity.
- Engineered an automatic worker scaling mechanism based on job load and worker availability, using Docker Swarm to dynamically adjust resources.
- Created a comprehensive error handling and fault tolerance system, including automatic retries for failed jobs and a dead-letter queue for unrecoverable tasks.
- Designed a job dependency feature to ensure complex workflows are executed in the correct order, improving system reliability.
- Containerized all services (API, Worker, Monitor) using Docker for consistent deployment and simplified management.
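Priority-based queuing with automatic retries and a dead-letter queue can be sketched in plain Python as below. This is a single-process illustration using `heapq`, not the Redis-backed implementation; `Job`, `JobQueue`, and the `max_attempts` default are hypothetical.

```python
import heapq
import itertools
from dataclasses import dataclass, field
from typing import Callable


@dataclass(order=True)
class Job:
    priority: int                       # lower number runs first
    seq: int                            # tie-breaker: FIFO within a priority
    name: str = field(compare=False)
    run: Callable[[], None] = field(compare=False)
    attempts: int = field(default=0, compare=False)


class JobQueue:
    def __init__(self, max_attempts: int = 3) -> None:
        self.max_attempts = max_attempts
        self._heap: list[Job] = []
        self._counter = itertools.count()
        self.completed: list[str] = []
        self.dead_letter: list[str] = []

    def submit(self, name: str, run: Callable[[], None], priority: int = 10) -> None:
        heapq.heappush(self._heap, Job(priority, next(self._counter), name, run))

    def process_all(self) -> None:
        while self._heap:
            job = heapq.heappop(self._heap)
            job.attempts += 1
            try:
                job.run()
            except Exception:
                if job.attempts >= self.max_attempts:
                    # Retries exhausted: park the job for manual inspection.
                    self.dead_letter.append(job.name)
                else:
                    # Requeue behind other jobs of the same priority.
                    job.seq = next(self._counter)
                    heapq.heappush(self._heap, job)
            else:
                self.completed.append(job.name)
```

A production version would add a delay between retries and persist the queue, but the ordering and dead-letter behavior follow the same shape.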
Tech Stack: FastAPI Redis Docker Swarm Jinja2
Medium Complexity - Full-Stack Application
Full-stack financial management application for tracking installments and payments
- Backend: High-performance API built with FastAPI, using SQLAlchemy for ORM with a PostgreSQL database.
- Frontend: Modern and responsive UI built with React, TypeScript, and Vite, styled with Tailwind CSS and Shadcn UI.
- Asynchronous Tasks: Celery and Redis manage background jobs like sending OTP and due date notification emails.
- Authentication: Secure JWT-based authentication with role-based access for customers and admins.
- Data Management: Alembic handles database schema migrations, and TanStack Query manages server state on the frontend.
- DevOps: Fully containerized with Docker and Docker Compose for reproducible development and deployment environments.
Tech Stack: FastAPI React TypeScript PostgreSQL SQLAlchemy Redis Celery Docker Tailwind CSS Shadcn UI Alembic
Medium Complexity - Async Communication
Real-time notification system for multiple channels
- Modern Backend: Built with Python and FastAPI for high-performance, asynchronous API endpoints.
- Multi-Channel Delivery: Supports sending notifications through various channels like Email, SMS, and Push Notifications.
- Asynchronous & Scalable: Leverages Celery and RabbitMQ for background task processing, ensuring the system can handle high-volume loads without blocking.
- Robust Data Storage: Uses PostgreSQL for reliable data persistence, managed with Alembic for smooth database migrations.
- Containerized Environment: Fully containerized with Docker and Docker Compose for consistent development, testing, and deployment.
- Comprehensive Testing: Includes a full suite of tests using pytest to ensure code quality and reliability.
Tech Stack: FastAPI Celery PostgreSQL RabbitMQ Redis Alembic SQLAlchemy Docker Pytest
Medium Complexity - HA Architecture
Enterprise-grade Todo application with AWS infrastructure
- Engineered full-stack application with FastAPI backend and React frontend
- Implemented Infrastructure as Code using Pulumi for AWS resource management
- Designed fault-tolerant architecture with load balancing across multiple AZs
- Built PostgreSQL replication system with automated backup/recovery
- Integrated Redis Sentinel for high availability caching
Tech Stack: FastAPI React AWS EC2 PostgreSQL Redis Sentinel Nginx Docker
August 2024 - Present | Portfolio Projects
Building production systems to demonstrate platform engineering capabilities while actively seeking full-time opportunities
- Architected and deployed 5 production-grade applications serving 5,000+ real users across e-commerce, fintech, and SaaS domains
- Managed 11+ EC2 instances with 99.9% uptime through multi-AZ AWS infrastructure with automated deployment
- Built ElastiKube: ML-enhanced Kubernetes autoscaler achieving 60% cost reduction with 4-layer intelligent scaling (time-aware, flash sale detection, Prophet forecasting)
- Engineered polyglot URL shortener with Go redirect service achieving sub-1ms latency and comprehensive observability (OpenTelemetry, Tempo, Loki, Grafana)
- Automated infrastructure deployment with Pulumi & Ansible, reducing deployment time by 93.75% (4 hours → 15 minutes)
Tech Stack: Python, Go, FastAPI, Django, AWS, Kubernetes, Docker, PostgreSQL, Redis, MongoDB, Pulumi, Ansible, OpenTelemetry, Grafana
June 2024 - August 2024 | Dhaka, Bangladesh
Delivered measurable business impact:
- Designed role-based admin dashboard for 200+ users with real-time meal analytics
- Automated 40% of manual effort in account management through intelligent workflows
- Built production-ready meal scheduling system using cron jobs with configurable time boundaries
Tech Stack: Python, Django, PostgreSQL, Docker, JavaScript, HTML/CSS
Bachelor of Science in Computer Science & Engineering
Daffodil International University | September 2017 - December 2022
- Building a Scalable URL Shortener: System Design to Production
- Complete architectural breakdown with Infrastructure as Code
- 100+ views, featured in system design discussions
- 200,000+ readers on Quora with tech insights in Bengali
- Nearly 200 followers engaging with technology content
- 40+ instructional videos on YouTube bridging Bengali tech education gap
- 500+ Problems Solved across multiple platforms
- Active on: BeeCrowd, LightOJ, HackerRank, LeetCode
- Contest Achievements:
- DIU Take-Off Programming Contest (Ranked 6th out of 300 participants)
- Multiple university-level programming contest participations
- Kubernetes - Container orchestration at scale
I'm actively seeking opportunities to work on:
- Distributed systems requiring high availability and fault tolerance
- Cloud-native applications with automated infrastructure
- Microservices architectures with proper observability
- Open-source projects where I can contribute infrastructure expertise
Looking for a backend engineer who can:
- Design scalable distributed systems
- Automate infrastructure from scratch
- Write clean, testable, maintainable code
- Document complex architectures clearly
Let's build something amazing together!
- Email: kaziiriad@gmail.com
- Phone: +880 1683152495
- LinkedIn: Sultan Mahmud
- Medium: @kazisultanmahmud
- YouTube: I.T. Darshonik

