AI-powered production alert analysis system using n8n, MCP servers, and OpenAI GPT-4o-mini.
This project provides an intelligent alert analysis workflow that:
- 🔍 Monitors Slack channels for production alerts
- 🤖 Analyzes alerts using AI with context from GitHub code and Kibana logs
- 📊 Provides root cause analysis and actionable recommendations
- 🔗 Integrates multiple data sources via MCP (Model Context Protocol) servers
┌─────────────┐ ┌──────────────┐ ┌─────────────────┐
│ Slack │─────▶│ n8n │─────▶│ OpenAI │
│ Alerts │ │ Workflow │ │ GPT-4o-mini │
└─────────────┘ └──────┬───────┘ └─────────────────┘
│
┌───────────┴───────────┐
│ │
┌──────▼──────┐ ┌─────▼──────┐
│ GitHub MCP │ │ Kibana MCP │
│ (Port 3000)│ │ (Port 3001)│
└─────────────┘ └────────────┘
Location: n8n-workflow/
- Production Alert Analyzer - Enhanced workflow with AI-powered analysis
- Slack Integration - Monitors alerts and posts responses
- AI Agent - Routes by severity (CRITICAL vs HIGH/MEDIUM)
- Code Context - Optionally includes relevant code snippets
Features:
- ✅ Alert parsing and structured data extraction
- ✅ Severity-based routing (different analysis depth)
- ✅ Immediate acknowledgment responses
- ✅ Threaded Slack replies
- ✅ Metrics logging
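The parsing and routing steps above can be sketched roughly as follows (the field names and regex patterns are illustrative, not the actual n8n node code):

```python
import re

def parse_alert(text: str) -> dict:
    """Extract structured fields from a raw Slack alert message."""
    severity_match = re.search(r"\b(CRITICAL|HIGH|MEDIUM|LOW)\b", text)
    service_match = re.search(r"[Ss]ervice:\s*(\S+)", text)
    return {
        "severity": severity_match.group(1) if severity_match else "UNKNOWN",
        "service": service_match.group(1) if service_match else None,
        "raw": text,
    }

def route(alert: dict) -> str:
    """CRITICAL alerts take the full-analysis branch; others get a quick pass."""
    return "full_analysis" if alert["severity"] == "CRITICAL" else "quick_analysis"

alert = parse_alert("🚨 CRITICAL: DB timeout\nService: order-service")
print(alert["severity"], route(alert))  # → CRITICAL full_analysis
```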
📖 Documentation: See n8n-workflow/WORKFLOW_GUIDE.md
Location: dockers/
Multi-MCP server setup with HTTP bridges for n8n integration:
- Code search across repositories
- File contents retrieval
- Commit history and PR management
- 40+ GitHub tools available
- Log search and analysis
- Visualization management
- Saved objects access
- Real-time error tracking
📖 Documentation:
- `dockers/MULTI_MCP_SETUP.md` - Complete setup guide
- `dockers/QUICK_START.md` - Quick start guide
Location: sample-app/
Spring Boot Order Service that simulates realistic production failures:
- Database timeouts and connection pool exhaustion
- Payment gateway failures
- Inventory service unavailability
- High error rates and memory issues
Location: sample-logs/
15 realistic log entries for testing:
- Elasticsearch/Kibana import ready
- Matches alert scenarios
- Includes stack traces with code references
- Structured JSON format
📖 Documentation: sample-logs/README.md
Location: sample-alerts/
10 pre-formatted Slack alerts covering:
- Database timeouts (CRITICAL)
- Payment gateway failures (HIGH)
- Inventory issues (MEDIUM)
- High error rates (CRITICAL)
- Memory and performance issues
- Docker and Docker Compose
- GitHub Personal Access Token
- Kibana server access (URL, username, password)
- n8n instance (cloud or self-hosted)
- OpenAI API key
cd dockers
# Copy and configure environment
cp .env.multi-mcp.example .env
nano .env # Add your credentials
# Update repository path in multi-mcp-docker-compose.yml
# Edit the volumes section to point to your repo
# Start services
docker compose -f multi-mcp-docker-compose.yml up -d
# Verify
curl http://localhost:3000/health # GitHub MCP
curl http://localhost:3001/health # Kibana MCP

cd sample-logs
# Generate bulk import file
node convert-to-bulk.js
# Import to Elasticsearch (replace with your details)
curl -X POST "https://YOUR_KIBANA_URL:9200/_bulk" \
-H "Content-Type: application/x-ndjson" \
-H "Authorization: ApiKey YOUR_API_KEY" \
--data-binary @kibana-bulk-import.ndjson
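For reference, the `_bulk` endpoint expects NDJSON: an action line followed by a source line per document, with a trailing newline. A minimal Python sketch of building that body (index name and log fields here are examples):

```python
import json

def to_bulk_ndjson(entries, index="logs-order-service-2025.01.01"):
    """Build an Elasticsearch _bulk request body from a list of docs."""
    lines = []
    for doc in entries:
        lines.append(json.dumps({"index": {"_index": index}}))  # action line
        lines.append(json.dumps(doc))                           # source line
    return "\n".join(lines) + "\n"  # _bulk requires a trailing newline

body = to_bulk_ndjson([{"level": "ERROR", "message": "Connection pool exhausted"}])
print(body)
```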
# Create index pattern in Kibana UI: logs-order-service-*
- Open n8n UI
- Go to Workflows → Import from File
- Select: `n8n-workflow/production-alert-analyzer.json`
- Configure credentials:
  - Slack API: Add your Slack workspace credentials
  - OpenAI API: Add your OpenAI API key
- Configure MCP clients:
  - GitHub MCP: `http://localhost:3000/message` (or `http://host.docker.internal:3000/message` if n8n runs in Docker)
  - Kibana MCP: `http://localhost:3001/message` (or `http://host.docker.internal:3001/message` if n8n runs in Docker)
- Update channel IDs for your Slack workspace
- Activate workflow
# Option 1: Start sample application
cd sample-app
mvn clean install
mvn spring-boot:run
# Generate traffic
cd ../sample-alerts
./test-workflow.sh
# Option 2: Post sample alert to Slack
# Copy any alert from sample-alerts/sample-slack-alerts.md
# Paste into your #mcp-testing channel
- Open Slack and navigate to your configured input channel (e.g., `#mcp-testing`)
- Copy an alert from `sample-alerts/sample-slack-alerts.md`
- Paste the alert into the channel
- Wait 1-2 seconds for the acknowledgment
- Check the output channel (e.g., `#n8n-output`) for the AI analysis (5-10 seconds)
Use the Python script to post Datadog alert configurations to Slack:
cd sample-alerts
# List all available alerts
python post_alerts_to_slack.py --list
# Post specific alerts by index
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --alerts 1,3,5
# Post a range of alerts
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --alerts 1-3
# Interactive mode - select which alerts to post
python post_alerts_to_slack.py --token YOUR_TOKEN --channel #alerts --interactive
# Using environment variables
export SLACK_TOKEN=xoxb-your-token
export SLACK_CHANNEL=#alerts
python post_alerts_to_slack.py --alerts 1,9

Get Slack OAuth Token:
- Go to https://api.slack.com/apps
- Create App → OAuth & Permissions
- Add scopes: `chat:write`, `chat:write.public`
- Install to workspace → Copy Bot User OAuth Token
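Under the hood, posting goes through Slack's `chat.postMessage` API. A minimal sketch of building such a request (token and channel are placeholders; the script's actual implementation may differ):

```python
import json
import urllib.request

def build_slack_request(token: str, channel: str, text: str) -> urllib.request.Request:
    """Build (but do not send) a Slack chat.postMessage request."""
    return urllib.request.Request(
        "https://slack.com/api/chat.postMessage",
        data=json.dumps({"channel": channel, "text": text}).encode(),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json; charset=utf-8",
        },
        method="POST",
    )

req = build_slack_request("xoxb-your-token", "#alerts", "🚨 CRITICAL: DB timeout")
print(req.full_url, req.get_method())  # → https://slack.com/api/chat.postMessage POST
```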
The AI will provide:
- Root Cause Analysis: Identifies the primary issue
- Immediate Actions: Actionable next steps
- Investigation Checklist: What to check
- Long-term Prevention: Recommendations to prevent recurrence
- Code References: Specific files and line numbers (if code context enabled)
- Log Analysis: Related error patterns from Kibana
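An illustrative system prompt that would elicit those sections (the actual prompt lives in the workflow's OpenAI node and may differ):

```python
# Hypothetical prompt; adapt to your services before use.
SYSTEM_PROMPT = """You are a production incident analyst.
Given a Slack alert plus code and log context, respond with:
1. Root Cause Analysis
2. Immediate Actions
3. Investigation Checklist
4. Long-term Prevention
Reference specific files/lines and Kibana error patterns where available."""
print(SYSTEM_PROMPT.count("\n") + 1)  # → 7 lines
```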
Edit in n8n workflow:
- Input Channel: Where alerts are posted (default: `#mcp-testing`)
- Output Channel: Where analysis is sent (default: `#n8n-output`)
Adjust in OpenAI node:
- `0.1-0.3`: More deterministic, consistent responses
- `0.4-0.7`: More creative, varied responses
Edit "Route by Severity" node to change which alerts get detailed analysis:
- CRITICAL: Full root cause analysis
- HIGH/MEDIUM: Quick analysis
hackathon-2025/
├── dockers/ # MCP server setup
│ ├── multi-mcp-docker-compose.yml
│ ├── mcp-bridge/ # HTTP bridge for MCP servers
│ ├── kibana-bridge/ # Kibana-specific bridge
│ ├── MULTI_MCP_SETUP.md
│ └── QUICK_START.md
├── n8n-workflow/ # n8n workflows
│ ├── production-alert-analyzer.json
│ ├── WORKFLOW_GUIDE.md
│ └── CODE_CONTEXT_GUIDE.md
├── sample-app/ # Spring Boot test application
│ └── src/main/java/...
├── sample-logs/ # Sample Kibana logs
│ ├── kibana-sample-logs.json
│ ├── convert-to-bulk.js
│ └── README.md
├── sample-alerts/ # Sample Slack alerts
│ ├── sample-slack-alerts.md
│ └── test-workflow.sh
└── TESTING_GUIDE.md # Complete testing guide
- TESTING_GUIDE.md - Complete testing guide with scenarios
- n8n-workflow/WORKFLOW_GUIDE.md - n8n workflow details
- n8n-workflow/CODE_CONTEXT_GUIDE.md - Adding code context
- dockers/MULTI_MCP_SETUP.md - Multi-MCP server setup
- sample-logs/README.md - Log import and testing
- sample-logs/KIBANA_IMPORT_GUIDE.md - Kibana import details
# Check logs
docker compose -f multi-mcp-docker-compose.yml logs
# Verify environment variables
cat .env
# Restart services
docker compose -f multi-mcp-docker-compose.yml restart

- Verify the workflow is Active (toggle in top-right)
- Check Slack credentials are valid
- Verify channel IDs match your workspace
- Test with simple message: "test CRITICAL alert"
- Add more context to alerts (service, metrics, stack traces)
- Enable code context in workflow
- Lower AI temperature (0.1-0.3)
- Enhance system prompt with examples
# Test bridges
curl http://localhost:3000/health
curl http://localhost:3001/health
# If n8n in Docker, use:
# http://host.docker.internal:3000/message
# http://host.docker.internal:3001/message

- Average Response Time: 5-10 seconds
- Token Usage: 500-1500 tokens per alert
- Cost: ~$0.001-0.003 per alert (GPT-4o-mini)
- Throughput: 100+ alerts/hour
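A back-of-envelope check of the per-alert cost, assuming GPT-4o-mini pricing of roughly $0.15 per 1M input tokens and $0.60 per 1M output tokens (verify against current published rates):

```python
# Assumed pricing in USD per 1M tokens; not authoritative.
INPUT_PER_M, OUTPUT_PER_M = 0.15, 0.60

def cost_per_alert(prompt_tokens: int, completion_tokens: int) -> float:
    """USD cost of a single alert analysis at the assumed rates."""
    return (prompt_tokens * INPUT_PER_M + completion_tokens * OUTPUT_PER_M) / 1_000_000

# A context-heavy CRITICAL alert: ~1500 prompt + ~1500 completion tokens.
print(f"${cost_per_alert(1500, 1500):.4f}")  # → $0.0011
```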
- Never commit `.env` files to version control
- Use read-only repository mounts when possible
- Restrict network access to MCP bridge ports
- Use API keys with minimal required permissions
- Consider adding authentication to HTTP bridges in production
- ✅ Test with all sample alerts
- ✅ Customize prompts for your services
- ✅ Add incident ticket creation (Jira/ServiceNow)
- ✅ Integrate with monitoring systems (Datadog/Prometheus)
- ✅ Build runbook database for AI reference
- ✅ Add historical incident matching
- ✅ Create metrics dashboard
- ✅ Set up on-call escalation
This is a hackathon project. Feel free to:
- Add more MCP servers
- Enhance AI prompts
- Add new alert types
- Improve error handling
- Add more test scenarios
MIT License - See LICENSE file for details
For issues or questions:
- Check relevant documentation in subdirectories
- Review Docker logs: `docker compose logs`
- Test MCP bridges: `curl http://localhost:3000/health`
- Verify n8n execution logs in the UI