Network Observability Platform with Nautobot Integration
Convergence is a general-purpose observability platform that can be adapted to different monitoring use cases. Built on OpenTelemetry Collector, VictoriaMetrics, Grafana, Loki, and Alertmanager, it provides a foundation for collecting, storing, visualizing, and alerting on telemetry data from network devices and other sources. The platform features automatic device discovery from Nautobot, GeoIP enrichment for geographic threat visualization, and intelligent alerting via Discord.
- Automatic Device Discovery: GraphQL-based integration with Nautobot for device inventory
- Rich Metadata: Every metric tagged with device hostname, IP, vendor, model, role, and site
- Multiple Telemetry Sources: SNMP and syslog (RFC 3164) today, with NETCONF and gNMI planned
- GeoIP Enrichment: Source IP geolocation (lat/lon/country) for firewall events
- Geo-Visualization: Grafana Geomap panels showing real-time attack origins on a world map
- Pre-built Dashboards: 7 Grafana dashboards organized into Network and Security folders
- Intelligent Alerting: Provisioned alert rules with Discord notifications via Alertmanager
- Loki Ruler: LogQL-based recording rules and spike detection for firewall events
- Time-Series Storage: VictoriaMetrics with configurable retention (default: 90 days)
- Log Aggregation: Loki + Promtail for structured log storage with label extraction
- Self-signed SSL Support: Development-friendly with certificate verification toggle
- Extensible Architecture: Add new receivers, processors, and exporters as needed
Current Implementation: ✅ Operational - Monitoring 2 Cisco switches + pfSense firewall via SNMP and syslog, with GeoIP threat visualization and Discord alerting.
- Docker and Docker Compose
- Nautobot instance with API access (optional, for automatic device discovery)
- Network devices with SNMP/syslog enabled
- Python 3.12+ (for device discovery script)
1. Clone and configure:
   git clone https://github.com/byrn-baker/convergence.git
   cd convergence
   # Copy and edit environment variables
   cp .env.example .env
   # Edit .env with your Nautobot URL, API token, SNMP community, and Discord webhook
2. Start the stack:
   docker compose up -d
3. Discover devices from Nautobot (optional):
   # List devices
   python3 scripts/nautobot_device_discovery.py --list-devices
   # Generate OTEL Collector configuration
   python3 scripts/nautobot_device_discovery.py --generate-config
4. Update OTEL configuration:
   - Copy the generated receivers and processors to config/otel-collector/config.yaml
   - Restart OTEL Collector:
     docker compose restart otel-collector
5. Access Grafana:
   - URL: http://localhost:3000
   - Default credentials: admin / admin
   - Dashboards are pre-loaded in the Network and Security folders
# Check stack health
docker compose ps
# Verify metrics in VictoriaMetrics
curl http://localhost:8428/api/v1/label/device_name/values
# Check interface count
curl 'http://localhost:8428/api/v1/query?query=count(interface_in_octets_bytes_total)'
# Check Loki alerting rules
curl http://localhost:3100/loki/api/v1/rules
# Verify Alertmanager is healthy
curl http://localhost:9093/-/healthy
┌─────────────────────────────────────────────────────────────┐
│ Nautobot (External) │
│ Source of Truth for Inventory │
└──────────────────────────┬───────────────────────────────────┘
│ GraphQL API
v
┌─────────────────┐
│ Device Discovery│
│ Script │
└────────┬─────────┘
│ Auto-generates config
v
┌─────────────────────────────────────────────────────────────┐
│ Network Devices │
│ Cisco, Juniper, Arista (SNMP enabled) │
│ pfSense (Syslog + SNMP enabled) │
└──────────────────────────┬───────────────────────────────────┘
│ SNMP polling (60s)
│ Syslog (port 514 UDP/TCP)
v
┌─────────────────────────────────────────────────────────────┐
│ OpenTelemetry Collector │
│ • SNMP receivers (per device) │
│ • Syslog receiver (RFC 3164) │
│ • Filterlog regex parser (pfSense firewall events) │
│ • GeoIP processor (src/dst lat, lon, country) │
│ • Attributes processors (device metadata) │
│ • Count/firewall connector (logs → metrics) │
│ • Prometheus Remote Write exporter → VictoriaMetrics │
│ • File exporter → /data/syslog/syslog.jsonl │
└───────────────┬──────────────────────┬───────────────────────┘
│ │
v v
┌──────────────────────┐ ┌──────────────────────────────────┐
│ VictoriaMetrics │ │ Promtail │
│ firewall_events_total│ │ Regex extracts Loki labels from │
│ system_uptime_seconds│ │ OTLP JSON: action, log_type, │
│ interface_in/out_* │ │ src_country, interface │
│ (90d retention) │ └──────────────┬───────────────────┘
└──────────┬────────────┘ │
│ v
│ ┌──────────────────────┐
│ │ Loki │
│ │ Log storage + Ruler │
│ │ Recording rules: │
│ │ blocks_by_country │
│ │ Alert rules → AM │
│ └──────────┬────────────┘
│ │
│ v
│ ┌──────────────────────┐
│ │ Alertmanager │
│ │ Routes: critical/ │
│ │ security/network │
│ │ → Discord webhook │
│ └──────────────────────┘
│
v
┌─────────────────────────────────────────────────────────────┐
│ Grafana (port 3000) │
│ │
│ Network/ Security/ │
│ ├─ Interface Utilization ├─ pfSense Firewall Security │
│ ├─ Interface Errors │ ├─ Geomap: WAN Threats │
│ ├─ Network Overview │ └─ Attack analysis │
│ ├─ Platform Health └─ Threat Analysis │
│ └─ Network Device Health ├─ Top countries │
│ ├─ Uptime stats ├─ Protocol distribution │
│ ├─ Error rates └─ Attack timeseries │
│ └─ Bandwidth per device │
│ │
│ Unified Alerting → Discord (5 provisioned rules) │
└─────────────────────────────────────────────────────────────┘
convergence/
├── config/
│ ├── otel-collector/
│ │ ├── config.yaml # Main OTEL Collector configuration
│ │ └── receivers/
│ │ └── home-lab.yaml # Device-specific SNMP receivers + processors
│ ├── victoriametrics/
│ │ └── prometheus.yml # Scrape configuration
│ ├── loki/
│ │ ├── local-config.yaml # Loki configuration (ruler enabled)
│ │ └── rules/
│ │ └── fake/
│ │ └── firewall_alerts.yaml # LogQL recording + alerting rules
│ ├── promtail/
│ │ └── config.yaml # Promtail log shipping + label extraction
│ ├── alertmanager/
│ │ └── alertmanager.yml # Alert routing configuration
│ └── grafana/
│ └── provisioning/
│ ├── datasources/ # VictoriaMetrics + Loki data sources
│ ├── dashboards/ # Dashboard folder providers
│ └── alerting/
│ ├── alert_rules.yaml # 5 provisioned alert rules
│ ├── contact_points.yaml # Discord, Webhook, Email, Do Nothing
│ └── notification_policies.yaml # Routing tree → Discord
│
├── dashboards/
│ ├── network/
│ │ ├── interface-utilization.json
│ │ ├── interface-errors.json
│ │ ├── network-overview.json
│ │ ├── platform-health.json
│ │ └── device-health.json # Uptime, error rates, bandwidth per device
│ ├── security/
│ │ ├── pfsense-firewall-security.json # Geomap + firewall event analysis
│ │ └── threat-analysis.json # Country breakdown, attack trends
│ ├── cisco/ # Reserved for vendor-specific dashboards
│ ├── juniper/
│ └── arista/
│
├── scripts/
│ ├── nautobot_device_discovery.py # Device discovery and config generation
│ └── setup-geoip.sh # GeoIP database installer
│
├── docs/
│ ├── PROJECT_STATUS.md # Detailed project status and history
│ ├── PHASE3_ALERTING.md # Phase 3: alerting, geo-viz, dashboard guide
│ ├── FIREWALL-SECURITY-DASHBOARD.md
│ ├── NAUTOBOT_ENRICHMENT.md
│ └── quickstart/
│
├── data/
│ ├── geoip/ # MaxMind GeoLite2-City.mmdb
│ └── otelcol/ # OTEL file exporter output (syslog.jsonl)
│
├── docker-compose.yml # Docker services orchestration
├── .env.example # Environment variables template
├── validate_stack.sh # Stack health validation
└── validate_nautobot.sh # Nautobot integration validation
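The syslog.jsonl file written by the OTEL file exporter holds one OTLP JSON document per line, and Promtail turns selected attributes into Loki labels. The sketch below shows that idea in Python, assuming the standard OTLP JSON log layout (`resourceLogs` / `scopeLogs` / `logRecords`). It parses the JSON directly instead of using Promtail's regex stage, and `extract_labels` is a hypothetical helper, not part of this repository.

```python
import json

# Label names taken from the architecture diagram.
LABEL_KEYS = ("action", "log_type", "src_country", "interface")


def extract_labels(line: str) -> list[dict]:
    """Return one label dict per log record found in an OTLP JSON line."""
    doc = json.loads(line)
    results = []
    for resource_logs in doc.get("resourceLogs", []):
        for scope_logs in resource_logs.get("scopeLogs", []):
            for record in scope_logs.get("logRecords", []):
                # Only string-valued attributes are considered as labels.
                attrs = {a["key"]: a["value"].get("stringValue")
                         for a in record.get("attributes", [])}
                results.append({k: attrs[k] for k in LABEL_KEYS
                                if attrs.get(k) is not None})
    return results


# Made-up record for illustration.
sample_line = json.dumps({"resourceLogs": [{"scopeLogs": [{"logRecords": [{
    "body": {"stringValue": "filterlog event"},
    "attributes": [
        {"key": "action", "value": {"stringValue": "block"}},
        {"key": "src_country", "value": {"stringValue": "NL"}},
        {"key": "src_port", "value": {"intValue": "54321"}},
    ]}]}]}]})
print(extract_labels(sample_line))
```

Keeping the label set this small matters for Loki, since every distinct label combination creates a separate stream.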
| Service | URL | Credentials |
|---|---|---|
| Grafana | http://localhost:3000 | admin / admin |
| VictoriaMetrics API | http://localhost:8428 | N/A |
| Loki API | http://localhost:3100 | N/A |
| Alertmanager | http://localhost:9093 | N/A |
| Promtail Metrics | http://localhost:9080 | N/A |
| OTEL Collector Health | http://localhost:13133 | N/A |
| OTEL Collector Metrics | http://localhost:8888 | N/A |
| Redis | localhost:6379 | N/A (future use) |
- Interface Utilization — Traffic rates in bps, top interfaces, per-interface in/out graphs
- Interface Errors — Error rates, top interfaces by errors, historical trends
- Network Overview — Device count, total interfaces, platform-wide metrics
- Platform Health — OTEL Collector, VictoriaMetrics, and service health metrics
- Network Device Health (new)
  - Per-device uptime stats with colour thresholds (green ≥1d, orange ≥10m, red <10m)
  - Uptime history timeseries: drops to near-zero indicate reboots
  - Interface error rates (table + timeseries, only shows interfaces with active errors)
  - Total bandwidth per device (IN + OUT in bps)
- pfSense Firewall Security
  - Geomap: WAN threats, with blocked IPs plotted by source lat/lon and sized by block count
  - Geomap: Traffic destinations
  - Top 100 blocked source IPs table
  - Firewall actions over time (pass vs block)
  - Protocol and interface distribution
- Threat Analysis (new)
  - Stats: total blocks (24h), attacking countries, current block rate (blocks/min)
  - Top 10 attacking countries (horizontal bar chart)
  - Protocol distribution (donut chart)
  - Attack rate by country over time (top 7, 15m rolling rate)
  - Blocks by interface (stacked timeseries)
  - Full sortable country breakdown table
Five provisioned alert rules evaluate every 1–2 minutes:
| Rule | Condition |
|---|---|
| High Block Rate From Country (1h) | >1000 blocks from one country in 1h |
| Firewall Block Rate Spike (5m) | >500 total blocks in 5m |
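The two firewall rules above are threshold checks over a time window. A minimal Python sketch of that logic follows. It is only an illustration of the conditions, not how the provisioned rules are evaluated, and `BlockWindow` is a hypothetical class that assumes events arrive in time order.

```python
from collections import Counter, deque


class BlockWindow:
    """Keep block events from the last `window_s` seconds and check a threshold."""

    def __init__(self, window_s: int, threshold: int) -> None:
        self.window_s = window_s
        self.threshold = threshold
        self.events: deque[tuple[float, str]] = deque()  # (timestamp, country)

    def add(self, ts: float, country: str) -> None:
        self.events.append((ts, country))

    def _trim(self, now: float) -> None:
        # Drop events that have fallen out of the window.
        while self.events and self.events[0][0] <= now - self.window_s:
            self.events.popleft()

    def total_exceeded(self, now: float) -> bool:
        """True when all blocks in the window exceed the threshold."""
        self._trim(now)
        return len(self.events) > self.threshold

    def countries_exceeded(self, now: float) -> list[str]:
        """Countries whose own block count in the window exceeds the threshold."""
        self._trim(now)
        per_country = Counter(country for _, country in self.events)
        return sorted(c for c, n in per_country.items() if n > self.threshold)


# ">500 total blocks in 5m": 501 blocks spread over about 250 seconds.
spike = BlockWindow(window_s=300, threshold=500)
for i in range(501):
    spike.add(ts=i * 0.5, country="NL")
print(spike.total_exceeded(now=260.0))
```

The per-country rule would use the same class with `window_s=3600` and `threshold=1000`, calling `countries_exceeded` instead.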
| Rule | Condition |
|---|---|
| Network Switch Rebooted | Uptime counter drops (negative delta) |
| Network Switch Low Uptime | Any switch uptime <10 minutes |
| Network Device SNMP Unreachable | No SNMP data for >5 minutes |
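The three network conditions in the table can be expressed as a small function over recent uptime samples. The sketch below is an assumption-laden Python illustration rather than the provisioned rule definitions: `evaluate_device` and the returned rule keys are hypothetical, and samples are taken to be `(timestamp, uptime_seconds)` pairs in time order.

```python
LOW_UPTIME_S = 10 * 60   # "uptime <10 minutes"
STALE_AFTER_S = 5 * 60   # "no SNMP data for >5 minutes"


def evaluate_device(samples: list[tuple[float, float]], now: float) -> set[str]:
    """Return the rule keys that would fire for one device."""
    # No data at all, or the newest sample is too old: only unreachability applies.
    if not samples or now - samples[-1][0] > STALE_AFTER_S:
        return {"snmp_unreachable"}
    fired = set()
    last_uptime = samples[-1][1]
    # A negative delta between consecutive uptime samples means the counter reset.
    if len(samples) >= 2 and last_uptime - samples[-2][1] < 0:
        fired.add("rebooted")
    if last_uptime < LOW_UPTIME_S:
        fired.add("low_uptime")
    return fired


# A switch that was up for a day and then reports 30 seconds of uptime.
print(evaluate_device([(0.0, 86400.0), (60.0, 30.0)], now=70.0))
```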
All alerts route to Discord by default. Set DISCORD_WEBHOOK_URL in .env and run:
docker compose up -d --force-recreate grafana
Test the Discord contact point:
curl -s -u admin:admin \
-X POST http://localhost:3000/api/v1/provisioning/contact-points/convergence-discord/test \
-H "Content-Type: application/json" -d '{}'
See docs/PHASE3_ALERTING.md for full alerting documentation.
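If the contact point test fails, it can help to check the webhook URL itself, independent of Grafana and Alertmanager. The sketch below posts a plain message straight to the URL in DISCORD_WEBHOOK_URL using only the Python standard library. It assumes the usual Discord webhook body (a JSON object with a `content` field); `build_payload` and `send_test_message` are hypothetical helpers and not part of this repository.

```python
import json
import os
import urllib.request

DISCORD_CONTENT_LIMIT = 2000  # Discord's maximum message content length


def build_payload(title: str, detail: str) -> bytes:
    """Build a minimal webhook body: a JSON object with a `content` field."""
    content = f"**{title}**\n{detail}"[:DISCORD_CONTENT_LIMIT]
    return json.dumps({"content": content}).encode("utf-8")


def send_test_message(webhook_url: str) -> int:
    """POST a test message and return the HTTP status code."""
    request = urllib.request.Request(
        webhook_url,
        data=build_payload("Convergence test", "Webhook reachable from this host."),
        headers={
            "Content-Type": "application/json",
            # Explicit User-Agent; the default urllib one may be rejected.
            "User-Agent": "convergence-webhook-check",
        },
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status


if __name__ == "__main__":
    url = os.environ.get("DISCORD_WEBHOOK_URL")
    if url:
        print(send_test_message(url))
    else:
        print("DISCORD_WEBHOOK_URL is not set")
```

A 2xx status here with no message from Grafana points at the contact point or routing configuration rather than the webhook.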
For detailed information, see the docs folder:
- Project Status: Current capabilities, recent improvements, lessons learned, and roadmap
- Phase 3 Alerting Guide: Alerting pipeline, geo-visualization, dashboard organization, and troubleshooting
- Nautobot Integration: Setup guide for Nautobot API integration
- Firewall Dashboard Example: pfSense integration guide
- Quick Start Guide: Step-by-step deployment instructions
Key variables in .env:
# Nautobot Configuration (optional)
NAUTOBOT_URL=https://your-nautobot-instance
NAUTOBOT_TOKEN=your-api-token-here
NAUTOBOT_VERIFY_SSL=false # For self-signed certificates
# SNMP Configuration
SNMP_COMMUNITY=public
# MaxMind GeoIP (required for geographic threat visualization)
# Run scripts/setup-geoip.sh to download the database
MAXMIND_ACCOUNT_ID=your_account_id
MAXMIND_LICENSE_KEY=your_license_key
# VictoriaMetrics
VM_RETENTION_PERIOD=90d
# Grafana
GRAFANA_ADMIN_PASSWORD=admin
# Alerting — Discord webhook for alert notifications
# Server Settings → Integrations → Webhooks → New Webhook → Copy URL
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN
# Generic webhook (Slack, n8n, custom endpoint)
# ALERT_WEBHOOK_URL=https://hooks.slack.com/services/...
See .env.example for all available options.
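Before starting the stack, a quick check of .env can catch missing values or placeholders left over from .env.example. The following is a minimal sketch under stated assumptions: which variables count as required is a guess made for illustration (`REQUIRED`), and `parse_env` and `check_env` are hypothetical helpers.

```python
# Assumed for illustration; adjust to the variables your deployment needs.
REQUIRED = ("SNMP_COMMUNITY", "GRAFANA_ADMIN_PASSWORD", "DISCORD_WEBHOOK_URL")
PLACEHOLDER_MARKERS = ("your-", "your_", "YOUR_")


def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks, comments, and trailing ' # ...' notes."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Strip an inline comment such as "false # For self-signed certificates".
        values[key.strip()] = value.split(" #", 1)[0].strip()
    return values


def check_env(values: dict[str, str]) -> list[str]:
    """Return human-readable problems; an empty list means nothing was flagged."""
    problems = [f"{k} is missing" for k in REQUIRED if not values.get(k)]
    problems += [f"{k} still holds a placeholder" for k, v in values.items()
                 if any(marker in v for marker in PLACEHOLDER_MARKERS)]
    return problems


sample = """\
NAUTOBOT_VERIFY_SSL=false # For self-signed certificates
SNMP_COMMUNITY=public
GRAFANA_ADMIN_PASSWORD=admin
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN
"""
print(check_env(parse_env(sample)))
```

To check a real file, pass `open(".env").read()` to `parse_env` instead of the sample string.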
docker compose restart does not pick up changes to environment variables. Recreate the container with --force-recreate instead:
docker compose up -d --force-recreate grafana
1. Add device to Nautobot:
   - Create device with primary IPv4 address
   - Set device type, manufacturer, role, and location
   - Ensure status is "Active"
2. Generate configuration:
   python3 scripts/nautobot_device_discovery.py --generate-config
3. Update OTEL Collector:
   - Add generated receivers and processors to config/otel-collector/receivers/home-lab.yaml
   - The pipeline in config.yaml already includes new receivers automatically
4. Restart collector:
   docker compose restart otel-collector
5. Verify in Grafana:
   - Check Network Overview and Network Device Health dashboards
   - Confirm device appears with correct metadata
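To make the discovery step less of a black box, here is a rough Python sketch of what a GraphQL-based discovery could look like: query Nautobot for active devices, then map each one to a named SNMP receiver entry. It is not the code in scripts/nautobot_device_discovery.py. The query shape and field names are assumptions that can differ between Nautobot versions, `fetch_devices` and `to_receivers` are hypothetical names, and the `metrics` section a real SNMP receiver needs is left out.

```python
import json
import urllib.request

# Assumed query shape; field names can differ between Nautobot versions.
DEVICE_QUERY = """
query {
  devices(status: "Active") {
    name
    primary_ip4 { address }
    device_type { model manufacturer { name } }
    role { name }
    location { name }
  }
}
"""


def fetch_devices(base_url: str, token: str) -> list[dict]:
    """POST the query to Nautobot's GraphQL endpoint and return the device list."""
    request = urllib.request.Request(
        f"{base_url.rstrip('/')}/api/graphql/",
        data=json.dumps({"query": DEVICE_QUERY}).encode("utf-8"),
        headers={"Authorization": f"Token {token}",
                 "Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request, timeout=15) as response:
        return json.load(response)["data"]["devices"]


def to_receivers(devices: list[dict], community: str, interval: str = "60s") -> dict:
    """Map each device with a primary IPv4 address to one named SNMP receiver."""
    receivers = {}
    for dev in devices:
        address = (dev.get("primary_ip4") or {}).get("address")
        if not address:
            continue  # no primary IPv4, nothing to poll
        host = address.split("/", 1)[0]  # drop the prefix length, e.g. "/24"
        receivers[f"snmp/{dev['name']}"] = {
            "endpoint": f"udp://{host}:161",
            "version": "v2c",
            "community": community,
            "collection_interval": interval,
        }
    return receivers


# Offline example with made-up devices; no Nautobot instance needed.
example = [
    {"name": "sw1", "primary_ip4": {"address": "10.0.0.2/24"}},
    {"name": "sw2", "primary_ip4": None},
]
print(json.dumps(to_receivers(example, community="public"), indent=2))
```

With NAUTOBOT_VERIFY_SSL=false and a self-signed certificate, `fetch_devices` would also need an `ssl` context that skips verification passed to `urlopen`; that is omitted here.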
# Full stack validation
./validate_stack.sh
# Nautobot connectivity test
./validate_nautobot.sh
# Check discovered devices
python3 scripts/nautobot_device_discovery.py --list-devices
# Query VictoriaMetrics
curl 'http://localhost:8428/api/v1/label/device_name/values'
curl 'http://localhost:8428/api/v1/query?query=system_uptime_seconds'
# Check Loki ruler rules
curl http://localhost:3100/loki/api/v1/rules
# List Grafana alert rules
curl -s -u admin:admin http://localhost:3000/api/v1/provisioning/alert-rules | \
python3 -c "import sys,json; [print(r['uid'],'-',r['title']) for r in json.load(sys.stdin)]"
# Verify Discord contact point loaded
curl -s -u admin:admin http://localhost:3000/api/v1/provisioning/contact-points | \
python3 -c "import sys,json; [print(f['name'],'-',f['type']) for f in json.load(sys.stdin)]"
- Automatic device discovery from Nautobot (GraphQL)
- SNMP monitoring: 2 Cisco switches + pfSense firewall (uptime, interfaces, bandwidth, errors)
- Full device metadata enrichment (name, IP, vendor, model, role, site)
- Real interface names (e.g., "GigabitEthernet1/0/1")
- pfSense syslog ingestion with filterlog parsing and GeoIP enrichment
- firewall_events_total metric with geo labels (src_lat, src_lon, src_country)
- 7 Grafana dashboards in organized Network/Security folders
- Loki ruler: LogQL recording rules and spike detection alerting
- 5 provisioned Grafana alert rules (security + network health)
- Discord alerting via Alertmanager and Grafana Unified Alerting
- 90-day metrics retention in VictoriaMetrics
- Automated pfSense response: add block rules via API when under attack
- Dynamic baselines: MetricsQL outlier_iqr_over_time() to replace fixed thresholds
- AI integration: LLM-powered natural language security summaries
- Multi-site: extend Alertmanager routing for multiple pfSense instances
- Additional protocols: NETCONF, gNMI
See docs/PROJECT_STATUS.md for detailed roadmap.
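For the dynamic baselines item, the idea behind an IQR-based outlier check can be sketched in a few lines of Python. This only illustrates the rule (flag the newest sample when it falls outside q25 - 1.5*iqr or q75 + 1.5*iqr over the window). It is not MetricsQL, the quantile interpolation may differ from what VictoriaMetrics uses, and `last_sample_is_outlier` is a hypothetical name.

```python
from statistics import quantiles


def last_sample_is_outlier(window: list[float]) -> bool:
    """True if the newest sample lies outside [q25 - 1.5*iqr, q75 + 1.5*iqr]."""
    if len(window) < 4:
        return False  # too few samples for a meaningful quartile estimate
    q25, _, q75 = quantiles(window, n=4, method="inclusive")
    iqr = q75 - q25
    last = window[-1]
    return last < q25 - 1.5 * iqr or last > q75 + 1.5 * iqr


# Blocks per minute over a short window; the final value is a burst.
print(last_sample_is_outlier([10, 11, 9, 10, 12, 10, 11, 100]))
```

Unlike the fixed 500 and 1000 block thresholds, a check like this adapts to whatever the recent traffic level happens to be.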
The platform can be adapted for various observability scenarios:
1. Network Device Monitoring (Current Primary Use)
   - SNMP polling of switches, routers, firewalls
   - Interface utilization and error tracking
   - Device health and uptime monitoring with reboot detection
2. Firewall/Security Monitoring (Implemented)
   - Syslog ingestion from pfSense
   - Log parsing, GeoIP enrichment, log-to-metrics conversion
   - Geographic threat visualization with real-time Geomap panels
   - Country-based attack analysis and spike alerting to Discord
3. Application Monitoring (Potential)
   - OTLP metrics from applications
   - Log aggregation from services
   - Custom metric collection
4. Infrastructure Monitoring (Potential)
   - System metrics from servers
   - Container metrics from Docker/Kubernetes
   - Cloud resource monitoring
Contributions welcome! If you encounter issues or have improvements:
- Check docs/PROJECT_STATUS.md for known limitations
- Document your environment and steps to reproduce
- Include relevant logs and error messages
- Submit detailed bug reports or pull requests
MIT License - See LICENSE file for details.
- Built with OpenTelemetry Collector
- Powered by VictoriaMetrics
- Visualized with Grafana
- Integrated with Nautobot
- Log aggregation with Grafana Loki
- Alert routing with Prometheus Alertmanager
Need Help? Check the documentation or PHASE3_ALERTING.md for detailed troubleshooting guides.