Convergence

Network Observability Platform with Nautobot Integration

Convergence is a general-purpose observability platform that can be adapted to different monitoring use cases. Built on OpenTelemetry Collector, VictoriaMetrics, Grafana, Loki, and Alertmanager, it provides a foundation for collecting, storing, visualizing, and alerting on telemetry data from network devices and other sources. The platform features automatic device discovery from Nautobot, GeoIP enrichment for geographic threat visualization, and intelligent alerting via Discord.

✨ Features

Automatic Device Discovery: GraphQL-based integration with Nautobot for device inventory
Rich Metadata: Every metric tagged with device hostname, IP, vendor, model, role, and site
Multiple Telemetry Sources: SNMP and syslog (RFC 3164) today; NETCONF, gNMI, and other sources can be added as new receivers (planned for Phase 4)
GeoIP Enrichment: Source IP geolocation (lat/lon/country) for firewall events
Geo-Visualization: Grafana Geomap panels showing real-time attack origins on a world map
Pre-built Dashboards: 7 Grafana dashboards organized into Network and Security folders
Intelligent Alerting: Provisioned Grafana alert rules and Loki ruler alerts, delivered to Discord through Grafana Unified Alerting and Alertmanager
Loki Ruler: LogQL-based recording rules and spike detection for firewall events
Time-Series Storage: VictoriaMetrics with configurable retention (default: 90 days)
Log Aggregation: Loki + Promtail for structured log storage with label extraction
Self-signed SSL Support: Certificate verification toggle (NAUTOBOT_VERIFY_SSL) for Nautobot instances that use self-signed certificates
Extensible Architecture: Add new receivers, processors, and exporters as needed

Current Implementation: ✅ Operational - Monitoring 2 Cisco switches + pfSense firewall via SNMP and syslog, with GeoIP threat visualization and Discord alerting.

🚀 Quick Start

Prerequisites

Docker and Docker Compose
Nautobot instance with API access (optional, for automatic device discovery)
Network devices with SNMP/syslog enabled
Python 3.12+ (for device discovery script)

Installation

Clone and configure:

git clone https://github.com/byrn-baker/convergence.git
cd convergence

# Copy and edit environment variables
cp .env.example .env
# Edit .env with your Nautobot URL, API token, SNMP community, and Discord webhook

Start the stack:
```
docker compose up -d
```

Discover devices from Nautobot (optional):

# List devices
python3 scripts/nautobot_device_discovery.py --list-devices

# Generate OTEL Collector configuration
python3 scripts/nautobot_device_discovery.py --generate-config

Update OTEL configuration:
- Copy the generated receivers and processors to config/otel-collector/receivers/home-lab.yaml (the device-specific receivers file)
- Restart OTEL Collector: docker compose restart otel-collector
Access Grafana:
- URL: http://localhost:3000
- Default credentials: admin / admin (the password is set by GRAFANA_ADMIN_PASSWORD in .env)
- Dashboards are pre-loaded in the Network and Security folders
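The discovery script's internals are not reproduced in this README. As a hedged sketch of the kind of request it makes, the snippet below builds a GraphQL POST against Nautobot's `/api/graphql/` endpoint and flattens the response into the per-device metadata used as metric labels. The query field names (`role`, `location`, `primary_ip4`, `status`) assume a Nautobot 2.x schema, and `build_request` / `parse_devices` are hypothetical helpers, not functions from `nautobot_device_discovery.py`.

```python
import json
import urllib.request

# GraphQL query; field names assume a Nautobot 2.x schema (hypothetical, adjust to your version).
DEVICE_QUERY = """
query {
  devices {
    name
    status { name }
    primary_ip4 { address }
    role { name }
    location { name }
    device_type { model manufacturer { name } }
  }
}
"""


def build_request(base_url: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a GraphQL POST request for Nautobot."""
    body = json.dumps({"query": DEVICE_QUERY}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/graphql/",
        data=body,
        headers={
            "Authorization": f"Token {token}",
            "Content-Type": "application/json",
            "Accept": "application/json",
        },
        method="POST",
    )


def parse_devices(payload: dict) -> list[dict]:
    """Keep Active devices that have a primary IPv4 and flatten their metadata."""
    devices = []
    for dev in payload["data"]["devices"]:
        ip = dev.get("primary_ip4")
        status = (dev.get("status") or {}).get("name")
        if not ip or status != "Active":
            continue  # mirrors the "primary IPv4" and "Active" requirements for new devices
        dtype = dev["device_type"]
        devices.append({
            "device_name": dev["name"],
            "device_ip": ip["address"].split("/")[0],  # drop the prefix length
            "vendor": dtype["manufacturer"]["name"],
            "model": dtype["model"],
            "role": (dev.get("role") or {}).get("name"),
            "site": (dev.get("location") or {}).get("name"),
        })
    return devices
```

To actually send the request, pass it to `urllib.request.urlopen()`. For a self-signed Nautobot certificate (the `NAUTOBOT_VERIFY_SSL=false` case), one option is an `ssl.create_default_context()` with `check_hostname = False` and then `verify_mode = ssl.CERT_NONE`, passed as `urlopen(req, context=ctx)`; keep verification on anywhere outside a lab.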

Validation

# Check stack health
docker compose ps

# Verify metrics in VictoriaMetrics
curl http://localhost:8428/api/v1/label/device_name/values

# Check interface count
curl 'http://localhost:8428/api/v1/query?query=count(interface_in_octets_bytes_total)'

# Check Loki alerting rules
curl http://localhost:3100/loki/api/v1/rules

# Verify Alertmanager is healthy
curl http://localhost:9093/-/healthy

📊 Architecture

┌─────────────────────────────────────────────────────────────┐
│                    Nautobot (External)                       │
│              Source of Truth for Inventory                   │
└──────────────────────────┬───────────────────────────────────┘
                           │ GraphQL API
                           v
                  ┌─────────────────┐
                  │ Device Discovery│
                  │     Script      │
                  └────────┬─────────┘
                           │ Auto-generates config
                           v
┌─────────────────────────────────────────────────────────────┐
│                     Network Devices                          │
│         Cisco, Juniper, Arista (SNMP enabled)               │
│         pfSense (Syslog + SNMP enabled)                      │
└──────────────────────────┬───────────────────────────────────┘
                           │ SNMP polling (60s)
                           │ Syslog (port 514 UDP/TCP)
                           v
┌─────────────────────────────────────────────────────────────┐
│              OpenTelemetry Collector                          │
│    • SNMP receivers (per device)                             │
│    • Syslog receiver (RFC 3164)                              │
│    • Filterlog regex parser (pfSense firewall events)        │
│    • GeoIP processor (src/dst lat, lon, country)             │
│    • Attributes processors (device metadata)                 │
│    • Count/firewall connector (logs → metrics)               │
│    • Prometheus Remote Write exporter → VictoriaMetrics      │
│    • File exporter → /data/syslog/syslog.jsonl               │
└───────────────┬──────────────────────┬───────────────────────┘
                │                      │
                v                      v
┌──────────────────────┐   ┌──────────────────────────────────┐
│    VictoriaMetrics    │   │           Promtail               │
│  firewall_events_total│   │  Regex extracts Loki labels from │
│  system_uptime_seconds│   │  OTLP JSON: action, log_type,    │
│  interface_in/out_*   │   │  src_country, interface          │
│  (90d retention)      │   └──────────────┬───────────────────┘
└──────────┬────────────┘                  │
           │                              v
           │                  ┌──────────────────────┐
           │                  │         Loki          │
           │                  │  Log storage + Ruler  │
           │                  │  Recording rules:     │
           │                  │  blocks_by_country    │
           │                  │  Alert rules → AM     │
           │                  └──────────┬────────────┘
           │                             │
           │                             v
           │                  ┌──────────────────────┐
           │                  │     Alertmanager      │
           │                  │  Routes: critical/    │
           │                  │  security/network     │
           │                  │  → Discord webhook    │
           │                  └──────────────────────┘
           │
           v
┌─────────────────────────────────────────────────────────────┐
│                    Grafana (port 3000)                        │
│                                                               │
│  Network/                      Security/                     │
│  ├─ Interface Utilization       ├─ pfSense Firewall Security  │
│  ├─ Interface Errors            │   ├─ Geomap: WAN Threats   │
│  ├─ Network Overview            │   └─ Attack analysis        │
│  ├─ Platform Health             └─ Threat Analysis            │
│  └─ Network Device Health           ├─ Top countries          │
│      ├─ Uptime stats                ├─ Protocol distribution  │
│      ├─ Error rates                 └─ Attack timeseries      │
│      └─ Bandwidth per device                                  │
│                                                               │
│  Unified Alerting → Discord (5 provisioned rules)            │
└─────────────────────────────────────────────────────────────┘

📁 Project Structure

convergence/
├── config/
│   ├── otel-collector/
│   │   ├── config.yaml              # Main OTEL Collector configuration
│   │   └── receivers/
│   │       └── home-lab.yaml        # Device-specific SNMP receivers + processors
│   ├── victoriametrics/
│   │   └── prometheus.yml           # Scrape configuration
│   ├── loki/
│   │   ├── local-config.yaml        # Loki configuration (ruler enabled)
│   │   └── rules/
│   │       └── fake/
│   │           └── firewall_alerts.yaml  # LogQL recording + alerting rules
│   ├── promtail/
│   │   └── config.yaml              # Promtail log shipping + label extraction
│   ├── alertmanager/
│   │   └── alertmanager.yml         # Alert routing configuration
│   └── grafana/
│       └── provisioning/
│           ├── datasources/         # VictoriaMetrics + Loki data sources
│           ├── dashboards/          # Dashboard folder providers
│           └── alerting/
│               ├── alert_rules.yaml          # 5 provisioned alert rules
│               ├── contact_points.yaml       # Discord, Webhook, Email, Do Nothing
│               └── notification_policies.yaml # Routing tree → Discord
│
├── dashboards/
│   ├── network/
│   │   ├── interface-utilization.json
│   │   ├── interface-errors.json
│   │   ├── network-overview.json
│   │   ├── platform-health.json
│   │   └── device-health.json       # Uptime, error rates, bandwidth per device
│   ├── security/
│   │   ├── pfsense-firewall-security.json   # Geomap + firewall event analysis
│   │   └── threat-analysis.json             # Country breakdown, attack trends
│   ├── cisco/                       # Reserved for vendor-specific dashboards
│   ├── juniper/
│   └── arista/
│
├── scripts/
│   ├── nautobot_device_discovery.py # Device discovery and config generation
│   └── setup-geoip.sh               # GeoIP database installer
│
├── docs/
│   ├── PROJECT_STATUS.md            # Detailed project status and history
│   ├── PHASE3_ALERTING.md           # Phase 3: alerting, geo-viz, dashboard guide
│   ├── FIREWALL-SECURITY-DASHBOARD.md
│   ├── NAUTOBOT_ENRICHMENT.md
│   └── quickstart/
│
├── data/
│   ├── geoip/                       # MaxMind GeoLite2-City.mmdb
│   └── otelcol/                     # OTEL file exporter output (syslog.jsonl)
│
├── docker-compose.yml               # Docker services orchestration
├── .env.example                     # Environment variables template
├── validate_stack.sh                # Stack health validation
└── validate_nautobot.sh             # Nautobot integration validation

🔌 Service Access Points

Service	URL	Credentials
Grafana	http://localhost:3000	admin / admin
VictoriaMetrics API	http://localhost:8428	N/A
Loki API	http://localhost:3100	N/A
Alertmanager	http://localhost:9093	N/A
Promtail Metrics	http://localhost:9080	N/A
OTEL Collector Health	http://localhost:13133	N/A
OTEL Collector Metrics	http://localhost:8888	N/A
Redis	localhost:6379	N/A (future use)

📈 Available Dashboards

Network Folder

Interface Utilization — Traffic rates in bps, top interfaces, per-interface in/out graphs
Interface Errors — Error rates, top interfaces by errors, historical trends
Network Overview — Device count, total interfaces, platform-wide metrics
Platform Health — OTEL Collector, VictoriaMetrics, and service health metrics
Network Device Health (new)
- Per-device uptime stats with color thresholds (green ≥1d, orange ≥10m, red <10m)
- Uptime history timeseries; drops to near zero indicate reboots
- Interface error rates (table + timeseries, only shows interfaces with active errors)
- Total bandwidth per device (IN + OUT in bps)
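Two calculations drive the Network Device Health panels. The sketch below reproduces the documented uptime color thresholds and shows how a bits-per-second figure is derived from a byte counter. It assumes counters such as `interface_in_octets_bytes_total` are monotonically increasing byte counts, and the helper names are hypothetical, not taken from the dashboard JSON.

```python
DAY = 86_400  # seconds
TEN_MINUTES = 600


def uptime_color(uptime_seconds: float) -> str:
    """Map uptime to the panel colors: green >=1d, orange >=10m, red <10m."""
    if uptime_seconds >= DAY:
        return "green"
    if uptime_seconds >= TEN_MINUTES:
        return "orange"
    return "red"


def bits_per_second(octets_start: int, octets_end: int, interval_s: float) -> float:
    """Convert two samples of a byte counter into an average bit rate."""
    delta = octets_end - octets_start
    if delta < 0:
        # Counter reset (e.g. a reboot). This sketch just drops the interval;
        # PromQL/MetricsQL rate() compensates for resets instead.
        return 0.0
    return delta * 8 / interval_s  # bytes -> bits, per second
```

Per-device bandwidth is then the sum of the IN and OUT rates across that device's interfaces.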

Security Folder

pfSense Firewall Security
- Geomap: WAN threats — blocked IPs plotted by source lat/lon, sized by block count
- Geomap: Traffic destinations
- Top 100 blocked source IPs table
- Firewall actions over time (pass vs block)
- Protocol and interface distribution
Threat Analysis (new)
- Stats: total blocks (24h), attacking countries, current block rate (blocks/min)
- Top 10 attacking countries (horizontal bar chart)
- Protocol distribution (donut chart)
- Attack rate by country over time (top 7, 15m rolling rate)
- Blocks by interface (stacked timeseries)
- Full sortable country breakdown table
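The Threat Analysis stats reduce to simple aggregations over blocked events. A hedged sketch, assuming each blocked event carries a `src_country` label (as `firewall_events_total` does) and a Unix timestamp `ts`; the real panels compute these with queries over the stored metric rather than in Python, and the function names are hypothetical.

```python
from collections import Counter


def top_countries(events: list[dict], n: int = 10) -> list[tuple[str, int]]:
    """Top-N source countries by block count (ties keep first-seen order)."""
    return Counter(e["src_country"] for e in events).most_common(n)


def block_rate_per_minute(events: list[dict], now: float, window_s: float = 300) -> float:
    """Average blocks per minute over the trailing window ending at `now`."""
    recent = [e for e in events if now - window_s < e["ts"] <= now]
    return len(recent) / (window_s / 60)
```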

🔔 Alerting

Five provisioned alert rules evaluate every 1–2 minutes:

Security Alerts

Rule	Condition
High Block Rate From Country (1h)	>1000 blocks from one country in 1h
Firewall Block Rate Spike (5m)	>500 total blocks in 5m
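Both security conditions are sliding-window counts. The sketch below evaluates them over a list of blocked-event records; the thresholds come from the table above, while the `(timestamp, country)` event representation and the `security_alerts` name are assumptions for illustration (the deployed rules evaluate queries, not Python).

```python
from collections import Counter

HOUR = 3600
FIVE_MIN = 300


def security_alerts(events: list[tuple[float, str]], now: float) -> list[str]:
    """events: (unix_ts, src_country) per blocked packet. Returns firing alert names."""
    firing = []
    # >1000 blocks from one country within the last hour
    last_hour = Counter(c for ts, c in events if now - HOUR < ts <= now)
    for country, count in last_hour.items():
        if count > 1000:
            firing.append(f"High Block Rate From Country (1h): {country}")
    # >500 blocks in total within the last five minutes
    last_5m = sum(1 for ts, _ in events if now - FIVE_MIN < ts <= now)
    if last_5m > 500:
        firing.append("Firewall Block Rate Spike (5m)")
    return firing
```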

Network Health Alerts

Rule	Condition
Network Switch Rebooted	Uptime counter drops (negative delta)
Network Switch Low Uptime	Any switch uptime <10 minutes
Network Device SNMP Unreachable	No SNMP data for >5 minutes
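The network-health conditions can be expressed over `system_uptime_seconds` samples. A minimal sketch, assuming samples arrive per switch as time-ordered `(unix_ts, uptime_seconds)` pairs at the 60 s SNMP interval; `network_alerts` is a hypothetical function, not the provisioned rule definition.

```python
TEN_MINUTES = 600
STALE_AFTER = 300  # "No SNMP data for >5 minutes"


def network_alerts(samples: list[tuple[float, float]], now: float) -> list[str]:
    """samples: time-ordered (unix_ts, uptime_seconds) for one switch."""
    if not samples or now - samples[-1][0] > STALE_AFTER:
        return ["Network Device SNMP Unreachable"]
    firing = []
    uptimes = [u for _, u in samples]
    # A reboot resets the uptime counter, so any decrease between samples means a reboot.
    if any(later < earlier for earlier, later in zip(uptimes, uptimes[1:])):
        firing.append("Network Switch Rebooted")
    if uptimes[-1] < TEN_MINUTES:
        firing.append("Network Switch Low Uptime")
    return firing
```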

All alerts route to Discord by default. Set DISCORD_WEBHOOK_URL in .env and run:

docker compose up -d --force-recreate grafana

Test the Discord contact point:

curl -s -u admin:admin \
  -X POST http://localhost:3000/api/v1/provisioning/contact-points/convergence-discord/test \
  -H "Content-Type: application/json" -d '{}'
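If the Grafana test call succeeds but nothing arrives in Discord, it can help to check the webhook URL on its own. A minimal sketch using Discord's standard webhook payload (a JSON object with a `content` field); it only builds the request, and `discord_test_request` is a hypothetical helper.

```python
import json
import urllib.request


def discord_test_request(webhook_url: str, text: str) -> urllib.request.Request:
    """Build a POST that sends a plain-text message to a Discord webhook."""
    body = json.dumps({"content": text}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={
            "Content-Type": "application/json",
            # Explicit User-Agent; the default urllib one is sometimes rejected.
            "User-Agent": "convergence-webhook-test",
        },
        method="POST",
    )
```

Sending it with `urllib.request.urlopen(discord_test_request(os.environ["DISCORD_WEBHOOK_URL"], "Convergence webhook test"))` posts a real message; Discord normally answers `204 No Content` on success.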

See docs/PHASE3_ALERTING.md for full alerting documentation.

📖 Documentation

For detailed information, see the docs folder:

Project Status: Current capabilities, recent improvements, lessons learned, and roadmap
Phase 3 Alerting Guide: Alerting pipeline, geo-visualization, dashboard organization, and troubleshooting
Nautobot Integration: Setup guide for Nautobot API integration
Firewall Dashboard Example: pfSense integration guide
Quick Start Guide: Step-by-step deployment instructions

🛠️ Configuration

Environment Variables

Key variables in .env:

# Nautobot Configuration (optional)
NAUTOBOT_URL=https://your-nautobot-instance
NAUTOBOT_TOKEN=your-api-token-here
NAUTOBOT_VERIFY_SSL=false  # For self-signed certificates

# SNMP Configuration
SNMP_COMMUNITY=public

# MaxMind GeoIP (required for geographic threat visualization)
# Run scripts/setup-geoip.sh to download the database
MAXMIND_ACCOUNT_ID=your_account_id
MAXMIND_LICENSE_KEY=your_license_key

# VictoriaMetrics
VM_RETENTION_PERIOD=90d

# Grafana
GRAFANA_ADMIN_PASSWORD=admin

# Alerting — Discord webhook for alert notifications
# Server Settings → Integrations → Webhooks → New Webhook → Copy URL
DISCORD_WEBHOOK_URL=https://discord.com/api/webhooks/YOUR_ID/YOUR_TOKEN

# Generic webhook (Slack, n8n, custom endpoint)
# ALERT_WEBHOOK_URL=https://hooks.slack.com/services/...

See .env.example for all available options.
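Before starting the stack, a quick pre-flight check of `.env` can catch values still left at their template placeholders. A minimal sketch: it parses simple `KEY=VALUE` lines (including inline comments preceded by a space, as in `NAUTOBOT_VERIFY_SSL` above) and flags placeholder values. The parsing is deliberately simpler than Docker Compose's own `.env` rules, and the placeholder markers are assumptions based on the example values shown.

```python
def parse_env(text: str) -> dict[str, str]:
    """Parse KEY=VALUE lines, skipping blanks and full-line comments."""
    env = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, value = line.split("=", 1)
        value = value.split(" #", 1)[0].strip()  # inline comment needs a preceding space
        env[key.strip()] = value.strip('"').strip("'")
    return env


PLACEHOLDERS = ("your-", "your_", "YOUR_ID")  # markers taken from the example values


def unset_placeholders(env: dict[str, str]) -> list[str]:
    """Return keys whose values still look like template placeholders."""
    return sorted(k for k, v in env.items() if any(p in v for p in PLACEHOLDERS))
```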

Important: Applying Environment Variable Changes

docker compose restart does not apply env var changes. Use --force-recreate:

docker compose up -d --force-recreate grafana

🔄 Workflow: Adding New Devices

Add device to Nautobot:
- Create device with primary IPv4 address
- Set device type, manufacturer, role, and location
- Ensure status is "Active"

Generate configuration:

python3 scripts/nautobot_device_discovery.py --generate-config

Update OTEL Collector:
- Add generated receivers and processors to config/otel-collector/receivers/home-lab.yaml
- No pipeline edit is needed: the pipelines in config.yaml already pick up receivers added to home-lab.yaml
Restart collector:
```
docker compose restart otel-collector
```
Verify in Grafana:
- Check Network Overview and Network Device Health dashboards
- Confirm device appears with correct metadata
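For orientation, the generated snippet for a new device pairs an SNMP receiver with a processor that stamps Nautobot metadata onto its metrics. The sketch below renders such a pair as YAML text. It assumes the contrib `snmp` receiver's `endpoint`, `version`, `community`, and `collection_interval` keys and the `attributes` processor's `actions` list; it is not the exact output of `nautobot_device_discovery.py`, and the OID metric definitions the SNMP receiver also needs are omitted.

```python
def render_device_snippet(dev: dict, interval: str = "60s") -> str:
    """Render an SNMP receiver + attributes processor pair for one device (YAML text)."""
    key = dev["device_name"].lower().replace(" ", "-")  # component name suffix
    labels = ("device_name", "device_ip", "vendor", "model", "role", "site")
    lines = [
        "receivers:",
        f"  snmp/{key}:",
        f"    endpoint: udp://{dev['device_ip']}:161",
        "    version: v2c",
        "    community: ${env:SNMP_COMMUNITY}",  # expanded by the collector at startup
        f"    collection_interval: {interval}",
        "    # metrics: ... (OID definitions omitted from this sketch)",
        "processors:",
        f"  attributes/{key}:",
        "    actions:",
    ]
    for name in labels:
        lines += [
            f"      - key: {name}",
            f'        value: "{dev[name]}"',
            "        action: upsert",  # insert or overwrite the label on every data point
        ]
    return "\n".join(lines) + "\n"
```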

🧪 Testing & Validation

# Full stack validation
./validate_stack.sh

# Nautobot connectivity test
./validate_nautobot.sh

# Check discovered devices
python3 scripts/nautobot_device_discovery.py --list-devices

# Query VictoriaMetrics
curl 'http://localhost:8428/api/v1/label/device_name/values'
curl 'http://localhost:8428/api/v1/query?query=system_uptime_seconds'

# Check Loki ruler rules
curl http://localhost:3100/loki/api/v1/rules

# List Grafana alert rules
curl -s -u admin:admin http://localhost:3000/api/v1/provisioning/alert-rules | \
  python3 -c "import sys,json; [print(r['uid'],'-',r['title']) for r in json.load(sys.stdin)]"

# Verify Discord contact point loaded
curl -s -u admin:admin http://localhost:3000/api/v1/provisioning/contact-points | \
  python3 -c "import sys,json; [print(f['name'],'-',f['type']) for f in json.load(sys.stdin)]"

🎯 Current Status

✅ Working Features

Automatic device discovery from Nautobot (GraphQL)
SNMP monitoring: 2 Cisco switches + pfSense firewall (uptime, interfaces, bandwidth, errors)
Full device metadata enrichment (name, IP, vendor, model, role, site)
Real interface names (e.g., "GigabitEthernet1/0/1")
pfSense syslog ingestion with filterlog parsing and GeoIP enrichment
firewall_events_total metric with geo labels (src_lat, src_lon, src_country)
7 Grafana dashboards in organized Network/Security folders
Loki ruler: LogQL recording rules and spike detection alerting
5 provisioned Grafana alert rules (security + network health)
Discord alerting via Alertmanager and Grafana Unified Alerting
90-day metrics retention in VictoriaMetrics

🎯 Next Steps (Phase 4)

Automated pfSense response: add block rules via API when under attack
Dynamic baselines: MetricsQL outlier_iqr_over_time() to replace fixed thresholds
AI integration: LLM-powered natural language security summaries
Multi-site: extend Alertmanager routing for multiple pfSense instances
Additional protocols: NETCONF, gNMI
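For context on the dynamic-baseline item: MetricsQL's `outlier_iqr_over_time()` flags the latest sample in a lookbehind window when it falls below q25 − 1.5·IQR or above q75 + 1.5·IQR of that window's raw samples. The Python sketch below applies the same rule to a list of block counts; its quantile method (`statistics.quantiles` with `method="inclusive"`) is an assumption and may not match VictoriaMetrics' interpolation exactly.

```python
import statistics


def last_is_iqr_outlier(window: list[float]) -> bool:
    """True if the newest sample lies outside the Tukey fences of the window."""
    q25, _, q75 = statistics.quantiles(window, n=4, method="inclusive")
    iqr = q75 - q25
    last = window[-1]
    return last < q25 - 1.5 * iqr or last > q75 + 1.5 * iqr
```

Unlike a fixed ">500 blocks in 5m" threshold, the fences move with the recent traffic level, which is the motivation for replacing fixed thresholds.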

See docs/PROJECT_STATUS.md for detailed roadmap.

💡 Example Use Cases

The platform can be adapted for various observability scenarios:

Network Device Monitoring (Current Primary Use)
- SNMP polling of switches, routers, firewalls
- Interface utilization and error tracking
- Device health and uptime monitoring with reboot detection
Firewall/Security Monitoring (Implemented)
- Syslog ingestion from pfSense
- Log parsing, GeoIP enrichment, log-to-metrics conversion
- Geographic threat visualization with real-time Geomap panels
- Country-based attack analysis and spike alerting to Discord
Application Monitoring (Potential)
- OTLP metrics from applications
- Log aggregation from services
- Custom metric collection
Infrastructure Monitoring (Potential)
- System metrics from servers
- Container metrics from Docker/Kubernetes
- Cloud resource monitoring

🤝 Contributing

Contributions welcome! If you encounter issues or have improvements:

Check docs/PROJECT_STATUS.md for known limitations
Document your environment and steps to reproduce
Include relevant logs and error messages
Submit detailed bug reports or pull requests

📝 License

MIT License - See LICENSE file for details.

🙏 Acknowledgments

Built with OpenTelemetry Collector
Powered by VictoriaMetrics
Visualized with Grafana
Integrated with Nautobot
Log aggregation with Grafana Loki
Alert routing with Prometheus Alertmanager

Need Help? Check the docs folder or docs/PHASE3_ALERTING.md for detailed troubleshooting guides.
