A comprehensive monitoring solution for federated learning with an API-first approach and modern web UI.
- Real-time Federation Tracking: Monitor active federations, rounds, and collaborators
- Performance Metrics: Track training efficiency, convergence rates, and resource utilization
- Event Streaming: Real-time events via WebSocket for live monitoring
- Resource Monitoring: CPU, memory, disk, and network usage tracking
- Alert System: Configurable alerts for performance and error conditions
- RESTful API: Complete REST API for all monitoring data
- Real-time Updates: WebSocket support for live data streaming
- Flexible Queries: Advanced filtering and pagination
- Multiple Backends: Memory, SQLite, PostgreSQL storage options
- Responsive Dashboard: Beautiful, responsive React-based interface
- Real-time Charts: Live updating charts and metrics
- Federation Overview: Detailed views of federations and collaborators
- Event Timeline: Comprehensive event logging and visualization
- Mobile Friendly: Works on desktop, tablet, and mobile devices
# Navigate to the FL-Go directory
cd go/fl-go
# Start the monitoring server (with sample data)
go run cmd/monitor/main.goThe monitoring server will start on:
- API Server: http://localhost:8080
- Web UI: http://localhost:3000 (when built)
# Navigate to the web directory
cd web
# Install dependencies
npm install
# Start the development server
npm run devThe web UI will be available at http://localhost:3000
Add monitoring configuration to your FL plan:
monitoring:
enabled: true
monitoring_server_url: "http://localhost:8080"
collect_resource_metrics: true
report_interval: 30
enable_realtime_events: truecurl http://localhost:8080/api/v1/healthcurl http://localhost:8080/api/v1/federations?active=truecurl http://localhost:8080/api/v1/federations/{federation_id}curl http://localhost:8080/api/v1/federations/{federation_id}/overviewcurl http://localhost:8080/api/v1/collaborators?federation_id={federation_id}curl http://localhost:8080/api/v1/rounds?federation_id={federation_id}curl "http://localhost:8080/api/v1/events?federation_id={federation_id}&page=1&per_page=50"const ws = new WebSocket('ws://localhost:8080/api/v1/ws?federation_id={federation_id}');
ws.onmessage = (event) => {
const monitoringEvent = JSON.parse(event.data);
console.log('Real-time event:', monitoringEvent);
};Create monitoring_config.yaml:
enabled: true
api_port: 8080
webui_port: 3000
metrics_retention: "24h"
collection_interval: "30s"
enable_resource_metrics: true
enable_realtime_events: true
storage_backend: "memory" # memory, sqlite, postgres
# Resource thresholds for alerts
resource_thresholds:
cpu_warning: 80.0
cpu_critical: 95.0
memory_warning: 85.0
memory_critical: 95.0
disk_warning: 90.0
disk_critical: 98.0
# Security settings
security:
enable_cors: true
allowed_origins: ["*"] # Restrict in production
api_key_required: falseAdd to your FL plan YAML:
monitoring:
enabled: true
monitoring_server_url: "http://localhost:8080"
collect_resource_metrics: true
report_interval: 30
enable_realtime_events: true-
Monitoring Service Interface (
pkg/monitoring/service.go)- Defines the contract for monitoring operations
- Supports multiple storage backends
-
Storage Layer (
pkg/monitoring/storage.go)- In-memory storage for development/testing
- Extensible for database backends
-
REST API Server (
pkg/monitoring/api.go)- RESTful endpoints for all monitoring data
- WebSocket support for real-time updates
-
Integration Hooks (
pkg/monitoring/hooks.go)- Easy integration with existing FL components
- Event-driven metric collection
-
Web Frontend (
web/)- Modern React-based dashboard
- Real-time updates via WebSocket
- Responsive design
- FederationMetrics: Overall federation status and progress
- CollaboratorMetrics: Individual collaborator performance
- RoundMetrics: Training round details and results
- ModelUpdateMetrics: Model update timing and quality
- ResourceMetrics: System resource usage
- MonitoringEvents: Real-time event stream
import "github.com/ishaileshpant/fl-go/pkg/monitoring"
// Create monitoring hooks
monitoringHooks := monitoring.NewMonitoringHooks(monitoringService, true)
// Record federation start
err := monitoringHooks.OnFederationStart(ctx, plan, aggregatorAddress)
// Record round start
roundID, err := monitoringHooks.OnRoundStart(ctx, federationID, roundNumber, algorithm, participantCount)
// Record round completion
err := monitoringHooks.OnRoundEnd(ctx, roundID, federationID, roundNumber, duration, updatesReceived, accuracy, loss)// Record collaborator join
err := monitoringHooks.OnCollaboratorJoin(ctx, collaboratorID, federationID, address)
// Record training start/end
err := monitoringHooks.OnTrainingStart(ctx, collaboratorID, roundNumber)
err := monitoringHooks.OnTrainingEnd(ctx, collaboratorID, roundNumber, duration, accuracy, loss)
// Record model update
err := monitoringHooks.OnModelUpdateReceived(ctx, federationID, collaboratorID, roundNumber, updateSize, processingTime, staleness, weight)cd web
npm install
npm run buildgo test ./pkg/monitoring/...- Define new metric types in
pkg/monitoring/types.go - Add storage methods in
pkg/monitoring/storage.go - Add API endpoints in
pkg/monitoring/api.go - Update the web UI to display new metrics
For production, use PostgreSQL:
storage_backend: "postgres"
database_url: "postgres://user:password@localhost/fl_monitoring"- Enable API key authentication
- Restrict CORS origins
- Use HTTPS in production
- Implement rate limiting
- Use PostgreSQL for persistent storage
- Consider Redis for caching
- Load balance the API server
- Use CDN for web assets
- Connection Refused: Ensure monitoring server is running on correct port
- CORS Errors: Check allowed origins in configuration
- Memory Usage: Adjust retention settings for large deployments
- WebSocket Disconnects: Implement reconnection logic in clients
The monitoring system itself exposes metrics:
- API response times
- Memory usage
- Event processing rates
- Storage backend health
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Update documentation
- Submit a pull request
This monitoring system is part of the FL-Go project and follows the same license terms.