# Utilization Monitor

This guide explains how to use the {ruby Async::Service::Supervisor::UtilizationMonitor} to collect and aggregate application-level utilization metrics from your worker processes.

## Overview

The {ruby Async::Service::Supervisor::ProcessMonitor} captures OS-level metrics (CPU, memory), and the {ruby Async::Service::Supervisor::MemoryMonitor} takes action when memory limits are exceeded. The `UtilizationMonitor` focuses on **application-level metrics**: connections, requests, queue depths, and other business-specific utilization data. Without it, you can't easily answer questions like "How many active connections do my workers have?" or "What is the total request throughput across all workers?"

The `UtilizationMonitor` solves this with shared memory. Workers write their metrics to a shared memory segment. The supervisor periodically reads and aggregates those metrics by service name, so collection involves no IPC overhead.

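The sketch below gives a rough mental model of this write/read split. It uses a plain in-memory `IO::Buffer` (Ruby 3.1+) in place of the shared memory file, and it runs the "workers" and the "supervisor" in one process. It assumes a hypothetical layout in which each worker owns a 512-byte segment holding a single `u64` counter at offset 0. The real layout is defined by `async-utilization` and will differ.

```ruby
# Single-process model of the shared-memory write/read split (Ruby 3.1+).
# Assumption: each worker owns one fixed-size segment with a single u64
# counter at its start. This is NOT the real monitor's layout.
SEGMENT_SIZE = 512
WORKER_COUNT = 4

# Stand-in for the shared memory file: one segment per worker.
buffer = IO::Buffer.new(SEGMENT_SIZE * WORKER_COUNT)
buffer.clear # zero every byte so all counters start at 0

# "Worker" side: each worker only ever touches its own segment.
# record_requests is a hypothetical helper for this sketch.
def record_requests(buffer, worker_index, count)
  offset = worker_index * SEGMENT_SIZE
  current = buffer.get_value(:u64, offset)
  buffer.set_value(:u64, offset, current + count)
end

record_requests(buffer, 0, 10)
record_requests(buffer, 1, 5)
record_requests(buffer, 3, 7)

# "Supervisor" side: read every segment directly and sum, with no messages exchanged.
total = WORKER_COUNT.times.sum { |index| buffer.get_value(:u64, index * SEGMENT_SIZE) }
puts total # 22
```
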
Use the `UtilizationMonitor` when you need:

- **Application observability**: Track connections, requests, queue depths, or custom metrics across workers.
- **Service-level aggregation**: See totals per service (e.g., "echo" service: 42 connections, 1000 messages).
- **Lightweight collection**: Avoid IPC and network calls, because metrics are read directly from shared memory.
- **Integration with logging**: Emit aggregated metrics to your logging pipeline for dashboards and alerts.

The monitor uses the `async-utilization` gem for schema definition and shared memory layout. To participate, workers must include {ruby Async::Service::Supervisor::Supervised} and define a `utilization_schema`.

## Usage

### Supervisor Configuration

Add a utilization monitor to your supervisor service:

```ruby
service "supervisor" do
  include Async::Service::Supervisor::Environment

  monitors do
    [
      Async::Service::Supervisor::UtilizationMonitor.new(
        path: File.expand_path("utilization.shm", root),
        interval: 10 # Aggregate and emit metrics every 10 seconds
      )
    ]
  end
end
```

### Worker Configuration

Workers must include {ruby Async::Service::Supervisor::Supervised} and define a `utilization_schema` that describes the metrics they expose:

```ruby
service "echo" do
  include Async::Service::Managed::Environment
  include Async::Service::Supervisor::Supervised

  service_class EchoService

  utilization_schema do
    {
      connections_total: :u64,
      connections_active: :u32,
      messages_total: :u64
    }
  end
end
```

### Emitting Metrics from Workers

In the service's `run` method, workers obtain a utilization registry from the evaluator, look up each metric by name, and update it:

```ruby
def run(instance, evaluator)
  evaluator.prepare!(instance)
  instance.ready!

  # Look up each metric declared in the service's utilization_schema.
  registry = evaluator.utilization_registry
  connections_total = registry.metric(:connections_total)
  connections_active = registry.metric(:connections_active)
  messages_total = registry.metric(:messages_total)

  @bound_endpoint.accept do |peer|
    # Count every accepted connection.
    connections_total.increment

    # Keep connections_active raised while this connection is being served.
    connections_active.track do
      peer.each_line do |line|
        messages_total.increment
        peer.write(line)
      end
    end
  end
end
```

The supervisor aggregates these metrics by service name and emits them at the configured interval. For example:

```json
{
  "echo": {
    "connections_total": 150,
    "connections_active": 12,
    "messages_total": 45000
  }
}
```

## Configuration Options

### `path`

Path to the shared memory file used for worker metrics. Default: `"utilization.shm"`, relative to the current working directory.

When using {ruby Async::Service::Supervisor::Environment}, set the path explicitly. That way the supervisor and the workers resolve the same file regardless of their working directory:

```ruby
monitors do
  [
    Async::Service::Supervisor::UtilizationMonitor.new(
      path: File.expand_path("utilization.shm", root),
      interval: 10
    )
  ]
end
```

For a custom location under your application root:

```ruby
Async::Service::Supervisor::UtilizationMonitor.new(
  path: File.expand_path("tmp/utilization.shm", root)
)
```

Ensure the path is writable by both the supervisor and the workers.

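Plain `File.expand_path` shows the difference between a relative path and an anchored one. In this sketch, `/srv/app` is a hypothetical application root standing in for `root`. The writability check is a suggested pre-flight step, not something the monitor is documented to perform.

```ruby
# Why an explicit, anchored path matters (plain Ruby, no gems).
# Assumption: "/srv/app" is a hypothetical application root (Unix-style paths).
root = "/srv/app"

# A relative path resolves against the current working directory, so
# processes started from different directories would open different files.
relative = File.expand_path("utilization.shm")

# Anchoring to the application root gives every process the same file.
anchored = File.expand_path("utilization.shm", root)
puts anchored # /srv/app/utilization.shm

# Optional pre-flight check: can this process create or write the file?
directory = File.dirname(anchored)
puts "#{directory} writable? #{File.writable?(directory)}"
```
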
### `interval`

The interval (in seconds) at which to aggregate and emit utilization metrics. Default: `10` seconds.

```ruby
# Emit every second for high-frequency monitoring
Async::Service::Supervisor::UtilizationMonitor.new(interval: 1)

# Emit every 5 minutes for low-overhead monitoring
Async::Service::Supervisor::UtilizationMonitor.new(interval: 300)
```

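You can estimate how `interval` affects output volume with simple arithmetic. Assume the supervisor emits one aggregated record per interval; that is an assumption for this estimate, not documented behavior. The number of records per day is then 86,400 divided by the interval:

```ruby
# Back-of-the-envelope estimate of aggregated records per day.
# Assumption: exactly one aggregated record is emitted per interval.
SECONDS_PER_DAY = 86_400

def emissions_per_day(interval)
  SECONDS_PER_DAY / interval
end

[1, 10, 300].each do |interval|
  puts "interval: #{interval}s -> #{emissions_per_day(interval)} records/day"
end
# interval: 1s -> 86400 records/day
# interval: 10s -> 8640 records/day
# interval: 300s -> 288 records/day
```
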
### `size`

Total size of the shared memory buffer, in bytes. Default: `IO::Buffer::PAGE_SIZE * 8`. The buffer grows automatically when more workers register than there are segments available.

```ruby
Async::Service::Supervisor::UtilizationMonitor.new(
  size: IO::Buffer::PAGE_SIZE * 32 # Larger initial buffer for many workers
)
```

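For a rough sizing guide, divide the buffer size by `segment_size` to estimate how many workers fit before the buffer has to grow. This sketch assumes the whole buffer is split into equal segments with no header space, which may not match the monitor's actual bookkeeping. Also note that `IO::Buffer::PAGE_SIZE` varies by platform: it is commonly 4096 bytes and larger on some systems, such as Apple silicon.

```ruby
# Rough capacity estimate for the shared memory buffer (Ruby 3.1+).
# Assumption: the buffer is divided into equal per-worker segments with no
# header or bookkeeping space; the real monitor may reserve some.
def segment_capacity(size:, segment_size:)
  size / segment_size
end

default_size = IO::Buffer::PAGE_SIZE * 8
puts "page size: #{IO::Buffer::PAGE_SIZE} bytes"
puts "default capacity: #{segment_capacity(size: default_size, segment_size: 512)} workers"
# With 4096-byte pages: 64 workers. With 16384-byte pages: 256 workers.
```
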
### `segment_size`

Size of each worker's allocation segment, in bytes. Default: `512`. Each segment must be large enough to hold every field in your schema. The `async-utilization` gem lays out fields according to their type (e.g., `u64` = 8 bytes, `u32` = 4 bytes).

```ruby
Async::Service::Supervisor::UtilizationMonitor.new(
  segment_size: 256 # Smaller segments if the schema is compact
)
```

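To check whether a schema fits, you can estimate its size by hand. This sketch assumes fields are stored in declaration order with natural alignment, meaning each field starts at a multiple of its own size. That layout rule is an assumption; `async-utilization` may pack fields differently, so leave generous headroom.

```ruby
# Estimate the bytes a utilization schema needs, to sanity-check `segment_size`.
# Assumption: declaration order with natural alignment (each field starts at
# a multiple of its own size). The real async-utilization layout may differ.
FIELD_SIZES = { u32: 4, s32: 4, f32: 4, u64: 8, s64: 8, f64: 8 }.freeze

def schema_layout(schema)
  offsets = {}
  cursor = 0
  schema.each do |name, type|
    size = FIELD_SIZES.fetch(type) { raise ArgumentError, "unknown type: #{type.inspect}" }
    cursor = (cursor + size - 1) / size * size # round up to the field's alignment
    offsets[name] = cursor
    cursor += size
  end
  [offsets, cursor]
end

schema = {
  connections_active: :u32,
  connections_total: :u64,
  requests_active: :u32,
  requests_total: :u64
}

offsets, bytes = schema_layout(schema)
offsets.each { |name, offset| puts "#{name}: offset #{offset}" }
puts "#{bytes} bytes needed; fits in a 256-byte segment: #{bytes <= 256}"
# connections_active: offset 0
# connections_total: offset 8
# requests_active: offset 16
# requests_total: offset 24
# 32 bytes needed; fits in a 256-byte segment: true
```
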
## Schema Types

The `utilization_schema` maps metric names to types supported by {ruby IO::Buffer}:

| Type | Size | Use case |
|------|------|----------|
| `:u32` | 4 bytes | Non-negative gauges that rise and fall (e.g., `connections_active`) |
| `:u64` | 8 bytes | Monotonically increasing counters (e.g., `requests_total`) |
| `:s32` | 4 bytes | Signed 32-bit values |
| `:s64` | 8 bytes | Signed 64-bit values |
| `:f32` | 4 bytes | Single-precision floats |
| `:f64` | 8 bytes | Double-precision floats |

Prefer `:u64` for totals that only increase. Use `:u32` for gauges or other non-negative values that may decrease.

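The practical difference between `:u32` and `:u64` is range: a `u32` tops out at 4,294,967,295. This sketch (Ruby 3.1+, standard library only) writes a value just past that limit as a `:u64`, then reads the same bytes back as a `:u32`. Lowercase `IO::Buffer` types are little-endian, so the `:u32` read sees only the low 32 bits.

```ruby
# Why long-running totals should be :u64 (Ruby 3.1+, IO::Buffer only).
buffer = IO::Buffer.new(8)
buffer.clear

big_total = 2**32 + 5 # just past the largest u32 value (4_294_967_295)
buffer.set_value(:u64, 0, big_total)

as_u64 = buffer.get_value(:u64, 0)
as_u32 = buffer.get_value(:u32, 0) # only the low 32 bits survive

puts as_u64 # 4294967301
puts as_u32 # 5
```
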
## Default Schema

The {ruby Async::Service::Supervisor::Supervised} mixin provides a default schema if you don't override `utilization_schema`:

```ruby
{
  connections_active: :u32,
  connections_total: :u64,
  requests_active: :u32,
  requests_total: :u64
}
```

Override it when your service has different metrics:

```ruby
utilization_schema do
  {
    connections_active: :u32,
    connections_total: :u64,
    messages_total: :u64,
    queue_depth: :u32
  }
end
```

## Metric API

Each metric returned by `registry.metric(name)` provides methods for updating its value:

- **`increment`**: Increments a counter by 1.
- **`set(value)`**: Sets a gauge to a specific value.
- **`track { ... }`**: Increments a gauge, runs the block, then decrements the gauge (e.g., `connections_active` while a connection is being handled).

```ruby
connections_total = registry.metric(:connections_total)
connections_active = registry.metric(:connections_active)

# Increment total connections when a client connects
connections_total.increment

# Track active connections for the duration of the block
# (handle_client stands for your own connection handler)
connections_active.track do
  handle_client(peer)
end
```

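To make these semantics concrete, here is a hypothetical `SketchMetric` class backed by an `IO::Buffer` (Ruby 3.1+). It is not the `async-utilization` implementation. In particular, using `ensure` so that `track` decrements the gauge even when the block raises is this sketch's choice, not documented behavior.

```ruby
# Hypothetical stand-in for the metric objects returned by `registry.metric`.
# Backed by IO::Buffer (Ruby 3.1+); NOT the async-utilization implementation.
class SketchMetric
  def initialize(buffer, offset, type)
    @buffer = buffer
    @offset = offset
    @type = type
  end

  # Read the current value from the buffer.
  def value
    @buffer.get_value(@type, @offset)
  end

  # Overwrite the value (gauge-style update).
  def set(value)
    @buffer.set_value(@type, @offset, value)
  end

  # Add 1 (counter-style update).
  def increment
    set(value + 1)
  end

  # Raise the gauge for the duration of the block, then lower it again.
  # `ensure` keeps the gauge accurate even if the block raises (a choice made
  # for this sketch, not documented behavior of the real API).
  def track
    set(value + 1)
    begin
      yield
    ensure
      set(value - 1)
    end
  end
end

buffer = IO::Buffer.new(16)
buffer.clear
connections_active = SketchMetric.new(buffer, 0, :u32)
connections_total = SketchMetric.new(buffer, 8, :u64)

connections_total.increment
during = connections_active.track { connections_active.value }
puts "active during block: #{during}"                 # 1
puts "active after block: #{connections_active.value}" # 0
puts "total: #{connections_total.value}"              # 1
```
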
## Aggregation Behavior

Metrics are aggregated by service name (from `supervisor_worker_state[:name]`). Values are summed across workers of the same service. For example, with 4 workers each reporting `connections_active: 3`, the aggregated value is `12`.

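The summing rule can be sketched in plain Ruby. The input shape here, an array of `[service_name, metrics]` pairs, is an assumption for illustration. The monitor itself reads values out of shared memory rather than receiving hashes like these.

```ruby
# Plain-Ruby sketch of summing per-worker readings by service name.
# Assumption: readings arrive as [service_name, metrics_hash] pairs.
require "json"

def aggregate(readings)
  totals = Hash.new { |hash, service| hash[service] = Hash.new(0) }
  readings.each do |service, metrics|
    metrics.each { |name, value| totals[service][name] += value }
  end
  totals
end

readings = [
  ["echo", { connections_active: 3, messages_total: 100 }],
  ["echo", { connections_active: 3, messages_total: 250 }],
  ["echo", { connections_active: 3, messages_total: 50 }],
  ["echo", { connections_active: 3, messages_total: 75 }]
]

totals = aggregate(readings)
puts JSON.generate(totals)
# {"echo":{"connections_active":12,"messages_total":475}}
```
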
## Best Practices

- **Define a minimal schema**: Only include metrics you need; each field consumes shared memory.
- **Use appropriate types**: `u64` for ever-increasing counters; `u32` for gauges.
- **Match schema across workers**: All workers of the same service should use the same schema for consistent aggregation.
- **Combine with other monitors**: Use `UtilizationMonitor` alongside `ProcessMonitor` and `MemoryMonitor` for full observability.

## Common Pitfalls

- **Workers without a schema**: Workers that don't include {ruby Async::Service::Supervisor::Supervised}, or whose `utilization_schema` returns `nil`, are not registered and won't contribute to utilization metrics.
- **Schema mismatch**: If workers of the same service use different schemas, aggregation may produce incorrect or partial results.
- **Path permissions**: Ensure all worker processes can access the shared memory path (e.g., by running as the same user or setting appropriate permissions).
- **Segment size**: If your schema is large, increase `segment_size` to avoid allocation failures.