1. Background / Problem Description
Currently, woodpecker assigns a dedicated Goroutine to each log stream on the server side. These Goroutines run in a loop to monitor and sync buffered data.
The Scalability Bottleneck:
- Resource Exhaustion: Supporting 100k+ logs on a single node means the Goroutine stacks alone (a 2KB minimum each, typically 4KB-8KB once runtime bookkeeping is included) consume roughly 400MB - 800MB for idle stacks, even before accounting for actual log buffers.
- CPU Scheduler Pressure: The Go runtime must track 100k+ Goroutines. Even when they are parked, the periodic wake-ups, timer management, and GC stack scanning for such a massive number of entities severely degrade CPU efficiency.
- Inefficiency: In many multi-tenant scenarios (e.g., IoT or Microservices), only a small fraction of logs are "active" at any given millisecond. The current "One-Goroutine-Per-Log" model wastes resources on inactive tenants.
2. Proposed Solution: Event-Driven & Lazy Activation
We propose refactoring the sync logic from a Static Polling model to an Event-Driven Task Pool model. This decouples the "Log Entity" from the "Execution Thread".
Key Components:
- Lazy Activation: A log stream will NOT have an associated Goroutine by default. It remains a passive data structure until the first byte of data is ingested.
- Global Scheduler (Timing Wheel):
  - Instead of 100k timers, we use a single Timing Wheel to manage all expiration events.
  - When a log becomes "Active" (first data arrives), it registers a one-time timeout task (e.g., 5s) in the timing wheel.
- Shared Worker Pool: A fixed-size pool of Worker Goroutines (e.g., $N = \text{NumCPU} \times 2$) handles the actual Sync() I/O operations.
- Task Queue: When a log is ready to sync (either via MaxDelay timeout or BufferFull event), its LogID is pushed to a central SyncQueue.
The "Silent-to-Active" Workflow:
- Ingest: Data arrives $\rightarrow$ Update Buffer.
- Trigger: If isActive == false: set isActive = true and register with the Global Scheduler.
- Dispatch: Scheduler or Buffer-Threshold-Monitor pushes LogID to Worker Pool.
- Sync & Hibernate: Worker performs I/O $\rightarrow$ If buffer is empty, set isActive = false (the log goes back to sleep).
3. Implementation Plan
4. Expected Results (Single-Node Success Metrics)
- Goroutine Scalability: Reduce Goroutine count from $O(N_{logs})$ to $O(N_{workers})$. For 100k logs, the system should maintain < 1,000 Goroutines total.
- Memory Efficiency: Memory usage should scale with Actual Data Volume rather than the Number of Tenants.
- Idle Performance: A system with 100k inactive logs should consume near-zero CPU (only the cost of the timing wheel's tick).
- Density: Enable a single woodpecker instance to comfortably handle 100k - 200k concurrent log streams on standard cloud VMs (e.g., 4C8G).