Issue #565 implements load distribution using the algorithm from b7ccb83. However, in some cases the algorithm distributes requests in bursts. For example, 100 requests are distributed among servers with weights { 38, 9, 5, 4, 3, 3, 1 } as follows:
0 0 1 0 1 2 0 1 2 0 1 2 0 1 2 3 0 1 2 3 0 1 3 0 1 0 1 0 3 4 0 4 0 0 0 0 4 5 0 0 5 0 0 0 0 5
0 0 0 0 0 0 0 6 0 0 0 0 0 0 0 0 0 0 0 1 0 1 2 0 1 2 0 1 2 0 1 2 3 0 1 2 3 0 1 3 0 1 0 1 0 3
4 0 4 0 0 0 0 4
Instead, the first server (0) should be chosen more frequently at the beginning of the cycle, which would shorten the long runs of 0s at the end. Such behaviour can be achieved with Weighted Fair Queueing (WFQ).
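A minimal sketch of the smoother distribution, assuming a WFQ-style rule (also known as stride scheduling): each server i has a virtual finish time (sent[i] + 1) / weight[i], and the server with the smallest one gets the next request. This is not the algorithm from b7ccb83, and the function names are hypothetical. Fractions are compared by cross-multiplication so that only integer arithmetic is needed, as in kernel code.

```python
def wfq_schedule(weights, n_requests):
    """Return the server indices chosen for n_requests requests.

    Server i has a virtual finish time (sent[i] + 1) / weights[i]; the
    server with the smallest one is chosen (ties go to the lower index).
    """
    sent = [0] * len(weights)
    cycle = sum(weights)
    picked = 0
    order = []
    for _ in range(n_requests):
        best = 0
        for i in range(1, len(weights)):
            # (sent[i] + 1) / w[i] < (sent[best] + 1) / w[best], integers only
            if (sent[i] + 1) * weights[best] < (sent[best] + 1) * weights[i]:
                best = i
        sent[best] += 1
        order.append(best)
        picked += 1
        # After a full cycle every server has received exactly its weight,
        # so the counters can be reset to keep them bounded.
        if picked == cycle:
            sent = [0] * len(weights)
            picked = 0
    return order


def longest_run(order, server):
    """Length of the longest run of consecutive requests sent to server."""
    run = best = 0
    for s in order:
        run = run + 1 if s == server else 0
        best = max(best, run)
    return best


weights = [38, 9, 5, 4, 3, 3, 1]
order = wfq_schedule(weights, 100)
print(" ".join(map(str, order)))
print("longest run of server 0:", longest_run(order, 0))
```

With these weights, server 0 never gets more than 5 requests in a row, compared with runs of 11 in the sequence above. Each pick scans all servers, so it is O(n) per request; with many servers a min-heap keyed by the finish time would be the usual replacement.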
Secondly, the implementation currently uses a spin lock. We need to think more about how to improve the concurrency of the scheduler. It probably makes sense to use per-CPU data structures.
Next, decisions made by the machine learning logic can be very surprising to a system administrator, so the APM module must show the history of dynamic ratios and predictions. In any case, this is very useful information to show to the user. In the future we could add special warning logic to the WebUI dashboard.
We need to test the real-life distribution of requests by the predictive load balancer among servers with different characteristics. Also, the APM and ratio schedulers use relatively complex calculations and locking, so extensive performance testing and profiling are required as well. We should probably use SIMD calculations for SLR (simple linear regression); this could be very beneficial for a large number of servers.
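To show why SLR over many servers is a good SIMD candidate, here is a minimal sketch assuming the predictor fits a line y = a + b * x through recent (time, value) samples of each server and extrapolates it to a future time. The exact inputs of the ratio scheduler are not stated here, and the function name is hypothetical. The sums over x are shared by all servers, while the per-server sums are independent of each other.

```python
def slr_predict_all(xs, ys_per_server, x_future):
    """Least-squares fit of y = a + b * x for every server, then predict
    the value at x_future. xs (sample times) are shared by all servers."""
    n = len(xs)
    # Terms that depend only on xs are computed once for all servers.
    sx = sum(xs)
    sxx = sum(x * x for x in xs)
    den = n * sxx - sx * sx
    preds = []
    # Each iteration is independent of the others: in C, with a
    # structure-of-arrays layout, several servers fit into one SIMD lane set.
    for ys in ys_per_server:
        sy = sum(ys)
        sxy = sum(x * y for x, y in zip(xs, ys))
        if den == 0:
            # All sample times equal: no trend can be fitted, use the mean.
            preds.append(sy / n)
            continue
        b = (n * sxy - sx * sy) / den
        a = (sy - b * sx) / n
        preds.append(a + b * x_future)
    return preds


print(slr_predict_all([0, 1, 2, 3], [[10, 12, 14, 16], [5, 5, 5, 5]], 5))
```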
APM has to balance measurement accuracy against performance, so APM accounting should also be revised. For example, we could collect average values for each tick instead of overwriting TFW_APM_UBUF_SZ samples in a round-robin manner.
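A minimal sketch of per-tick accounting, assuming a fixed-size ring of per-tick averages takes the place of the TFW_APM_UBUF_SZ sample buffer; the class and method names are hypothetical. During a tick each sample costs only an addition and an increment, and one value per tick is stored regardless of the request rate.

```python
class TickAverager:
    """Keep one averaged value per tick in a fixed-size ring (hypothetical)."""

    def __init__(self, ring_size):
        self.ring = [None] * ring_size
        self.pos = 0
        self.sum = 0
        self.cnt = 0

    def add_sample(self, value):
        # Samples of the current tick are only accumulated, not stored.
        self.sum += value
        self.cnt += 1

    def end_tick(self):
        # Store one integer average per tick (kernel-style, no floats);
        # None marks a tick without samples.
        self.ring[self.pos] = self.sum // self.cnt if self.cnt else None
        self.pos = (self.pos + 1) % len(self.ring)
        self.sum = 0
        self.cnt = 0

    def history(self):
        """Per-tick averages, oldest first, skipping ticks without data."""
        ordered = self.ring[self.pos:] + self.ring[:self.pos]
        return [v for v in ordered if v is not None]


apm = TickAverager(3)
apm.add_sample(10)
apm.add_sample(20)
apm.end_tick()
print(apm.history())
```

Note that an average discards the spread of samples within a tick, which matters if percentiles are computed from them.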
Consider the EMA (exponential moving average) algorithm or its derivatives for APM calculations. If a "good" APM needs more complex algorithms, then, keeping #736 (timer issues) in mind, the whole analytics must be moved to user space, which would probably receive cleaned (e.g. sampled) data from the kernel. #1137 (speculative retries) requires even more advanced ML.
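A minimal sketch of EMA suitable for code without floating point, assuming a power-of-two smoothing factor alpha = 1 / 2**shift; the class name is hypothetical. The average is stored scaled by 2**shift, so the update avg += alpha * (sample - avg) becomes one subtraction, one addition and one shift. Linux's TCP smoothed RTT estimator uses the same scaled-integer trick with shift = 3.

```python
class Ema:
    """Exponential moving average with alpha = 1 / 2**shift, integers only."""

    def __init__(self, shift):
        self.shift = shift
        self.scaled = None  # average * 2**shift; None until the first sample

    def update(self, sample):
        if self.scaled is None:
            self.scaled = sample << self.shift
        else:
            # scaled/2^s + (sample - scaled/2^s) / 2^s, multiplied by 2^s
            self.scaled += sample - (self.scaled >> self.shift)
        return self.value()

    def value(self):
        return self.scaled >> self.shift


ema = Ema(3)
for latency in (100, 100, 200, 200, 200):
    print(ema.update(latency))
```

A larger shift gives a smoother but slower-reacting average; derivatives such as double EMA or EWMA with variance tracking follow the same pattern.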