feat: ewma use p2c to improve performance #3300
Conversation
Force-pushed from b9e1834 to 164dadd.

nice PR @hnlq715

Force-pushed from 164dadd to d111944.
apisix/balancer/ewma.lua
Outdated
```lua
local _, err = ewma_lock:lock(upstream .. LOCK_KEY)
if err then
    if err ~= "timeout" then
        ngx.log(ngx.ERR, string.format("EWMA Balancer failed to lock: %s", tostring(err)))
```
Better style (and better performance, since it avoids the string.format and tostring calls):
```diff
- ngx.log(ngx.ERR, string.format("EWMA Balancer failed to lock: %s", tostring(err)))
+ core.log.error("EWMA Balancer failed to lock: ", err)
```
apisix/balancer/ewma.lua
Outdated
```lua
local function unlock()
    local ok, err = ewma_lock:unlock()
    if not ok then
        ngx.log(ngx.ERR, string.format("EWMA Balancer failed to unlock: %s", tostring(err)))
```
Ditto, please fix the similar points elsewhere.
apisix/balancer/ewma.lua
Outdated
```lua
    return get_or_update_ewma(upstream_name, 0, false)
end
local ewma = ngx.shared.balancer_ewma:get(upstream) or 0
if lock_err ~= nil then
```
This seems wrong, please confirm this code.

Fixed: ngx.shared.balancer_ewma => shm_ewma
```lua
endpoint, backendpoint = peers[a], peers[b]
if score(endpoint) > score(backendpoint) then
    endpoint, backendpoint = backendpoint, endpoint
end
```
We need to sync the tried_endpoint check from https://github.com/kubernetes/ingress-nginx/blob/a2e77185cc2e91278962f4f1267246c8fefc6e73/rootfs/etc/nginx/lua/balancer/ewma.lua#L180 to our new implementation.
You can take a look at apisix/apisix/balancer/roundrobin.lua, lines 54 to 74 in bbbdf58.
Maybe this is another topic where we could improve stability, like passive/active health checks and tried records & tries counts?
It would be better to fix this known issue in this PR.
It seems this problem is still not addressed?
Sorry, the filter logic is added now; I was focusing on something else these days.
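For reference, a minimal sketch of the filtering idea, in the spirit of the roundrobin.lua code linked above; the function name and the shape of ctx.balancer_tried_servers are assumptions about the surrounding code, not the PR's exact implementation:

```lua
-- Hypothetical sketch: drop peers already tried in this request before
-- running the p2c pick. ctx.balancer_tried_servers is assumed to map
-- "host:port" strings to true for every endpoint tried so far.
local function filter_tried_peers(ctx, peers)
    if not ctx.balancer_tried_servers then
        return peers    -- first attempt, nothing to filter
    end

    local filtered = {}
    for _, peer in ipairs(peers) do
        if not ctx.balancer_tried_servers[peer.host .. ":" .. peer.port] then
            filtered[#filtered + 1] = peer
        end
    end
    return filtered
end
```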
```lua
local function get_or_update_ewma(upstream, rtt, update)
    local lock_err = nil
    if update then
```
Not locking for get without update will result in potentially incorrect behaviour. Imagine at line 86 you fetch current ewma value, then another worker updates it and its last_touched_at value before you retrieve last_touched_at at line 92. Then you will end up treating the old ewma as a new one. I'm not sure if in practice it would make big difference, but it is definitely not a correct behaviour.
I think the only way of getting rid of locking for "get" operations is to combine ewma and timestamp in the same value and store under single key. But then you would need to do encoding and decoding every time you set and fetch it. It can be interesting to try that and see the performance.
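For illustration, a minimal sketch of that single-key idea (the separator and helper names are hypothetical; real code would also need to handle set errors):

```lua
-- Hypothetical sketch: pack the ewma value and its timestamp into one
-- shared-dict entry so that plain gets need no lock. The encode/decode
-- on every access is exactly the cost mentioned above.
local SEP = ":"

local function store_packed(shm, upstream, ewma, now)
    return shm:set(upstream, ewma .. SEP .. now)
end

local function fetch_packed(shm, upstream)
    local packed = shm:get(upstream)
    if not packed then
        return 0, 0
    end
    local pos = string.find(packed, SEP, 1, true)
    return tonumber(string.sub(packed, 1, pos - 1)),
        tonumber(string.sub(packed, pos + 1))
end
```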
> Not locking for get without update will result in potentially incorrect behaviour. Imagine at line 86 you fetch current ewma value, then another worker updates it and its last_touched_at value before you retrieve last_touched_at at line 92. Then you will end up treating the old ewma as a new one. I'm not sure if in practice it would make big difference, but it is definitely not a correct behaviour.
@ElvinEfendi You're right, this is not correct behavior, but it may be a reasonable trade-off. In the situation you mentioned, the two last_touched_at values should differ very little (almost the same time), and we avoid locking every get operation, which is quite heavy.
> I think the only way of getting rid of locking for "get" operations is to combine ewma and timestamp in the same value and store under single key. But then you would need to do encoding and decoding every time you set and fetch it. It can be interesting to try that and see the performance.
My first implementation used a per-worker cache to store this, which simply avoids locking without frequent encoding and decoding, and has better performance. But we still have other concerns; details can be found in #3211.

And with shared memory, we have to trade off between performance and correctness.

I ran some benchmarks comparing the two approaches:
lock get and update

```
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.62ms  282.29us   7.64ms   90.45%
    Req/Sec     4.98k   511.87     9.62k    97.03%
  50013 requests in 5.10s, 199.42MB read
Requests/sec:   9806.24
Transfer/sec:     39.10MB
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.69ms  288.94us   7.25ms   89.40%
    Req/Sec     4.77k   285.81     5.28k    63.73%
  48370 requests in 5.10s, 192.87MB read
Requests/sec:   9484.82
Transfer/sec:     37.82MB
```
lock update

```
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.57ms  289.91us   7.23ms   89.61%
    Req/Sec     5.14k   584.14    10.43k    96.04%
  51652 requests in 5.10s, 205.95MB read
Requests/sec:  10128.09
Transfer/sec:     40.38MB
+ sleep 1
+ wrk -d 5 -c 16 http://127.0.0.1:9080/hello
Running 5s test @ http://127.0.0.1:9080/hello
  2 threads and 16 connections
  Thread Stats   Avg      Stdev     Max   +/- Stdev
    Latency     1.55ms  255.99us   6.55ms   89.96%
    Req/Sec     5.18k   539.62     9.77k    95.05%
  52008 requests in 5.10s, 207.37MB read
Requests/sec:  10198.48
Transfer/sec:     40.66MB
```
apisix/balancer/ewma.lua
Outdated
```lua
local function lock(upstream)
    local _, err = ewma_lock:lock(upstream .. LOCK_KEY)
    if err then
```
We can merge these two if conditions into one:

```lua
if err and err ~= "timeout" then
end
```
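Combined with the core.log suggestion above, the whole helper might look like this (a sketch, reusing the PR's ewma_lock and LOCK_KEY):

```lua
local function lock(upstream)
    -- "timeout" only means another worker holds the lock, so log just
    -- the real failures
    local _, err = ewma_lock:lock(upstream .. LOCK_KEY)
    if err and err ~= "timeout" then
        core.log.error("EWMA Balancer failed to lock: ", err)
    end
    return err
end
```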
apisix/balancer/ewma.lua
Outdated
```lua
end
if forcible then
    core.log.warn("balancer_ewma_last_touched_at:set valid items forcibly overwritten")
    core.log
```
Because the max column width was set to 80, the format tool broke this line into two; I fixed it by setting the width to 82.
Just follow the coding style of APISIX: up to 100 characters are allowed in one line.
apisix/balancer/ewma.lua
Outdated
```diff
 local success, err, forcible = shm_last_touched_at:set(upstream, now)
 if not success then
-    core.log.error("balancer_ewma_last_touched_at:set failed ", err)
+    core.log.warn("shm_last_touched_at:set failed: ", err)
```
I think the error level is reasonable.
This error usually comes out when the dict is full, but everything still works because we can evict the old items in the dict. Referring to ingress-nginx, maybe warn is a proper level? :-)
https://github.com/kubernetes/ingress-nginx/blob/master/rootfs/etc/nginx/lua/balancer/ewma.lua#L68
Most of the time the error level is what gets watched in production clusters, and a full dict is not a trivial problem. If the level is warn, I'm afraid the problem will be covered up and troubleshooting might be difficult.
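For reference, ngx.shared.DICT:set returns three values, so both situations can be reported at their own levels; a sketch using the PR's shm_last_touched_at dict:

```lua
-- success is false when the set itself failed; forcible is true when a
-- still-valid item was evicted because the shared dict was full
local success, err, forcible = shm_last_touched_at:set(upstream, now)
if not success then
    core.log.error("shm_last_touched_at:set failed: ", err)
end
if forcible then
    core.log.warn("shm_last_touched_at:set evicted valid items, ",
                  "consider enlarging the shared dict")
end
```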
```lua
endpoint, backendpoint = peers[a], peers[b]
if score(endpoint) > score(backendpoint) then
    endpoint, backendpoint = backendpoint, endpoint
end
```
It seems this problem is still not addressed?
apisix/balancer/ewma.lua
Outdated
```lua
    return endpoint.host .. ":" .. endpoint.port
end
local filtered_peers
for _, peer in ipairs(peers) do
```
```
ERROR: apisix/balancer/ewma.lua: line 161: getting the Lua global "ipairs"
```
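The usual fix for this lint error in the APISIX code base is to cache the builtin as a module-level local, e.g.:

```lua
-- localizing the global satisfies the lint rule and also skips a global
-- table lookup on every call
local ipairs = ipairs
```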
And we need a test for "filter tried servers".
@spacewander Test case added.
apisix/balancer/ewma.lua
Outdated
```lua
end

local ewma = shm_ewma:get(upstream) or 0
if lock_err ~= nil then
```
Why check lock_err after fetching ewma? It's strange that we don't check the return value right after lock returns. Otherwise you should get ewma before acquiring the lock.
We need the lock when updating, and do not lock when getting.
Why not move this block inside the if update then?
We need to return the ewma value for get and for a failed update operation.

On reconsideration, I think it's OK to move this block; for now we do not use this ewma value in the update operation.
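A sketch of what the agreed restructuring could look like, with the lock handling moved inside if update then (lock, unlock, shm_ewma and store_stats come from the PR's diff; decay_ewma is a hypothetical helper standing in for the decay computation):

```lua
local function get_or_update_ewma(upstream, rtt, update)
    if update then
        -- only updates take the lock; a failed lock aborts the update
        local lock_err = lock(upstream)
        if lock_err ~= nil then
            return 0, lock_err
        end
    end

    local ewma = shm_ewma:get(upstream) or 0
    if update then
        local now = ngx.now()
        ewma = decay_ewma(ewma, rtt, now)    -- decay_ewma: hypothetical
        store_stats(upstream, ewma, now)
        unlock()
    end

    -- plain gets return the stored value without ever touching the lock
    return ewma, nil
end
```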
```
--- error_code: 200
--- no_error_log
[error]
```
The CI was broken. You might need to submit a new commit to trigger it.
Seems like the CI is still broken...
t/node/ewma.t
Outdated
```lua
local uri = "http://127.0.0.1:" .. ngx.var.server_port
            .. "/ewma"

--should select the 1980 node, because 1984 is invalid
```
Port 1984 is not invalid; you can confirm it via less t/servroot/conf/nginx.conf.

My bad, even though port 1984 is open, it can't handle the request correctly, so it is an invalid port.
To avoid misunderstanding, I changed it to 9527.
t/node/ewma.t
Outdated
```lua
--should select the 1980 node, because 1984 is invalid
local ports_count = {}
for i = 1, 12 do
    local httpc = http.new()
```
You can speed up the test like this: line 309 in b78223c.
apisix/balancer/ewma.lua
Outdated
```lua
endpoint = pick_and_score(peers)
local tried_endpoints
if not ctx.balancer_tried_servers then
    tried_endpoints = {}
```
It seems we can save the table allocations of tried_endpoints and filtered_peers when ctx.balancer_tried_servers is not set?
Got it, after_balance is already called on the first attempt.
apisix/balancer/ewma.lua
Outdated
```lua
if not filtered_peers then
    core.log.warn("all endpoints have been retried")
    filtered_peers = table_deepcopy(peers)
```
Need a test to cover this branch. And it seems we don't need to copy the peers here?
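If the fallback path only reads from the list afterwards, the copy can probably be dropped, e.g.:

```lua
if not filtered_peers then
    core.log.warn("all endpoints have been retried")
    -- reuse peers directly instead of table_deepcopy(peers); safe as
    -- long as the picker only reads from the list
    filtered_peers = peers
end
```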
apisix/balancer/ewma.lua
Outdated
```lua
local tried_endpoints
if not ctx.balancer_tried_servers then
    tried_endpoints = {}
    ctx.balancer_tried_servers = tried_endpoints
```
We already set ctx.balancer_tried_servers in after_balance?
```lua
store_stats(upstream, ewma, now)

return ewma
unlock()
```
The unlock()'s err is not checked?
We record the error when unlock() fails.
apisix/balancer/ewma.lua
Outdated
```lua
end

local ewma = shm_ewma:get(upstream) or 0
if lock_err ~= nil then
```
Why not move this block inside the if update then?
apisix/balancer/ewma.lua
Outdated
```lua
endpoint = peers[1]
local filtered_peers
for _, peer in ipairs(peers) do
    if ctx.balancer_tried_servers then
```
This branch should be moved outside the for loop, and filtered_peers set to peers when not in a retry.
apisix/balancer/ewma.lua
Outdated
```lua
    return endpoint.host .. ":" .. endpoint.port
end
if not filtered_peers then
    core.log.warn("all endpoints have been retried")
```
We don't need this branch, as we already reject this case above:

```lua
if ctx.balancer_tried_servers and ctx.balancer_tried_servers_count == nkeys(up_nodes) then
    return nil, "all upstream servers tried"
end
```
What this PR does / why we need it:
Details can be found in #3211
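For background, power of two choices (p2c) samples two random endpoints and keeps the one with the better score instead of scanning the whole list; a minimal sketch consistent with the diff above (score() is assumed to return the decayed EWMA, lower is better, and #peers >= 2):

```lua
-- Minimal p2c sketch: pick two distinct random peers, keep the lower score.
local function pick_and_score(peers)
    local count = #peers
    local a = math.random(count)
    local b = math.random(count - 1)
    if b >= a then
        b = b + 1    -- shift so that b never equals a
    end

    local endpoint, backendpoint = peers[a], peers[b]
    if score(endpoint) > score(backendpoint) then
        endpoint, backendpoint = backendpoint, endpoint
    end
    return endpoint
end
```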
Pre-submission checklist:

Before:
- apisix: 1 worker + 200 upstream + no plugin
- apisix: 1 worker + 200 upstream + no plugin + ewma

After:
- apisix: 1 worker + 200 upstream + no plugin + ewma