Skip to content

Bug: memcached_server_response_decrement can cause response_count to wrap to max_int #130

@jtickle

Description

@jtickle

The circumstances around this event are very rare but we have a high-traffic Moodle instance utilizing memcached and fairly regularly we end up in a situation where apache processes are held open indefinitely, ultimately resulting in systemic failure.

Using gdb I have tracked this down to a failure in the memcached_response function. I don't know how exactly the response_count() is passing 0 on decrement to wrap around to max value, and it's rare, but it happens often enough to notice.

Note line 812 of response.cc:

  if (memcached_is_binary(instance->root) == false
      and memcached_server_response_count(instance) > 1) {

and again just a few line down...

    while (memcached_server_response_count(instance) > 1) {
      memcached_return_t rc =
          _read_one_response(instance, buffer, buffer_length, junked_result_ptr);

Ultimately, that _read_one_response loop is where my server is getting stuck. Somehow, memcached_server_response_count(instance) is passing zero without being caught by this; and then, it's positive, so we sit in this loop forever.

Further down the call stack we sit in a poll function for 5s each time, which is where I started from in tracking down this issue. That call fails, loops back around to memcached_response, which decrements this absurdly high counter and starts the process over once again.

Here is what I was able to pull out of the backtrace from that poll function.

(gdb) bt
#0  0x00007f45c43d5ddd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f45c5d98c4f in poll (__timeout=5000, __nfds=1, __fds=0x7ffd1f9a7c08) at /usr/include/bits/poll2.h:46
#2  memcached_io_poll (inst=0x55a618b49778, events=1, prev_errno=0) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/io.cc:177
#3  0x00007f45c5d99099 in _io_fill (instance=<optimized out>) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/io.cc:393
#4  memcached_io_read (instance=instance@entry=0x55a618b49778, buffer=buffer@entry=0x7ffd1f9a7ff0, length=length@entry=1, nread=@0x7ffd1f9a7cc0: -1)
    at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/io.cc:464
#5  0x00007f45c5d99ac8 in memcached_io_read (nread=@0x7ffd1f9a7cc0: -1, length=1, buffer=0x7ffd1f9a7ff0, instance=0x55a618b49778)
    at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/io.cc:784
#6  memcached_io_readline (instance=0x55a618b49778, buffer_ptr=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse", 
    size=size@entry=350, total_nr=@0x7ffd1f9a7d10: 0) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/io.cc:784
#7  0x00007f45c5d9bca5 in textual_read_one_response (result=0x7ffd1f9a7d80, buffer_length=350, 
    buffer=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse", instance=0x55a618b49778)
    at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:761
#8  _read_one_response (instance=instance@entry=0x55a618b49778, 
    buffer=buffer@entry=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse", buffer_length=buffer_length@entry=350, 
    result=result@entry=0x7ffd1f9a7d80) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:761
#9  0x00007f45c5d9c994 in memcached_response (instance=0x55a618b49778, 
    buffer=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse", buffer_length=350, result=0x55a6184d8490)
    at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:821
#10 0x00007f45c5d95eeb in mget_by_key_real (ptr=0x55a6184d83f0, group_key=<optimized out>, group_key_length=<optimized out>, keys=0x7f45b9661b00, key_length=0x7f45b9661af8, 
    number_of_keys=1, mget_mode=true) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/get.cc:191
#11 0x00007f45c5d96c6b in memcached_mget_by_key (shell=<optimized out>, group_key=<optimized out>, group_key_length=<optimized out>, keys=<optimized out>, key_length=<optimized out>, 
    number_of_keys=<optimized out>) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/get.cc:300
#12 0x00007f45a8210ed9 in php_memc_mget_apply.part.10 () from /usr/lib64/php/modules/memcached.so
#13 0x00007f45a8212b5f in php_memc_get_impl () from /usr/lib64/php/modules/memcached.so
#14 0x00007f45ba61ba6a in execute_ex () from /etc/httpd/modules/libphp7.so
#15 0x00007f45ba61c8d1 in zend_execute () from /etc/httpd/modules/libphp7.so
#16 0x00007f45ba595623 in zend_execute_scripts () from /etc/httpd/modules/libphp7.so
#17 0x00007f45ba533d80 in php_execute_script () from /etc/httpd/modules/libphp7.so
#18 0x00007f45ba61e670 in php_handler () from /etc/httpd/modules/libphp7.so
#19 0x000055a6170e4f90 in ap_run_handler (r=r@entry=0x55a6184cfd90) at config.c:169
#20 0x000055a6170e54d9 in ap_invoke_handler (r=r@entry=0x55a6184cfd90) at config.c:439
#21 0x000055a6170fa12a in ap_process_async_request (r=r@entry=0x55a6184cfd90) at http_request.c:339
#22 0x000055a6170fa412 in ap_process_request (r=r@entry=0x55a6184cfd90) at http_request.c:374
#23 0x000055a6170f65a2 in ap_process_http_sync_connection (c=0x55a618773930) at http_core.c:190
#24 ap_process_http_connection (c=0x55a618773930) at http_core.c:231
#25 0x000055a6170ee5c0 in ap_run_process_connection (c=c@entry=0x55a618773930) at connection.c:41
#26 0x000055a6170ee9d8 in ap_process_connection (c=c@entry=0x55a618773930, csd=<optimized out>) at connection.c:212
#27 0x00007f45c14697ef in child_main (child_num_arg=child_num_arg@entry=6) at prefork.c:707
#28 0x00007f45c1469a35 in make_child (s=0x55a618348370, slot=6) at prefork.c:810
#29 0x00007f45c146a6ce in perform_idle_server_maintenance (p=<optimized out>) at prefork.c:912
#30 prefork_run (_pconf=<optimized out>, plog=<optimized out>, s=<optimized out>) at prefork.c:1100
#31 0x000055a6170c934e in ap_run_mpm (pconf=pconf@entry=0x55a61831f158, plog=0x55a61834c378, s=0x55a618348370) at mpm_common.c:96
#32 0x000055a6170c20a6 in main (argc=2, argv=0x7ffd1f9aafc8) at main.c:783

And here is where I discovered the ever-decrementing response count:

(gdb) break memcached_response
Breakpoint 8 at 0x7f45c5d9c8d0: memcached_response. (3 locations)
(gdb) step
796       if ((instance->root->flags.no_block) and (memcached_is_processing_input(instance->root) == false))
(gdb) step                                                            
                                                              
Breakpoint 8, memcached_response (instance=0x55a618b49778, buffer=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse",
    buffer_length=350, result=0x55a6184d8490) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:789
789     memcached_return_t memcached_response(memcached_instance_st *instance, char *buffer,                                                                                                
(gdb) step                                                                                                                                                                                  
812       if (memcached_is_binary(instance->root) == false                                                                                                                                  
(gdb) step                                                                                                                                                                                  
memcached_server_response_count (self=self@entry=0x55a618b49778) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/server.cc:181                     
181       if (self == NULL) {                                                                                                                                                               
(gdb) step                                                                                                                                                                                  
185       return self->cursor_active_;                                                                                                                                                      
(gdb) print self->cursor_active_                                                                                                                                                            
$5 = 4294936406                                                                               
(gdb) continue                                                                                                                                                                              
Continuing.                                                                                                                                                                                 
                                                               
Breakpoint 8, memcached_response (instance=instance@entry=0x55a618b49778,                                                                                                                   
    buffer=buffer@entry=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse", buffer_length=buffer_length@entry=350,      
    result=result@entry=0x55a6184d8490) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:791
791       if (memcached_is_udp(instance->root)) {                                                                                                                                           
(gdb) step                                                                                                                                                                                  
796       if ((instance->root->flags.no_block) and (memcached_is_processing_input(instance->root) == false))                                                                                
(gdb) step                                                                                                                                                                                  
                                                                                                                                                                                            
Breakpoint 8, memcached_response (instance=0x55a618b49778, buffer=0x7ffd1f9a7ff0 "END\r\n 3031399949core_capabilities-d410703b54e63475f5ffbecef3d603bf 84 26831\r\n_get_devicetypeinuse",   
    buffer_length=350, result=0x55a6184d8490) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/response.cc:789                                      
789     memcached_return_t memcached_response(memcached_instance_st *instance, char *buffer,
(gdb) step                                                                
812       if (memcached_is_binary(instance->root) == false                  
(gdb) step                                                                          
memcached_server_response_count (self=self@entry=0x55a618b49778) at /usr/src/debug/libmemcached-03c0a641f8dbd5cc8a0d9ac62f7e8e2b00f45159/src/libmemcached/server.cc:181
181       if (self == NULL) {                                              
(gdb) step                                                                         
185       return self->cursor_active_;                                                
(gdb) print self->cursor_active_                                                                                                                                                            
$6 = 4294936405

Due to high-traffic, we will definitely see this issue again, although I have no idea how to re-create the conditions to reproduce it. Happy to help if I can.

Metadata

Metadata

Assignees

No one assigned

    Labels

    No labels
    No labels

    Type

    No type

    Projects

    No projects

    Milestone

    No milestone

    Relationships

    None yet

    Development

    No branches or pull requests

    Issue actions