Agent stalls with 100% CPU utilization

I have been struggling for several days with a problem in my agents.  At random, they stall and use 100% of the CPU.  strace shows the stalled agents receiving nothing but SIGVTALRM timer signals, so they appear to be just context switching and doing no real work:

--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 40001616
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 101
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0)                         = 0

I have tried everything I can think of: I modified the agents to use epoll rather than select, and tried Ruby Enterprise Edition and Ruby 1.9 (they remove the syscalls from the strace output, but the agents still lock up).  I cannot discern a pattern in when the agents lock up: the job they lock on isn't consistent, aside from always being a job that uses net/http to pull down some images and stitch them together.

I thought it might be an issue with calling sleep() inside the agents, but that didn't solve anything.  I really have no idea where to go from here.

Pastie to my agent code: http://pastie.org/702881
Pastie to image fetch/stitch code: http://pastie.org/702895

On the plus side, I'll be able to give you a quick modification to nanite that makes it use epoll, which dropped my CPU utilization a hair while performing a large number of jobs!  Any ideas on where to even start looking would be appreciated; otherwise I am going to just start commenting out code until something changes (the worst way to debug!).
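
One less destructive alternative to commenting out code might be to have a stalled agent report where it is spinning. The following is only a rough sketch, not something taken from nanite or from the pasted agent code: `dump_thread_backtraces` is a hypothetical helper name, the choice of SIGUSR1 is arbitrary, and it assumes Ruby 1.9 or later, where `Thread#backtrace` is available.

```ruby
# Hypothetical helper (not part of nanite): write every live thread's
# backtrace to +io+ so a spinning process can show where it is stuck.
# Assumes Ruby 1.9 or later, where Thread#backtrace is available.
def dump_thread_backtraces(io = $stderr)
  Thread.list.each do |thread|
    io.puts "Thread #{thread.object_id} (status: #{thread.status.inspect})"
    # backtrace can be nil for a thread that has already died
    (thread.backtrace || []).each { |frame| io.puts "  #{frame}" }
  end
  io
end

# Dump backtraces whenever the process receives SIGUSR1.
# USR1 is an arbitrary choice for this sketch.
Signal.trap("USR1") { dump_thread_backtraces }
```

With this loaded in an agent, running `kill -USR1 <pid>` against a stalled process should print the backtraces to stderr, provided the interpreter is still getting a chance to run signal handlers. If the process is stuck inside native extension code, the handler may never fire.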
