Agent stalls with 100% cpu utilization #14

@kingcu

I have been struggling for several days with a problem in my agents. At random, they stall and use 100% of the CPU. strace shows the stalled agents doing no real work, just context switching on timer signals:

--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 40001616
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 1
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 0
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 101
--- SIGVTALRM (Virtual timer expired) @ 0 (0) ---
rt_sigreturn(0) = 0

I have tried everything I can think of: I modified the agents to use epoll rather than select, and tried Ruby Enterprise Edition and Ruby 1.9 (both make the syscalls disappear from strace, but the agents still lock up). I cannot discern a pattern in when the agents lock up. The job they lock on isn't consistent, aside from always being one that uses net/http to pull down some images and stitch them together.
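Since the stalls only show up in jobs that use net/http, one thing that might be worth ruling out is a fetch with no explicit time bound. The sketch below is not the code from the pasties. The helper names `build_http` and `fetch_image` and the timeout values are assumptions made for illustration, and it uses modern Ruby keyword arguments rather than 1.8 syntax. It only shows how connect and read timeouts can be set on a `Net::HTTP` object, and it may well be unrelated to the actual cause of the spin.

```ruby
require "net/http"
require "uri"

# Hypothetical helper: build a Net::HTTP object with explicit timeouts.
# Net::HTTP.new does not open a connection; that happens on the first request.
def build_http(url, open_timeout: 5, read_timeout: 10)
  uri = URI.parse(url)
  http = Net::HTTP.new(uri.host, uri.port)
  http.use_ssl = (uri.scheme == "https")
  http.open_timeout = open_timeout   # seconds allowed for the TCP connect
  http.read_timeout = read_timeout   # seconds allowed for each socket read
  http
end

# Hypothetical helper: return the response body, or nil on failure or timeout.
def fetch_image(url, **timeouts)
  uri = URI.parse(url)
  response = build_http(url, **timeouts).request_get(uri.request_uri)
  response.is_a?(Net::HTTPSuccess) ? response.body : nil
rescue Timeout::Error, SystemCallError, SocketError => e
  # Net::OpenTimeout and Net::ReadTimeout are subclasses of Timeout::Error.
  warn "fetch failed for #{url}: #{e.class}: #{e.message}"
  nil
end
```

If a fetch then fails with a timeout instead of stalling the agent, that at least narrows the problem to the network side of the job.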

I thought it might be an issue with calling sleep() inside the agents, but that didn't solve anything. I really have no idea where to go from here.

Pastie to my agent code: http://pastie.org/702881
Pastie to image fetch/stitch code: http://pastie.org/702895

On the plus side, I'll be able to give you a quick modification to nanite that makes it use epoll, which dropped my CPU utilization a hair while performing a large number of jobs! Any ideas on where to even start looking would be appreciated; otherwise I am going to just start commenting out code until something changes (the worst way to debug!).
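As a possible alternative to commenting out code, a stalled process can sometimes be asked to report where its threads are. The following is a minimal sketch, assuming a Ruby where `Thread#backtrace` is available (stock 1.8 does not have it). The helper name `dump_thread_backtraces` and the choice of `USR1` are assumptions made for illustration, not part of nanite. If the interpreter is spinning somewhere that never runs Ruby-level signal handlers, this will not help.

```ruby
# Hypothetical helper: write the backtrace of every live thread to +io+
# and return the number of threads inspected.
def dump_thread_backtraces(io = $stderr)
  threads = Thread.list
  threads.each do |thread|
    io.puts "Thread #{thread.object_id} status=#{thread.status.inspect}"
    # Thread#backtrace returns nil for a thread that has already finished.
    (thread.backtrace || ["(no backtrace available)"]).each do |line|
      io.puts "  #{line}"
    end
  end
  threads.size
end

# Install the handler only where the platform defines USR1.
if Signal.list.key?("USR1")
  Signal.trap("USR1") { dump_thread_backtraces }
end
```

With this loaded in the agent, sending `kill -USR1 <pid>` to a stalled agent would print the backtraces to its standard error, provided the handler gets a chance to run.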
