I am seeing weird behavior in production where sidekiq:sidekiq_unique keys are not always removed after a job completes.
I am running an hourly import job that queues over 1000 jobs to fetch and process data from an API. To prevent multiple workers from processing the same job, I am using sidekiq-unique-jobs with a unique_job_expiration of 1.day.
When I run this on my development machine (OS X), everything is fine. When running in production (Linux), the uniqueness keys are not always removed. This prevents the affected import jobs from running for a whole day, until the keys expire.
Normally (and this is what I see on my development machine), the number of sidekiq:sidekiq_unique keys is equal to the number of currently running jobs plus the queue size. When I run the same import in production, I see over 120 sidekiq:sidekiq_unique keys that are never unlocked.
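To make that check concrete, here is a minimal sketch of the accounting I am doing. The helper name and the example numbers are hypothetical and only for illustration; it is not part of Sidekiq or sidekiq-unique-jobs, and it assumes the three counts have already been collected (for example by counting the matching keys in Redis and reading the busy and enqueued numbers from the Sidekiq web UI).

```ruby
# Hypothetical helper, not part of Sidekiq or sidekiq-unique-jobs.
#
# unique_keys: number of sidekiq:sidekiq_unique keys found in Redis
# running:     number of jobs currently being processed
# enqueued:    number of jobs still waiting in the queue
#
# Every running or enqueued job is expected to hold exactly one key,
# so anything above that sum is a lock that was never released.
def leaked_lock_count(unique_keys:, running:, enqueued:)
  expected = running + enqueued
  [unique_keys - expected, 0].max
end

# Made-up numbers: 150 keys in Redis, 25 jobs running, 5 jobs enqueued.
puts leaked_lock_count(unique_keys: 150, running: 25, enqueued: 5) # prints 120
```

On my development machine this count stays at 0 during the whole import. In production it ends up above 120 after the import has finished.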
My first thought was that this is caused by worker jobs queueing other worker jobs. But I could also reproduce this in production by performing the same worker multiple times.
At this moment I don't have any clue what the cause of this is. But maybe someone has the same issue or is able to provide debugging instructions.