
Improve Ruby Reaper performance under heavy load #663

@francesmcmullin

Description

Is your feature request related to a problem? Please describe.
I find the Ruby reaper runs slowly and uses a lot of memory when I'm processing a large volume of jobs (hundreds of thousands in a few hours). I've updated our configuration so that if the reaper fails entirely, the reaper resurrector brings it back, and it mostly doesn't hit the timeout, but ideally it would run a little faster with a smaller footprint.
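
For context, our resurrector setup looks roughly like this (option names as I understand them from the gem's README; the values are illustrative, so worth double-checking against the version you run):

    SidekiqUniqueJobs.configure do |config|
      config.reaper                      = :ruby # the reaper discussed in this issue
      config.reaper_count                = 1_000 # max digests deleted per run
      config.reaper_interval             = 600   # seconds between reaper runs
      config.reaper_timeout              = 150   # seconds before a run is considered failed
      config.reaper_resurrector_enabled  = true  # restart the reaper if it dies
      config.reaper_resurrector_interval = 3600  # seconds between liveness checks
    end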

Describe the solution you'd like
I suggest that the reaper should work through the oldest digests first, and that it should avoid loading all digests into Ruby at once. Here's the current code I'm interested in:

    conn.zrevrange(digests.key, 0, -1).each_with_object([]) do |digest, memo|
      next if belongs_to_job?(digest)

      memo << digest
      break memo if memo.size >= reaper_count
    end

Currently, using zrevrange means we scan from the highest score to the lowest. Since the current timestamp is generally used as a digest's score, that means going from newest to oldest. It's certainly not perfect, but I suggest a better general heuristic when seeking stale digests is to go from oldest to newest, which we can do by using zrange instead of zrevrange.
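
This first change is a minimal sketch: swap zrevrange for zrange and keep everything else the same, so the scan starts at the lowest (oldest) scores:

    conn.zrange(digests.key, 0, -1).each_with_object([]) do |digest, memo|
      next if belongs_to_job?(digest)

      memo << digest
      break memo if memo.size >= reaper_count
    end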

Second, and perhaps more laborious to implement, I suggest paging through the digests rather than loading the whole set. It might look something like this:

    # Page through the digests sorted set instead of loading it all into Ruby at once.
    page = 0
    per = reaper_count * 2
    orphans = []
    # ZRANGE start and stop indexes are both inclusive, hence the - 1 on the stop.
    # Use a separate variable so the digests object we page over isn't shadowed.
    batch = conn.zrange(digests.key, page * per, (page + 1) * per - 1)

    until batch.empty?
      batch.each do |digest|
        next if belongs_to_job?(digest)

        orphans << digest
        break if orphans.size >= reaper_count
      end

      break if orphans.size >= reaper_count

      page += 1
      batch = conn.zrange(digests.key, page * per, (page + 1) * per - 1)
    end

    orphans
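
One trade-off worth noting: a larger per means fewer round trips to Redis but more digest strings held in Ruby at once. reaper_count * 2 is only a starting point; it assumes roughly half of each page still belongs to a live job.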

Describe alternatives you've considered
I've considered switching to the Lua reaper, but I was concerned about blocking Redis. I'm also thinking about changing some of our application logic so we don't lean quite so heavily on unique jobs, but that will take longer to develop.
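
For reference, switching reapers is a one-line configuration change (same caveat as above about checking option names against your version):

    SidekiqUniqueJobs.configure do |config|
      config.reaper = :lua # reaps inside Redis via a Lua script
    end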

Additional context
I'm happy to provide more detail on how we're using sidekiq-unique-jobs in case that's helpful. We tend to process large volumes of jobs (e.g., 300,000) in a short amount of time (e.g., 2 hours) and then have long periods with much less activity.
