Add parallel iteration scaffold#702

Merged
Mangara merged 2 commits into
mainfrom
mangara-parallel-2
Apr 30, 2026

@Mangara Mangara commented Apr 28, 2026

What and why?

It's sometimes useful to iterate over a collection in parallel to speed up the completion of the work. This PR adds a primitive to enable that:

def build_enumerator(cursor:)
  enumerator_builder.parallel(instances: 5, cursor: cursor) do |instance, total_instances, inner_cursor|
    enumerator_builder.active_record_on_records(
      Product.where("id % ? = ?", total_instances, instance),
      cursor: inner_cursor,
    )
  end
end

When this job is enqueued, it first runs with a nil cursor, causing it to enqueue the 5 child jobs, each with a cursor that looks like { instance: x, inner_cursor: nil }, where x ranges from 0 to 4. When those jobs start, they build the inner enumerator and run as normal, except that the outer enumerator ensures that the cursor stays wrapped in this hash with instance and inner_cursor.
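To make the cursor wrapping concrete, here is a minimal plain-Ruby sketch. It is not the code from this PR: `WrappedCursorEnumerator` and `inner_ids` are hypothetical names, and the inner enumerator is assumed to yield `[item, cursor]` pairs over a fixed range of ids instead of Active Record rows.

```ruby
# Hypothetical stand-in for the outer enumerator; not the gem's actual class.
class WrappedCursorEnumerator
  def initialize(instances:, cursor:, &build_inner)
    @instances = instances
    @cursor = cursor
    @build_inner = build_inner
  end

  # Cursors handed to the child jobs when the parent runs with a nil cursor.
  def child_cursors
    (0...@instances).map { |i| { instance: i, inner_cursor: nil } }
  end

  # Yields item and wrapped cursor for one child job.
  def each
    return enum_for(:each) unless block_given?
    raise ArgumentError, "a nil cursor only fans out" if @cursor.nil?

    instance = @cursor.fetch(:instance)
    inner = @build_inner.call(instance, @instances, @cursor[:inner_cursor])
    inner.each do |item, inner_cursor|
      # The inner cursor never escapes unwrapped.
      yield item, { instance: instance, inner_cursor: inner_cursor }
    end
  end
end

# Assumed inner enumerator: ids 1..20 owned by this instance, after the cursor.
def inner_ids(instance, total_instances, inner_cursor)
  (1..20)
    .select { |id| id % total_instances == instance }
    .select { |id| inner_cursor.nil? || id > inner_cursor }
    .map { |id| [id, id] }
end

child = WrappedCursorEnumerator.new(instances: 5, cursor: { instance: 2, inner_cursor: 7 }) do |i, total, c|
  inner_ids(i, total, c)
end
child.each { |id, cursor| puts "record #{id}, resume after #{cursor[:inner_cursor]}" }
```

In this sketch the user block only ever sees the unwrapped inner cursor, while everything that would be persisted between runs stays in the `{ instance:, inner_cursor: }` shape.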

sequenceDiagram
      participant Parent as ParentJob
      participant Q as ActiveJob queue
      participant Child as ChildJob (instance i)

      activate Parent
      Parent->>Parent: build_enumerator(cursor: nil)
      Note over Parent: cursor is nil →<br/>user block NOT invoked
      Parent->>Parent: enqueue_jobs(self.class, arguments)
      Parent->>Q: perform_all_later([job_0..job_N-1])
      Note right of Parent: each child gets<br/>cursor_position =<br/>{instance: i, inner_cursor: nil}
      deactivate Parent

      Q->>Child: perform<br/>(cursor: {instance: i, inner_cursor: nil})
      activate Child
      Child->>Child: build_enumerator(cursor: {instance: i, ...})
      Note over Child: cursor non-nil →<br/>user block IS invoked,<br/>builds inner enum for instance i

      loop until done or interrupted
          Child->>Child: each_iteration(record)
          Note right of Child: cursor_position =<br/>{instance: i, inner_cursor: X}
      end

      alt interrupted
          Child->>Q: retry_job (carries current cursor)
          Q->>Child: perform<br/>(cursor: {instance: i, inner_cursor: X})
          Note over Child: cursor still non-nil →<br/>no re-fan-out,<br/>resumes from inner_cursor X
      else completes
          Note over Child: on_complete callback fires
      end
      deactivate Child
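The interrupt and retry path in the diagram can be simulated without ActiveJob. The following is a rough sketch under stated assumptions: `run_child` and `drain` are hypothetical helpers, an "interruption" is modelled as a cap of `max_iterations` records per run, and the queue is a plain array.

```ruby
# Hypothetical simulation of one child run; no ActiveJob involved.
# Returns the processed ids, the cursor to persist, and the final status.
def run_child(cursor, total_instances:, ids:, max_iterations:)
  instance = cursor.fetch(:instance)
  remaining = ids.select do |id|
    id % total_instances == instance &&
      (cursor[:inner_cursor].nil? || id > cursor[:inner_cursor])
  end
  processed = remaining.first(max_iterations)
  new_cursor = { instance: instance, inner_cursor: processed.last || cursor[:inner_cursor] }
  status = processed.size < remaining.size ? :interrupted : :completed
  [processed, new_cursor, status]
end

# Fans out once, then keeps re-running interrupted children until all finish.
# max_iterations must be at least 1, otherwise no child makes progress.
def drain(total_instances:, ids:, max_iterations:)
  queue = (0...total_instances).map { |i| { instance: i, inner_cursor: nil } }
  processed = []
  until queue.empty?
    cursor = queue.shift
    batch, new_cursor, status = run_child(
      cursor, total_instances: total_instances, ids: ids, max_iterations: max_iterations
    )
    processed.concat(batch)
    # A retried child carries a non-nil cursor, so it never fans out again.
    queue.push(new_cursor) if status == :interrupted
  end
  processed
end

done = drain(total_instances: 3, ids: (1..10).to_a, max_iterations: 2)
puts done.sort == (1..10).to_a
```

Because each retry resumes strictly after its own `inner_cursor`, every id is processed once in this model even though children are interrupted several times.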

Gotchas

  • Callback behaviour could be unexpected, as the parent job runs no callbacks, and each child job runs the full set. So on_start means this particular instance is starting and on_complete means this particular instance is done iterating. There is no "all child jobs are done iterating" callback, as that would require some form of external synchronization.
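For illustration only, this is roughly what the external synchronization mentioned above could look like if an application wanted an "all children done" signal. `CompletionTracker` is a hypothetical name and is not part of this PR; the sketch keeps its counter in process memory behind a `Mutex`, whereas real child jobs run in separate processes and would need shared storage such as a database row.

```ruby
# Hypothetical completion tracker. In-memory only, so it merely models the idea.
class CompletionTracker
  def initialize(total_instances)
    @remaining = total_instances
    @mutex = Mutex.new
  end

  # Intended to be called from each child's on_complete callback.
  # Returns true only for the call that brings the counter to zero.
  def child_completed
    @mutex.synchronize do
      @remaining -= 1
      @remaining.zero?
    end
  end
end

tracker = CompletionTracker.new(5)
# Threads stand in for the five child jobs finishing concurrently.
results = Array.new(5) { Thread.new { tracker.child_completed } }.map(&:value)
puts results.count(true)
```

A counter like this assumes each child completes exactly once, so it would miscount if the same instance were enqueued twice.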

Follow-up

  • I plan to add parallel versions of most existing enumerators that divvy up the work for you, so you can simply call enumerator_builder.parallel_active_record_on_records(Product.all, instances: 5, cursor: cursor).
  • This needs documentation
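As a small illustration of what "divvying up the work" means with the modulo scheme from the example in the description, the sketch below groups plain integer ids by `id % total_instances`. `partition_ids` is a hypothetical helper written for this note, not the planned API.

```ruby
# Hypothetical helper: groups ids by the instance that would own them
# under the "id % total_instances = instance" condition.
def partition_ids(ids, total_instances)
  ids.group_by { |id| id % total_instances }
end

even_split = partition_ids((1..12).to_a, 5)
puts even_split[0].inspect

# Sparse or patterned ids can skew the split: here every id is a multiple
# of 5, so instance 0 owns all of them and the other instances own nothing.
skewed = partition_ids([5, 10, 15, 20], 5)
puts skewed.keys.inspect
```

Every id lands on exactly one instance, but the split is only as even as the distribution of ids modulo the instance count.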

@Mangara Mangara force-pushed the mangara-parallel-2 branch from 67aaac0 to f893a0e Compare April 29, 2026 13:33
@Mangara Mangara marked this pull request as ready for review April 29, 2026 13:33
@Mangara Mangara requested a review from a team as a code owner April 29, 2026 13:33
Comment thread lib/job-iteration/iteration.rb Outdated
Comment thread lib/job-iteration/parallel_enumerator.rb
Comment thread lib/job-iteration/parallel_enumerator.rb Outdated

@CelinaAssal CelinaAssal left a comment

Nice!

@adrianna-chang-shopify adrianna-chang-shopify left a comment

Couple small questions but looks good overall!

Comment thread lib/job-iteration/iteration.rb Outdated
Comment thread lib/job-iteration/parallel_enumerator.rb Outdated

unless child_jobs.all?(&:successfully_enqueued?)
  failed_count = @instances - child_jobs.count(&:successfully_enqueued?)
  raise EnqueueError, "Failed to enqueue #{failed_count} out of #{@instances} child jobs"
end

Hmm, does this mean that if any of the child jobs fail, we re-enqueue all of the children again on retry? I guess we assume idempotency and it's not really an issue?

Reply from the PR author:

Yeah. This is a tricky case where I don't see any perfect options.

I suspect that usually either all the enqueues will fail, or all will succeed. If we do have a partial success, retrying the parent job will enqueue every child again, so the instances that were already enqueued successfully end up with additional copies.
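The partial-failure case discussed here can be reproduced with a toy model. This is a sketch, not the PR's implementation: `FlakyQueue` and `fan_out` are hypothetical, `EnqueueError` is a local stand-in class, and enqueueing is assumed to fail once for a chosen instance and then succeed.

```ruby
# Local stand-in for the error raised on a failed fan-out.
class EnqueueError < StandardError; end

# Hypothetical in-memory queue that rejects the given instances once each.
class FlakyQueue
  attr_reader :jobs

  def initialize(fail_once_for:)
    @fail_once_for = fail_once_for.dup
    @jobs = []
  end

  # Returns true when the job was enqueued, false when it was rejected.
  def enqueue(cursor)
    return false if @fail_once_for.delete(cursor[:instance])

    @jobs << cursor
    true
  end
end

# Tries to enqueue every child and raises if any enqueue failed,
# mirroring the all-or-error check in the snippet under review.
def fan_out(queue, instances)
  results = (0...instances).map { |i| queue.enqueue({ instance: i, inner_cursor: nil }) }
  failed = results.count(false)
  raise EnqueueError, "Failed to enqueue #{failed} out of #{instances} child jobs" if failed.positive?
end

queue = FlakyQueue.new(fail_once_for: [3])
begin
  fan_out(queue, 5) # first attempt: instance 3 is rejected
rescue EnqueueError
  fan_out(queue, 5) # retry of the parent: all five are enqueued again
end
puts queue.jobs.size
```

After the retry, the model holds two copies of every instance that succeeded the first time and one copy of the instance that failed, which is why the child work needs to be idempotent.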

@Mangara Mangara force-pushed the mangara-parallel-2 branch from bda69e3 to 4af4177 Compare April 30, 2026 15:12
@Mangara Mangara merged commit 9141f77 into main Apr 30, 2026
24 checks passed
@Mangara Mangara deleted the mangara-parallel-2 branch April 30, 2026 17:38
5 participants