
fix: prevent jobId collisions on workflow step retries #13786

Merged
kodiakhq[bot] merged 9 commits into develop from fix/retry-workflows
Oct 21, 2025

@srindom
Collaborator

@srindom srindom commented Oct 20, 2025

Summary

What — What changes are introduced in this PR?

This PR fixes a bug where async workflow steps with a retry interval got stuck after their first failed attempt: a Bull queue jobId collision prevented the scheduled retry job from ever executing.

Why — Why are these changes relevant or necessary?

Workflows using async steps with retry configurations (e.g., retryInterval: 1, maxRetries: 5) would fail once, schedule a retry, but the retry job would never execute, causing workflows to hang indefinitely.

How — How have these changes been implemented?

Root Cause: Bull was rejecting retry jobs because their jobIds were identical to those of the async execution jobs that had already completed. Both used the format: retry:workflow:transaction:step_id:attempts.
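To illustrate the collision, here is a minimal in-memory model, not Bull itself: it assumes only that a queue does not enqueue a second job whose jobId it already knows (in Bull, completed jobs stay in the queue unless they are removed). The names DedupingQueue and buildJobIdBeforeFix are hypothetical, and the workflow, transaction, and step values are placeholders.

```typescript
// Hypothetical in-memory model of jobId deduplication (NOT the Bull API):
// a job whose id is already known to the queue is dropped instead of enqueued.
class DedupingQueue {
  private readonly known = new Map<string, number>() // jobId -> delay in ms

  // Returns true if the job was enqueued, false if it was dropped as a duplicate.
  add(id: string, delayMs = 0): boolean {
    if (this.known.has(id)) {
      return false
    }
    this.known.set(id, delayMs)
    return true
  }
}

// Pre-fix scheme (assumed shape): the id ignores whether the job is a retry.
function buildJobIdBeforeFix(
  workflowId: string,
  transactionId: string,
  stepId: string,
  attempts: number
): string {
  return `retry:${workflowId}:${transactionId}:${stepId}:${attempts}`
}

const queue = new DedupingQueue()
const id = buildJobIdBeforeFix("my-workflow", "tx_1", "test-retry-step", 1)

const asyncExecutionAdded = queue.add(id) // async execution job (interval = 0)
const retryAdded = queue.add(id, 1_000) // retry job (1 s delay) reusing the same id

console.log(asyncExecutionAdded, retryAdded) // true false
```

The second add() is silently dropped, which mirrors the symptom in the PR: the retry is "scheduled" but never runs.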

Solution: Modified getJobId() in workflow-orchestrator-storage.ts to append a :retry suffix when interval > 0, creating unique jobIds:

  • Async execution (interval=0): retry:...:step_id:1
  • Retry scheduling (interval>0): retry:...:step_id:1:retry
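The suffix rule above can be sketched as a small pure function. This is an illustration under assumptions: the parameter list is hypothetical, and the real getJobId() in workflow-orchestrator-storage.ts may derive the id from different inputs.

```typescript
// Hypothetical sketch of the jobId scheme; parameter names are assumptions.
function getJobId(
  workflowId: string,
  transactionId: string,
  stepId: string,
  attempts: number,
  interval?: number
): string {
  const base = `retry:${workflowId}:${transactionId}:${stepId}:${attempts}`
  // Retry jobs (interval > 0) get a ":retry" suffix so they never share an id
  // with the async execution job for the same attempt (interval = 0 or unset).
  return (interval ?? 0) > 0 ? `${base}:retry` : base
}

const asyncId = getJobId("my-workflow", "tx_1", "test-retry-step", 1, 0)
const retryId = getJobId("my-workflow", "tx_1", "test-retry-step", 1, 1)

console.log(asyncId) // retry:my-workflow:tx_1:test-retry-step:1
console.log(retryId) // retry:my-workflow:tx_1:test-retry-step:1:retry
```

Keying the suffix on interval keeps the existing ids for async execution jobs unchanged, so only retry jobs get a new id.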

Updated getJobId(), scheduleRetry(), removeJob(), and clearRetry() to accept and propagate the interval parameter, so every method derives the same jobId for a given job.
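Why the interval has to be threaded through every method can be shown with a hypothetical Map-backed scheduler. The method names echo the list above, but the signatures, the stepKey argument, and the storage are assumptions; removeJob() is omitted because it would follow the same pattern as clearRetry().

```typescript
// Hypothetical scheduler keyed by jobId; NOT the real workflow-orchestrator-storage code.
class RetryScheduler {
  private readonly jobs = new Map<string, number>() // jobId -> delay in ms

  private getJobId(stepKey: string, attempts: number, interval?: number): string {
    const base = `retry:${stepKey}:${attempts}`
    return (interval ?? 0) > 0 ? `${base}:retry` : base
  }

  // Stores the retry job under its suffixed id and returns that id.
  scheduleRetry(stepKey: string, attempts: number, interval: number): string {
    const jobId = this.getJobId(stepKey, attempts, interval)
    this.jobs.set(jobId, interval * 1000)
    return jobId
  }

  // Must receive the same interval used when scheduling; otherwise the computed
  // id lacks the ":retry" suffix and the scheduled job is not found.
  clearRetry(stepKey: string, attempts: number, interval: number): boolean {
    return this.jobs.delete(this.getJobId(stepKey, attempts, interval))
  }

  has(jobId: string): boolean {
    return this.jobs.has(jobId)
  }
}

const scheduler = new RetryScheduler()
const stepKey = "my-workflow:tx_1:test-retry-step"
const jobId = scheduler.scheduleRetry(stepKey, 1, 1)

console.log(scheduler.clearRetry(stepKey, 1, 0)) // false: wrong id, nothing removed
console.log(scheduler.clearRetry(stepKey, 1, 1)) // true: matching id removed
console.log(scheduler.has(jobId)) // false
```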

Testing — How have these changes been tested, or how can the reviewer test the feature?

Added integration test retry-interval.spec.ts that verifies:

  1. Step with retryInterval: 1 and maxRetries: 3 executes 3 times
  2. Retry intervals are approximately 1 second between attempts
  3. Workflow completes successfully after retries
  4. Completion is awaited with the async workflow pattern (subscribe() and the onFinish event)
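The timing check in item 2 amounts to comparing consecutive attempt timestamps against the expected interval. The helper below is hypothetical and not taken from retry-interval.spec.ts; the timestamps and the 200 ms tolerance are illustrative values chosen to absorb queue scheduling jitter.

```typescript
// Hypothetical helper: true if every gap between consecutive attempt timestamps
// is within toleranceMs of expectedMs.
function intervalsWithinTolerance(
  timestampsMs: number[],
  expectedMs: number,
  toleranceMs: number
): boolean {
  for (let i = 1; i < timestampsMs.length; i++) {
    const gap = timestampsMs[i] - timestampsMs[i - 1]
    if (Math.abs(gap - expectedMs) > toleranceMs) {
      return false
    }
  }
  return true
}

// Illustrative timestamps (ms) for three attempts roughly one second apart.
const attemptTimes = [0, 1_020, 2_050]
console.log(attemptTimes.length === 3) // true: step executed 3 times
console.log(intervalsWithinTolerance(attemptTimes, 1_000, 200)) // true
```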

Examples

// Example workflow step that would previously get stuck
import { createStep, StepResponse } from "@medusajs/framework/workflows-sdk"

// Module-level counter so the step knows which attempt it is on
let attempts = 0

export const testRetryStep = createStep(
  {
    name: "test-retry-step",
    async: true,
    retryInterval: 1, // 1 second retry interval
    maxRetries: 3,
  },
  async () => {
    attempts++
    // Simulate failure on the first 2 attempts; the 3rd attempt succeeds
    if (attempts < 3) {
      throw new Error("Temporary failure - will retry")
    }
    return new StepResponse({ success: true })
  }
)

// Before fix: step failed once and scheduled a retry, but the retry job never fired (jobId collision)
// After fix: step is retried at 1-second intervals and succeeds on its 3rd attempt

Checklist

Please ensure the following before requesting a review:

  • I have added a changeset for this PR
    • Every non-breaking change should be marked as a patch
    • To add a changeset, run yarn changeset and follow the prompts
  • The changes are covered by relevant tests
  • I have verified the code works as intended locally
  • I have linked the related issue(s) if applicable

Additional Context

@srindom srindom requested a review from a team as a code owner October 20, 2025 19:36
@changeset-bot

changeset-bot bot commented Oct 20, 2025

🦋 Changeset detected

Latest commit: f8ce557

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 74 packages
Name Type
@medusajs/workflow-engine-redis Patch
@medusajs/orchestration Patch
@medusajs/medusa Patch
@medusajs/framework Patch
@medusajs/modules-sdk Patch
@medusajs/workflows-sdk Patch
@medusajs/test-utils Patch
@medusajs/medusa-oas-cli Patch
integration-tests-http Patch
@medusajs/analytics Patch
@medusajs/api-key Patch
@medusajs/auth Patch
@medusajs/cache-inmemory Patch
@medusajs/cache-redis Patch
@medusajs/caching Patch
@medusajs/cart Patch
@medusajs/currency Patch
@medusajs/customer Patch
@medusajs/event-bus-local Patch
@medusajs/event-bus-redis Patch
@medusajs/file Patch
@medusajs/fulfillment Patch
@medusajs/index Patch
@medusajs/inventory Patch
@medusajs/link-modules Patch
@medusajs/locking Patch
@medusajs/notification Patch
@medusajs/order Patch
@medusajs/payment Patch
@medusajs/pricing Patch
@medusajs/product Patch
@medusajs/promotion Patch
@medusajs/region Patch
@medusajs/sales-channel Patch
@medusajs/settings Patch
@medusajs/stock-location Patch
@medusajs/store Patch
@medusajs/tax Patch
@medusajs/user Patch
@medusajs/workflow-engine-inmemory Patch
@medusajs/analytics-local Patch
@medusajs/analytics-posthog Patch
@medusajs/auth-emailpass Patch
@medusajs/auth-github Patch
@medusajs/auth-google Patch
@medusajs/caching-redis Patch
@medusajs/file-local Patch
@medusajs/file-s3 Patch
@medusajs/fulfillment-manual Patch
@medusajs/locking-postgres Patch
@medusajs/locking-redis Patch
@medusajs/notification-local Patch
@medusajs/notification-sendgrid Patch
@medusajs/payment-stripe Patch
@medusajs/draft-order Patch
@medusajs/core-flows Patch
@medusajs/oas-github-ci Patch
@medusajs/js-sdk Patch
@medusajs/types Patch
@medusajs/utils Patch
@medusajs/cli Patch
@medusajs/deps Patch
@medusajs/telemetry Patch
@medusajs/admin-bundler Patch
@medusajs/admin-sdk Patch
@medusajs/admin-shared Patch
@medusajs/admin-vite-plugin Patch
@medusajs/dashboard Patch
@medusajs/icons Patch
@medusajs/toolbox Patch
@medusajs/ui-preset Patch
create-medusa-app Patch
medusa-dev-cli Patch
@medusajs/ui Patch


@vercel

vercel bot commented Oct 21, 2025

The latest updates on your projects. Learn more about Vercel for GitHub.

7 Skipped Deployments
Project Deployment Preview Comments Updated (UTC)
api-reference-v2 Ignored Ignored Preview Oct 21, 2025 10:03am
cloud-docs Ignored Ignored Preview Oct 21, 2025 10:03am
docs-ui Ignored Ignored Preview Oct 21, 2025 10:03am
docs-v2 Ignored Ignored Preview Oct 21, 2025 10:03am
medusa-docs Ignored Ignored Preview Oct 21, 2025 10:03am
resources-docs Ignored Ignored Preview Oct 21, 2025 10:03am
user-guide Ignored Ignored Preview Oct 21, 2025 10:03am


cursor[bot]

This comment was marked as outdated.

@srindom
Collaborator Author

srindom commented Oct 21, 2025

/snapshot-this

@srindom
Collaborator Author

srindom commented Oct 21, 2025

/snapshot-this

@github-actions
Contributor

🚀 A snapshot release has been made for this PR

Test the snapshots by updating your package.json with the newly published versions:

yarn add @medusajs/admin-bundler@2.11.1-snapshot-20251021090705
yarn add @medusajs/admin-sdk@2.11.1-snapshot-20251021090705
yarn add @medusajs/admin-shared@2.11.1-snapshot-20251021090705
yarn add @medusajs/admin-vite-plugin@2.11.1-snapshot-20251021090705
yarn add @medusajs/dashboard@2.11.1-snapshot-20251021090705
yarn add create-medusa-app@2.11.1-snapshot-20251021090705
yarn add @medusajs/cli@2.11.1-snapshot-20251021090705
yarn add medusa-dev-cli@2.11.1-snapshot-20251021090705
yarn add @medusajs/medusa-oas-cli@2.11.1-snapshot-20251021090705
yarn add @medusajs/core-flows@2.11.1-snapshot-20251021090705
yarn add @medusajs/framework@2.11.1-snapshot-20251021090705
yarn add @medusajs/js-sdk@2.11.1-snapshot-20251021090705
yarn add @medusajs/modules-sdk@2.11.1-snapshot-20251021090705
yarn add @medusajs/orchestration@2.11.1-snapshot-20251021090705
yarn add @medusajs/types@2.11.1-snapshot-20251021090705
yarn add @medusajs/utils@2.11.1-snapshot-20251021090705
yarn add @medusajs/workflows-sdk@2.11.1-snapshot-20251021090705
yarn add @medusajs/deps@2.11.1-snapshot-20251021090705
yarn add @medusajs/icons@2.11.1-snapshot-20251021090705
yarn add @medusajs/ui@4.0.25-snapshot-20251021090705
yarn add @medusajs/ui-preset@2.11.1-snapshot-20251021090705
yarn add @medusajs/medusa@2.11.1-snapshot-20251021090705
yarn add @medusajs/telemetry@2.11.1-snapshot-20251021090705
yarn add @medusajs/test-utils@2.11.1-snapshot-20251021090705
yarn add @medusajs/analytics@2.11.1-snapshot-20251021090705
yarn add @medusajs/api-key@2.11.1-snapshot-20251021090705
yarn add @medusajs/auth@2.11.1-snapshot-20251021090705
yarn add @medusajs/cache-inmemory@2.11.1-snapshot-20251021090705
yarn add @medusajs/cache-redis@2.11.1-snapshot-20251021090705
yarn add @medusajs/caching@2.11.1-snapshot-20251021090705
yarn add @medusajs/cart@2.11.1-snapshot-20251021090705
yarn add @medusajs/currency@2.11.1-snapshot-20251021090705
yarn add @medusajs/customer@2.11.1-snapshot-20251021090705
yarn add @medusajs/event-bus-local@2.11.1-snapshot-20251021090705
yarn add @medusajs/event-bus-redis@2.11.1-snapshot-20251021090705
yarn add @medusajs/file@2.11.1-snapshot-20251021090705
yarn add @medusajs/fulfillment@2.11.1-snapshot-20251021090705
yarn add @medusajs/index@2.11.1-snapshot-20251021090705
yarn add @medusajs/inventory@2.11.1-snapshot-20251021090705
yarn add @medusajs/link-modules@2.11.1-snapshot-20251021090705
yarn add @medusajs/locking@2.11.1-snapshot-20251021090705
yarn add @medusajs/notification@2.11.1-snapshot-20251021090705
yarn add @medusajs/order@2.11.1-snapshot-20251021090705
yarn add @medusajs/payment@2.11.1-snapshot-20251021090705
yarn add @medusajs/pricing@2.11.1-snapshot-20251021090705
yarn add @medusajs/product@2.11.1-snapshot-20251021090705
yarn add @medusajs/promotion@2.11.1-snapshot-20251021090705
yarn add @medusajs/analytics-local@2.11.1-snapshot-20251021090705
yarn add @medusajs/analytics-posthog@2.11.1-snapshot-20251021090705
yarn add @medusajs/auth-emailpass@2.11.1-snapshot-20251021090705
yarn add @medusajs/auth-github@2.11.1-snapshot-20251021090705
yarn add @medusajs/auth-google@2.11.1-snapshot-20251021090705
yarn add @medusajs/caching-redis@2.11.1-snapshot-20251021090705
yarn add @medusajs/file-local@2.11.1-snapshot-20251021090705
yarn add @medusajs/file-s3@2.11.1-snapshot-20251021090705
yarn add @medusajs/fulfillment-manual@2.11.1-snapshot-20251021090705
yarn add @medusajs/locking-postgres@2.11.1-snapshot-20251021090705
yarn add @medusajs/locking-redis@2.11.1-snapshot-20251021090705
yarn add @medusajs/notification-local@2.11.1-snapshot-20251021090705
yarn add @medusajs/notification-sendgrid@2.11.1-snapshot-20251021090705
yarn add @medusajs/payment-stripe@2.11.1-snapshot-20251021090705
yarn add @medusajs/region@2.11.1-snapshot-20251021090705
yarn add @medusajs/sales-channel@2.11.1-snapshot-20251021090705
yarn add @medusajs/settings@2.11.1-snapshot-20251021090705
yarn add @medusajs/stock-location@2.11.1-snapshot-20251021090705
yarn add @medusajs/store@2.11.1-snapshot-20251021090705
yarn add @medusajs/tax@2.11.1-snapshot-20251021090705
yarn add @medusajs/user@2.11.1-snapshot-20251021090705
yarn add @medusajs/workflow-engine-inmemory@2.11.1-snapshot-20251021090705
yarn add @medusajs/workflow-engine-redis@2.11.1-snapshot-20251021090705
yarn add @medusajs/draft-order@2.11.1-snapshot-20251021090705

Latest commit: 1d63ed8

@srindom srindom changed the title "fix: invariant for workflow step retry" to "fix: prevent jobId collisions on workflow step retries" Oct 21, 2025
@kodiakhq kodiakhq bot merged commit bad0858 into develop Oct 21, 2025
30 checks passed
@kodiakhq kodiakhq bot deleted the fix/retry-workflows branch October 21, 2025 18:27

3 participants