[24.2] Fix various job concurrency limit issues#19824

Merged
natefoo merged 6 commits into galaxyproject:release_24.2 from mvdbeek:fix_limit_bypass
Mar 24, 2025

Conversation

@mvdbeek
Member

@mvdbeek mvdbeek commented Mar 17, 2025

I've added an additional check in job_wrapper.enqueue that only updates jobs below the limit. This should be multi-process / multi-thread safe.
The queries are essentially the same queries that are done in JobHandler.__check_user_jobs, JobHandler.__check_destination_jobs, etc., but now it's all in a single update statement.

I suppose performance might be a concern; however, we still run through the (cached) checks before we decide to queue the job, so I think the cost is likely minimal. By integrating the limit check into the query, I think it should become very unlikely that jobs can bypass limits in a multi-handler scenario.
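The idea of guarding the state update with the limit check can be sketched roughly as below. This is a minimal illustration of the pattern, not Galaxy's actual code: the schema, the `enqueue_if_below_limit` name, and the use of SQLite are all assumptions made for the example. The key point is that the count-vs-limit comparison happens inside the UPDATE itself, so two handlers racing to queue jobs cannot both slip past the limit.

```python
# Sketch: push the concurrency-limit check into the UPDATE statement so the
# check and the state transition are a single atomic operation.
# Illustrative schema only -- not Galaxy's real job table.
import sqlite3


def enqueue_if_below_limit(conn, job_id, user_id, limit):
    """Move a job from 'new' to 'queued' only if the user's count of
    queued/running jobs is still below `limit`. Returns True if queued."""
    cur = conn.execute(
        """
        UPDATE job
           SET state = 'queued'
         WHERE id = ? AND state = 'new'
           AND (SELECT COUNT(*) FROM job
                 WHERE user_id = ? AND state IN ('queued', 'running')) < ?
        """,
        (job_id, user_id, limit),
    )
    # rowcount tells us whether the guarded update actually took effect.
    return cur.rowcount == 1


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE job (id INTEGER PRIMARY KEY, user_id INTEGER, state TEXT)")
conn.executemany(
    "INSERT INTO job (id, user_id, state) VALUES (?, ?, ?)",
    [(1, 7, "running"), (2, 7, "new"), (3, 7, "new")],
)

print(enqueue_if_below_limit(conn, 2, 7, limit=2))  # True: 1 active job < 2
print(enqueue_if_below_limit(conn, 3, 7, limit=2))  # False: limit reached
```

Because the subquery and the state change execute as one statement, a second handler attempting the same transition sees the already-updated count and its UPDATE matches zero rows.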

c088f9c fixes a bug where a resubmitted job would cause the cached user_job_count_per_destination / user_job_count values to start at 0.
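The class of bug described here can be illustrated with a small sketch. Everything below is hypothetical (the `JobCountCache` class and its methods are invented for the example, not Galaxy's code); it just shows how a lazily seeded per-user count cache undercounts if one code path initializes the cache entry to 0 instead of seeding it from the database.

```python
# Sketch of a lazily seeded job-count cache (illustrative only).
# If a code path (e.g. resubmission) skips the seeding query, the cached
# count starts at 0 and later limit checks undercount the user's jobs.
class JobCountCache:
    def __init__(self, count_jobs_in_db):
        self._counts = {}                  # user_id -> cached job count
        self._count_in_db = count_jobs_in_db

    def get(self, user_id, seed=True):
        if user_id not in self._counts:
            # Correct: seed from the database on first access.
            # Buggy path: start at 0 without consulting the database.
            self._counts[user_id] = self._count_in_db(user_id) if seed else 0
        return self._counts[user_id]

    def increment(self, user_id):
        self._counts[user_id] = self.get(user_id) + 1


count_from_db = lambda user_id: 5          # pretend the user has 5 jobs

correct = JobCountCache(count_from_db)
print(correct.get(1))                      # 5: seeded from the database

buggy = JobCountCache(count_from_db)
print(buggy.get(2, seed=False))            # 0: the undercount the fix removes
```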

How to test the changes?

(Select all options that apply)

  • I've included appropriate automated tests.
  • This is a refactoring of components with existing test coverage.
  • Instructions for manual testing are as follows:
    1. [add testing steps and prerequisites here if you didn't write automated tests covering all your changes]

License

  • I agree to license these and all my past contributions to the core galaxy codebase under the MIT license.

@mvdbeek mvdbeek requested a review from natefoo March 17, 2025 16:54
@mvdbeek mvdbeek changed the title [24.2] Guard state update with limit queries [24.2] Fix various job concurrency limit issues Mar 18, 2025
@mvdbeek mvdbeek marked this pull request as ready for review March 18, 2025 13:25
@github-actions github-actions Bot added this to the 25.0 milestone Mar 18, 2025
@mvdbeek
Member Author

mvdbeek commented Mar 18, 2025

whoa, all tests ran and are green! that's been a while

@mvdbeek mvdbeek requested a review from a team March 18, 2025 14:54
@mvdbeek
Member Author

mvdbeek commented Mar 24, 2025

This is on main now, and job loop times seem unaffected, which is good. Let's merge this?

@natefoo natefoo merged commit ecc4b47 into galaxyproject:release_24.2 Mar 24, 2025
@nsoranzo nsoranzo deleted the fix_limit_bypass branch March 24, 2025 23:23
@galaxyproject galaxyproject deleted a comment from github-actions Bot Mar 25, 2025
mvdbeek added a commit to mvdbeek/galaxy that referenced this pull request Apr 10, 2025
Broken in galaxyproject#19824.
We used to add to the job state history in `Job.set_state`.