Skip to content

fix(vm): fix partial future resolution panics in mixed gathers#251

Open
runyaga wants to merge 4 commits intopydantic:mainfrom
runyaga:fix/vm-partial-future-resolution
Open

fix(vm): fix partial future resolution panics in mixed gathers#251
runyaga wants to merge 4 commits intopydantic:mainfrom
runyaga:fix/vm-partial-future-resolution

Conversation

@runyaga
Copy link
Copy Markdown

@runyaga runyaga commented Mar 8, 2026

Summary

Fixes two panics in async_exec.rs that occur during incremental resolution of asyncio.gather() when a gather mixes coroutine tasks with direct external calls.

Closes #240

Bug 1: prepare_current_task_after_resolve — "no active frame" panic

After a partial resume where load_ready_task_if_needed saves the current task's context (draining frames), the next call to prepare_current_task_after_resolve still considers the task ready and attempts to push a value onto an empty frame stack. vm.run() then panics with "no active frame".

Fix: Early return false when self.frames.is_empty(), deferring to load_ready_task_if_needed to restore the task context.

Bug 2: resolve_future gather path — premature gather completion

std::mem::take(&mut gather.task_ids) drains the task ID vec before checking whether all tasks have completed. If the gather isn't fully resolved yet, handle_task_completion later reads the empty vec, considers the gather vacuously complete (zero tasks = all done), and panics on unfilled result slots.

Fix: Clone task_ids instead of taking ownership, preserving the gather's internal state for subsequent completion checks.

Test plan

Two new tests reproduce the exact panic conditions:

  • gather_mixed_coroutine_and_direct_external_partial_resolve — mixed coroutine + direct external call with partial resolution (Bug 1)
  • gather_three_tasks_with_direct_external_memtake_corruption — three-way gather exposing the mem::take corruption (Bug 2)
test result: ok. 21 passed; 0 failed (asyncio)
test result: ok. 818 passed; 0 failed (datatest_runner, ref-count-panic)

All existing tests pass across ref-count-panic, ref-count-return, and no-features configurations.

Two bugs in async_exec.rs caused panics when partially resolving
gathers that mix coroutine tasks with direct external calls:

1. prepare_current_task_after_resolve() didn't check frames.is_empty(),
   so it would claim the current task was ready even when its frames had
   been saved to the scheduler during a previous partial resume. The
   subsequent vm.run() panicked on empty frames ("no active frame").

2. resolve_future() used std::mem::take(&mut gather.task_ids) before
   checking completion, emptying the gather's task_ids. If the gather
   was NOT complete, handle_task_completion would later read the empty
   task_ids and consider the gather vacuously complete, panicking on
   unfilled results.

Fix 1: Early return false when frames are empty.
Fix 2: Clone task_ids instead of mem::take.
@codecov
Copy link
Copy Markdown

codecov bot commented Mar 8, 2026

Codecov Report

❌ Patch coverage is 79.06977% with 9 lines in your changes missing coverage. Please review.

Files with missing lines Patch % Lines
crates/monty/src/bytecode/vm/async_exec.rs 78.04% 7 Missing and 2 partials ⚠️

📢 Thoughts on this report? Let us know!

@codspeed-hq
Copy link
Copy Markdown

codspeed-hq bot commented Mar 8, 2026

Merging this PR will not alter performance

✅ 15 untouched benchmarks
⏩ 15 skipped benchmarks1


Comparing runyaga:fix/vm-partial-future-resolution (8b395a7) with main (a8645d8)

Open in CodSpeed

Footnotes

  1. 15 benchmarks were skipped, so the baseline results were used instead. If they were deleted from the codebase, click here and archive them to remove them from the performance reports.

runyaga added a commit to runyaga/dart_monty that referenced this pull request Mar 9, 2026
Enumerate the 3 patches carried in runyaga/monty fork:
- pydantic/monty#251: asyncio.gather panic fix (submitted upstream)
- runyaga/monty#3: CancellableTracker (merged in fork)
- runyaga/monty#4: cpu:wasm32 npm restriction (open issue)
Copy link
Copy Markdown
Collaborator

@davidhewitt davidhewitt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the PR, I think maybe opens some bigger questions.

Comment on lines +961 to +966
pub fn prepare_current_task_after_resolve(&mut self) -> bool {
// If frames were drained during a previous partial resume, fall back to
// load_ready_task_if_needed to restore the task context first.
if self.frames.is_empty() {
return false;
}
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This seems like a broader structural error, maybe prepare_current_task_after_resolve and load_ready_task_if_needed should be merged?

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 potential issue.

View 2 additional findings in Devin Review.

Open in Devin Review

Comment on lines +696 to +698
// Resolve only the gather's direct external call first (call_ids[0] = async_call(999)).
// This triggers mem::take on gather.task_ids, corrupting it to [].
let results = vec![(call_ids[0], ExtFunctionResult::Return(MontyObject::Int(999)))];
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 Test comment describes a previously-fixed bug, not current behavior

The comment at line 697 states "This triggers mem::take on gather.task_ids, corrupting it to []", but in the current codebase, resolve_future (crates/monty/src/bytecode/vm/async_exec.rs:676-743) never calls mem::take on task_ids — it only reads task_ids via .iter().all() at line 692. The mem::take on task_ids only exists in failure paths (handle_task_failure at line 460, fail_future at line 772). The comment appears to describe a previously-existing bug that was already fixed before this PR; the test serves as a regression test. The present-tense wording ("This triggers") could confuse future readers into thinking the corruption still occurs in the resolve path.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Copy link
Copy Markdown

@devin-ai-integration devin-ai-integration bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Devin Review found 1 new potential issue.

View 3 additional findings in Devin Review.

Open in Devin Review

Comment on lines +908 to +911
assert!(
!pending_call_ids.is_empty(),
"resume_with_resolved_futures called but no pending calls and no ready tasks"
);
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🚩 New assert is stricter than old fallthrough-to-run behavior

The old code had a path where !main_task_ready && !loaded_task && pending_call_ids.is_empty() would silently fall through to vm.run(). The new code at async_exec.rs:908-911 replaces this with assert!(!pending_call_ids.is_empty()), which panics instead. This is actually an improvement — reaching that state means the scheduler is inconsistent (a blocked task with no pending calls and no ready tasks). The old fallthrough to vm.run() in this state would likely produce undefined behavior. However, if any edge case can legitimately reach this state (e.g., all futures resolved but a task is still marked BlockedOnGather due to incomplete gather logic), this would become a runtime panic in production rather than a graceful error.

Open in Devin Review

Was this helpful? React with 👍 or 👎 to provide feedback.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

FutureSnapshot triggers Rust Panic, PR Available

2 participants