Fix multi-repo RPC deadlock via stderr redirect and dedicated GetObject pool #7036

Merged
jkschneider merged 1 commit into main from fix/rpc-stderr-redirect-and-getobject-pool
Mar 18, 2026
@jkschneider
Member

Summary

  • Redirect subprocess stderr at the OS level via ProcessBuilder.redirectError() to prevent pipe-buffer deadlock on macOS (64KB kernel buffer). When a log path is configured, stderr goes to that file; otherwise it goes to /dev/null.
  • Use a dedicated thread pool for GetObject tree traversal instead of ForkJoinPool.commonPool(), which gets saturated by repo-level fork-join tasks and starves GetObject producers.
  • Wire stderrRedirect through C#, JavaScript, and Python RPC builders.

These two deadlocks were independent — the stderr issue blocked the subprocess from writing RPC responses, while the pool starvation prevented Java from servicing GetObject callbacks during BatchVisit.
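
As a minimal sketch of the stderr redirect described above (not the code from this PR): the class and method names below are hypothetical, and `ProcessBuilder.Redirect.DISCARD` (Java 9+) is assumed here as the portable stand-in for `/dev/null`.

```java
import java.nio.file.Path;

final class StderrRedirectSketch {
    /**
     * Hypothetical helper: points the child's stderr at a log file when one is
     * configured, and discards it otherwise. Either way no stderr pipe is
     * created, so the child can never block on a full pipe buffer.
     */
    static ProcessBuilder redirectStderr(ProcessBuilder pb, Path logFile) {
        if (logFile != null) {
            // Append so that repeated subprocess launches share one log file.
            pb.redirectError(ProcessBuilder.Redirect.appendTo(logFile.toFile()));
        } else {
            // Assumption: Java 9+. Equivalent to redirecting to /dev/null.
            pb.redirectError(ProcessBuilder.Redirect.DISCARD);
        }
        return pb;
    }
}
```

The redirect has to be set before `start()` is called; after that the child's file descriptors are already wired up.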

Test plan

  • Ran `mod run` with UpgradeToDotNet10 on 11 C# repos in parallel: completed in 6m 25s with no deadlock
  • Previously deadlocked after 1-2 repos with the serializing semaphore, and immediately with parallel execution

…ct pool

Two independent deadlocks prevented `mod run` from completing across
multiple C# repositories:

1. **Stderr pipe buffer deadlock**: On macOS the kernel pipe buffer for
   stderr is 64 KB. When the dotnet subprocess writes enough to stderr to
   fill this buffer, it blocks on the next write() and can no longer
   finish any in-progress RPC response on stdout. Fix: redirect stderr
   at the OS level via ProcessBuilder.redirectError(), to the log file
   when configured, or to /dev/null otherwise. This eliminates the pipe
   entirely so the subprocess never blocks on stderr.

2. **GetObject thread pool starvation**: GetObject.Handler submitted
   background tree traversal tasks to ForkJoinPool.commonPool(), which
   is the same pool used by the CLI for repo-level fork-join work. When
   repo tasks saturated the pool (blocked on semaphores or waiting for
   RPC responses), GetObject producers couldn't start, deadlocking the
   batch protocol. Fix: use a dedicated cached thread pool for tree
   traversal so GetObject producers are never starved by unrelated work.
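
The second fix can be sketched roughly as follows. This is an illustration under assumptions, not the actual GetObject.Handler code: the class name, method name, and thread name are hypothetical, and only the use of a dedicated cached thread pool in place of ForkJoinPool.commonPool() comes from the description above.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Supplier;

final class TraversalPoolSketch {
    // Dedicated cached pool: creates threads on demand, so a traversal task
    // never waits behind unrelated work queued on the common pool.
    // Daemon threads are an assumption, chosen so the pool cannot keep the JVM alive.
    static final ExecutorService TRAVERSAL_POOL = Executors.newCachedThreadPool(runnable -> {
        Thread thread = new Thread(runnable, "getobject-traversal");
        thread.setDaemon(true);
        return thread;
    });

    /** Hypothetical entry point: runs a traversal on the dedicated pool. */
    static <T> CompletableFuture<T> traverseAsync(Supplier<T> traversal) {
        // Passing the executor explicitly is what keeps this off commonPool();
        // supplyAsync without an executor argument would use the common pool.
        return CompletableFuture.supplyAsync(traversal, TRAVERSAL_POOL);
    }
}
```

A cached pool is unbounded, which suits this case because the tasks must always be able to start; the trade-off is that thread count is limited only by how many traversals are in flight.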

@github-project-automation (bot) moved this to In Progress in OpenRewrite Mar 18, 2026
@jkschneider merged commit 18ceb3e into main Mar 18, 2026
1 check failed
@jkschneider deleted the fix/rpc-stderr-redirect-and-getobject-pool branch March 18, 2026 22:43
@github-project-automation (bot) moved this from In Progress to Done in OpenRewrite Mar 18, 2026