fix(supervisor): prevent file descriptor leaks in SSE streaming and IPC #267

Merged
jdx merged 19 commits into jdx:main from benjaminwestern:main on Mar 8, 2026
Conversation

@benjaminwestern

Contributor

Fix 'too many open files' error caused by FD leaks:

  1. SSE log streaming (logs.rs): Reuse file handle between iterations instead of reopening every 500ms. Prevents FD exhaustion from rapid file open/close cycles in tight loops.

  2. IPC connection tasks (server.rs): Add proper cleanup on send failures to prevent task accumulation from dead connections.

These changes should significantly reduce FD usage during long-running supervisor sessions with active web UI log streaming.
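The handle-reuse idea in item 1 can be sketched roughly as follows. This is not the code from logs.rs: `LogTail`, its fields, and `poll` are hypothetical names, and the sketch starts reading at offset 0 for simplicity. It only illustrates keeping one `std::fs::File` open across polls and seeking to the last-read offset instead of reopening the file each time.

```rust
use std::fs::File;
use std::io::{self, Read, Seek, SeekFrom};
use std::path::{Path, PathBuf};

/// Hypothetical tailer: holds one open handle and remembers how far it has read.
pub struct LogTail {
    path: PathBuf,
    file: Option<File>, // kept open between polls instead of being reopened
    offset: u64,        // number of bytes already returned to the caller
}

impl LogTail {
    pub fn new(path: impl AsRef<Path>) -> Self {
        Self { path: path.as_ref().to_path_buf(), file: None, offset: 0 }
    }

    /// Returns the bytes appended since the previous call.
    pub fn poll(&mut self) -> io::Result<Vec<u8>> {
        // Open lazily once; later polls reuse the same descriptor.
        if self.file.is_none() {
            self.file = Some(File::open(&self.path)?);
        }
        let file = self.file.as_mut().expect("handle was just set");

        let len = file.metadata()?.len();
        if len < self.offset {
            // The file shrank: treat it as truncated and start over.
            self.offset = 0;
        }

        file.seek(SeekFrom::Start(self.offset))?;
        let mut buf = Vec::new();
        // Borrow the handle for `take` so it stays stored in `self.file`.
        let n = (&mut *file).take(len - self.offset).read_to_end(&mut buf)?;
        self.offset += n as u64;
        Ok(buf)
    }
}
```

A caller would construct one `LogTail` per stream and call `poll()` on each tick; the descriptor count then stays at one per stream regardless of how many ticks elapse.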

Co-authored-by: Kimi K2.5 <kimi-k2.5@opencode.ai>
@gemini-code-assist


Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request significantly enhances the stability and resource management of the supervisor by addressing two key areas prone to file descriptor leaks. By optimizing SSE log streaming to reuse file handles and implementing robust cleanup for IPC connection tasks, the changes prevent 'too many open files' errors and ensure more efficient resource utilization during extended operation.

Highlights

  • SSE Log Streaming Optimization: The SSE log streaming mechanism has been refactored to reuse a single file handle for reading log files. This prevents excessive file open/close operations, addressing potential file descriptor exhaustion during long-running sessions with active log streaming.
  • IPC Connection Cleanup: IPC connection tasks now include proper cleanup logic. When message sending fails, the tasks terminate cleanly, preventing the accumulation of tasks from dead connections and reducing the risk of file descriptor leaks.

Changelog

  • src/ipc/server.rs
    • Ensured IPC connection handler tasks terminate cleanly on send failures by adding break statements.
    • Added trace logging to indicate clean termination of IPC read and send tasks.
  • src/web/routes/logs.rs
    • Imported Seek and SeekFrom traits for file positioning.
    • Refactored the stream_sse function to maintain and reuse a std::fs::File handle across iterations.
    • Implemented logic to re-open the log file only if no handle exists or if the file metadata cannot be retrieved (suggesting recreation or deletion).
    • Updated file truncation handling to reset the file handle and seek to the beginning.

Activity

  • No human activity has been recorded on this pull request yet.


@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request effectively addresses potential file descriptor leaks by ensuring spawned tasks in src/ipc/server.rs terminate on IPC channel breaks and by refactoring SSE log streaming in src/web/routes/logs.rs to reuse a single file handle. These changes improve resource management and stability. However, a medium-severity security issue was identified in the log streaming logic where unbounded memory allocation could lead to a denial of service if log files grow excessively fast, which requires attention.

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
Address security and code quality concerns from code review:

1. Security fix: Limit log reads to 1MB per iteration to prevent memory
   exhaustion from rapidly-growing log files. Uses bounded read_exact
   instead of unbounded read_to_end.

2. Code style: Replace unwrap() with Option::insert() for cleaner,
   more idiomatic Rust when managing the file handle.

Both changes improve robustness of the SSE log streaming endpoint.
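A minimal sketch of the bounded read described in item 1, assuming a seekable source and a caller that already knows the current file size. `read_bounded` and `MAX_READ_SIZE` are illustrative names rather than identifiers confirmed by this PR. The point is that the buffer size is fixed before reading, so a rapidly growing file cannot inflate it.

```rust
use std::io::{self, Read, Seek, SeekFrom};

/// Assumed cap mirroring the "1MB per iteration" limit in the commit message.
pub const MAX_READ_SIZE: u64 = 1024 * 1024;

/// Reads at most `max` of the bytes that lie between `offset` and `current_size`.
pub fn read_bounded<R: Read + Seek>(
    src: &mut R,
    offset: u64,
    current_size: u64,
    max: u64,
) -> io::Result<Vec<u8>> {
    // Clamp the request: never more than what is new, never more than the cap.
    let to_read = current_size.saturating_sub(offset).min(max);
    // The allocation is decided here, before any data is read.
    let mut buf = vec![0u8; to_read as usize];
    src.seek(SeekFrom::Start(offset))?;
    // `read_exact` fills the buffer or fails; it never grows it.
    src.read_exact(&mut buf)?;
    Ok(buf)
}
```

Calling `read_bounded(&mut file, last_size, current_size, MAX_READ_SIZE)` once per poll leaves any remaining backlog for later iterations instead of loading it all at once.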
@benjaminwestern

Contributor Author

@greptileai

@greptile-apps

greptile-apps Bot commented Mar 4, 2026

Contributor

Greptile Summary

This PR addresses EMFILE ("too many open files") errors in the supervisor by:

  1. SSE log streaming (src/web/routes/logs.rs): Reuses a persistent std::fs::File handle across polling iterations (every 500ms) instead of reopening the file repeatedly. This directly addresses FD exhaustion from rapid open/close cycles. The streaming loop now offloads all blocking I/O into tokio::task::spawn_blocking, adds Unix inode-based rotation detection (with #[cfg(unix)]), and handles truncation, seek failures, and read failures with proper handle resets.

  2. IPC connection tasks (src/ipc/server.rs): Adds break statements on channel-send failures to tear down dead-connection tasks promptly, preventing task accumulation.

  3. Test coverage (tests/test_e2e_logs.rs): Adds new SSE E2E tests with RAII cleanup (ChildGuard), port-0 binding to avoid TOCTOU races, and a Unix-only rotation test.

  4. Minor improvements: src/web/server.rs now reports the actual bound address in startup logs (useful for port-0 assignments).

The core FD-leak fix is well-structured and directly addresses the reported EMFILE errors. Changes are sound and backward-compatible.
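For readers unfamiliar with inode-based rotation detection, a rough sketch of the check might look like the following. `was_rotated` is a hypothetical helper, not a function from this PR. It compares the inode behind the held handle with the inode currently at the path. A stricter check would probably compare the device number too.

```rust
use std::fs;
use std::io;
use std::path::Path;

/// True when `path` now refers to a different file than the open handle.
/// Unix only, matching the `#[cfg(unix)]` gating mentioned in the summary.
#[cfg(unix)]
pub fn was_rotated(open: &fs::File, path: &Path) -> io::Result<bool> {
    use std::os::unix::fs::MetadataExt;

    // Inode of the file we are still holding open.
    let held = open.metadata()?.ino();
    match fs::metadata(path) {
        // A different inode at the same path means the log was replaced.
        Ok(meta) => Ok(meta.ino() != held),
        // A missing path means the old file was moved away or deleted.
        Err(e) if e.kind() == io::ErrorKind::NotFound => Ok(true),
        Err(e) => Err(e),
    }
}
```

When this returns `true`, the caller would drop the old handle, reopen the path, and reset its read offset.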

Confidence Score: 4/5

  • PR is safe to merge — core FD-leak fix is sound, IPC cleanup is straightforward, and test coverage is improved with proper RAII patterns.
  • No verified issues were found during review. The changes directly address the reported EMFILE errors through file handle reuse and spawn_blocking offloading, which are appropriate techniques. IPC task accumulation is fixed with break statements on errors. MSRV is correctly declared at 1.87 (sufficient for all language features used). New tests include proper cleanup guards and platform-specific guards. The main substantive changes (persistent file handles, async I/O offloading, rotation detection) follow sound patterns. Score reflects high confidence in correctness while accounting for the complexity of the SSE streaming logic.
  • No files require special attention.

Last reviewed commit: d19e09e

Comment thread src/web/routes/logs.rs Outdated
@greptile-apps

greptile-apps Bot commented Mar 4, 2026

Contributor
Additional Comments (1)

src/ipc/server.rs
Self::send silently swallows write_all errors and always returns Ok(()).

The helper function logs write failures at trace! level but does not propagate them:

if let Err(err) = send.write_all(&msg).await {
    trace!("Failed to send message: {err:?}");
}
Ok(())   // ← error never propagated

As a result, the break on line 235 in send_messages_chan is unreachable for the most common failure case (client disconnection during write):

if let Err(err) = Self::send(&mut send, msg).await {
    warn!("Failed to send message: {err:?}");
    break;  // ← can only trigger on serialize() errors, not write failures
}

The task will continue looping and attempting writes to a dead connection. To make the cleanup effective, propagate the write error:

async fn send(send: &mut SendHalf, msg: IpcResponse) -> Result<()> {
    let mut msg = serialize(&msg)?;
    if msg.contains(&0) {
        bail!("IPC message contains null byte");
    }
    msg.push(0);
    send.write_all(&msg).await.into_diagnostic()?;
    Ok(())
}
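
The effect of swallowing versus propagating the write error can be reproduced in a small synchronous analogue. Everything here is illustrative: `DeadConn`, `pump`, and the two `send_*` helpers are made-up names, and std's blocking `Write` stands in for the async send half. The loop only stops early when the helper actually returns the error.

```rust
use std::io::{self, Write};

/// Swallows write errors, like the helper the review describes.
fn send_swallowing<W: Write>(out: &mut W, msg: &[u8]) -> io::Result<()> {
    if let Err(err) = out.write_all(msg) {
        eprintln!("failed to send message: {err:?}");
    }
    Ok(()) // the caller never sees the failure
}

/// Propagates write errors so the caller can stop.
fn send_propagating<W: Write>(out: &mut W, msg: &[u8]) -> io::Result<()> {
    out.write_all(msg)?;
    Ok(())
}

/// Writer standing in for a dead connection: every write fails.
pub struct DeadConn;

impl Write for DeadConn {
    fn write(&mut self, _buf: &[u8]) -> io::Result<usize> {
        Err(io::Error::new(io::ErrorKind::BrokenPipe, "peer gone"))
    }
    fn flush(&mut self) -> io::Result<()> {
        Ok(())
    }
}

/// Sends `msgs` in order, breaking at the first error `send` reports.
/// Returns how many sends were attempted.
pub fn pump<W: Write>(
    out: &mut W,
    msgs: &[&[u8]],
    send: fn(&mut W, &[u8]) -> io::Result<()>,
) -> usize {
    let mut attempts = 0;
    for msg in msgs {
        attempts += 1;
        if send(out, msg).is_err() {
            break; // only reachable if `send` surfaces the failure
        }
    }
    attempts
}
```

With three queued messages and a `DeadConn`, `pump` attempts all three sends through `send_swallowing` but stops after the first through `send_propagating`.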

@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
Comment thread src/ipc/server.rs
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/ipc/server.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

@greptileai

Comment thread src/web/routes/logs.rs Outdated
Comment thread src/web/routes/logs.rs Outdated
@benjaminwestern

Contributor Author

I don't know enough about file system SSE architecture to confidently continue with this whack-a-mole game but I would like some feedback @jdx if you are comfortable sharing?

@benjaminwestern benjaminwestern marked this pull request as ready for review March 7, 2026 22:10

@jdx jdx left a comment

Owner

A few suggestions on the logs.rs changes:

1. last_size should initialize from file metadata

Initializing last_size = 0 means the first iteration will stream up to 1MB of existing log content on every new SSE connection. The original code initialized from the current file size so it only streamed new content. This is a behavior regression for large log files.

let mut last_size = std::fs::metadata(&log_path).map(|m| m.len()).unwrap_or(0);

2. Manual read loop → take().read_to_end()

The manual read loop reimplements what Read::take already does:

let bytes_to_read = (current_size - ls).min(MAX_READ_SIZE);
let mut buffer = Vec::new();
match file.take(bytes_to_read).read_to_end(&mut buffer) {
    Ok(0) | Err(_) => return (None, ls, Some(FileOpResult::ReadFailed), current_ino),
    Ok(n) => {
        ls += n as u64;
        return (Some(file), ls, Some(FileOpResult::Data(buffer)), current_ino);
    }
}

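Note: take() consumes its receiver by value, so calling it directly on the File would move the handle and leave nothing to return. Call it on a mutable borrow instead, i.e. (&mut file).take(bytes_to_read).read_to_end(&mut buffer). This works because Read is implemented for &mut R where R: Read (and also for &File).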
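
A small self-contained illustration of that borrow trick, assuming nothing about the PR's actual helper names (`read_chunk` is hypothetical). The reader stays usable after each call because `take` consumes only a mutable borrow of it.

```rust
use std::io::{self, Read};

/// Reads up to `limit` bytes without giving up ownership of `reader`.
pub fn read_chunk<R: Read>(reader: &mut R, limit: u64) -> io::Result<Vec<u8>> {
    let mut buffer = Vec::new();
    // `by_ref()` yields `&mut R`, which itself implements `Read`, so `take`
    // consumes the borrow rather than the underlying reader.
    reader.by_ref().take(limit).read_to_end(&mut buffer)?;
    Ok(buffer)
}
```

Repeated calls on the same reader return successive chunks of at most `limit` bytes, then an empty vector at end of input.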

3. Use a struct instead of a 4-tuple for spawn_blocking result

The return type (Option<File>, u64, Option<FileOpResult>, Option<u64>) is hard to follow. A struct would make the code much more readable:

struct FileOpOutput {
    file: Option<std::fs::File>,
    size: u64,
    result: Option<FileOpResult>,
    inode: Option<u64>,
}

This comment was generated by Claude Code.

Initialize SSE streams from the current EOF so the web log viewer only receives new data while still recovering cleanly from truncation and rotation. Add e2e coverage for SSE connect and rotation behavior; leave the pre-existing duplicated initial log rendering paths unchanged in this change.
Comment thread tests/test_e2e_logs.rs Outdated
Comment thread tests/test_e2e_logs.rs Outdated
Comment thread tests/test_e2e_logs.rs Outdated
Comment thread src/web/routes/logs.rs
Stabilize the SSE end-to-end tests by tolerating chunk delays, synchronizing rotation setup with the live stream, and discovering the actual web port chosen by the supervisor. Also skip the fresh-open inode check when reusing an existing file handle so steady-state polling avoids unnecessary metadata work.
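
As background for the dynamic-port handling, the usual way to get an OS-assigned port and learn which one was chosen is sketched below. `bind_ephemeral` is an illustrative name and this uses the blocking std listener. The supervisor's web server may well use a different (async) listener type.

```rust
use std::io;
use std::net::{SocketAddr, TcpListener};

/// Binds to an OS-assigned port and reports the address actually in use.
pub fn bind_ephemeral() -> io::Result<(TcpListener, SocketAddr)> {
    // Port 0 asks the OS to pick a free port at bind time, so there is no
    // gap between "find a free port" and "bind to it".
    let listener = TcpListener::bind("127.0.0.1:0")?;
    // The requested address says port 0; `local_addr` reveals the real one.
    let addr = listener.local_addr()?;
    Ok((listener, addr))
}
```

Logging `addr` at startup is what lets a test harness discover the port by reading the process output.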
Comment thread tests/test_e2e_logs.rs Outdated
Comment thread src/web/routes/logs.rs
Comment thread tests/test_e2e_logs.rs Outdated
Keep SSE web tests from leaking supervisors on panic and discover the actual port from supervisor startup logs when using dynamic binding. Reduce path-based inode polling during steady-state streams so the held-handle path does less redundant metadata work.
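
The "do not leak supervisors on panic" behaviour is typically achieved with a drop guard. The sketch below is an assumption about the general shape rather than the test file's actual code: it borrows the `ChildGuard` name from the review summary, and `spawn_guarded` is hypothetical.

```rust
use std::io;
use std::process::{Child, Command, Stdio};

/// Guard that owns a child process and stops it when dropped.
pub struct ChildGuard(pub Child);

impl Drop for ChildGuard {
    fn drop(&mut self) {
        // Runs on normal scope exit and during panic unwinding.
        let _ = self.0.kill(); // ignore the error if it already exited
        let _ = self.0.wait(); // reap the process so no zombie remains
    }
}

/// Spawns `program` with `args`, discarding its output, wrapped in the guard.
pub fn spawn_guarded(program: &str, args: &[&str]) -> io::Result<ChildGuard> {
    let child = Command::new(program)
        .args(args)
        .stdout(Stdio::null())
        .stderr(Stdio::null())
        .spawn()?;
    Ok(ChildGuard(child))
}
```

A test that binds the guard to a local variable gets cleanup for free: if an assertion fails, unwinding drops the guard and the child is killed.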
Comment thread src/web/routes/logs.rs Outdated
Comment thread tests/test_e2e_logs.rs
Comment thread src/web/routes/logs.rs
Comment thread src/ipc/server.rs
Join the SSE test stderr reader thread during supervisor cleanup so helper threads do not outlive the child process. Also skip the path-based inode stat immediately after a fresh open since the file metadata is already available from that open path.
Comment thread tests/test_e2e_logs.rs
Comment thread src/web/routes/logs.rs
Comment thread tests/test_e2e_logs.rs
Gate the rotation-specific SSE regression test to Unix where inode-based rotation detection is implemented, clarify the web-start readiness helper's return path, and declare Rust 1.87 so the SSE polling code's standard-library APIs are part of the repo's documented toolchain.
@benjaminwestern

Contributor Author

@jdx I think we are all done now? Please let me know if you disagree with anything I have addressed (after arguing with the ol AI).

@jdx jdx merged commit c016a7d into jdx:main Mar 8, 2026
5 checks passed
@jdx jdx mentioned this pull request Mar 7, 2026
jdx added a commit that referenced this pull request Mar 8, 2026
## 🤖 New release

* `pitchfork-cli`: 2.0.0 -> 2.1.0

<details><summary><i><b>Changelog</b></i></summary><p>

<blockquote>

## [2.1.0](v2.0.0...v2.1.0) -
2026-03-08

### Added

- add `settings.toml`
([#275](#275))

### Fixed

- correct json schema for DaemonId
([#277](#277))
- *(supervisor)* prevent file descriptor leaks in SSE streaming and IPC
([#267](#267))
- fixed scroll disabled when log <20 lines
([#268](#268))

### Other

- Support .config/pitchfork.toml
([#265](#265))
- *(README)* update broken link
([#270](#270))
</blockquote>


</p></details>

---
This PR was generated with
[release-plz](https://github.com/release-plz/release-plz/).

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Release bookkeeping only (version/changelog/lockfile) with no
functional code changes in this PR.
> 
> **Overview**
> Bumps `pitchfork-cli` from `2.0.0` to `2.1.0` in `Cargo.toml` and
`Cargo.lock`.
> 
> Updates `CHANGELOG.md` with the new `2.1.0` release entry (noting the
new `settings.toml`, several fixes, and minor documentation/config
support updates).
<!-- /CURSOR_SUMMARY -->