[rqd] Add windows support#2165
[rqd] Add windows support#2165DiegoTavares merged 27 commits intoAcademySoftwareFoundation:masterfrom
Conversation
First take at implementing windows support for rqd. The goal was to have feature parity, but I wasn't able to find a Windows tool to emulate taskset and allocate processes to threads. So this stands as the only feature not available for windows.
|
Caveat: I currently don't have a windows environment to test this on. I'm relying on the Rust unit tests on github as my only quality assurance. I need a volunteer to test this on a real environment and report bugs. |
…for interface binding Provide Windows-specific get_ip_for_interface implementation that returns an error if binding by interface name is requested, since this feature is unsupported on Windows. Add corresponding tests.
…ion tests - Adjust test code to handle Windows binary naming and commands - Load machine resource files conditionally for Unix-based tests - Add Windows PowerShell memory_fork.cmd test script - Mark frame execution and related tests as Unix-only where appropriate - Add comment on run_as_user config option availability on Unix systems only
Use a function to wait for openrqd to open its gRPC port before proceeding, ensuring tests do not start too early and improving reliability and speed of the integration tests.
…indows - Refactor integration tests to normalize paths for Windows compatibility - Correct Rust RQD docs to reflect Windows support status as available
Use ephemeral free ports for gRPC servers instead of fixed ports to avoid conflicts. Serialize integration tests with a global mutex to prevent parallel execution issues. Replace fixed hostnames with "127.0.0.1" for consistency.
|
@lithorus or @KernAttila do you guys have a Windows machine on hand to give the rust/rqd a spin and report back if you find any errors? |
I'm affraid I don't have one. |
|
@DiegoTavares I'm testing on Windows right now. |
|
@KernAttila Any updates on the Windows tests? |
Also filter out signal dependend functions not really available on Windows. @author: Kern Attila GERMAIN <5556461+KernAttila@users.noreply.github.com>
Signed-off-by: Diego Tavares <dtavares@imageworks.com>
|
@KernAttila Your commits have been added. Thanks for the contribution. |
67ef63c to
31f596b
Compare
Adds x86_64-pc-windows-msvc as a build target to both the packaging and release CI pipelines, with appropriate Windows-specific dependency installation, binary copying, and release asset upload steps
31f596b to
cb21cdb
Compare
|
HI @DiegoTavares , thanks for your contributions! I am reviewing this! We need a Windows machine to test these new changes! |
We won't get a windows machine to test this on time for the next release. Kern worked on testing this and suggested some changes with his findings that have been applied. This PR contains a windows version of the unit tests for rqd and they have been passing successfully. With both the unit tests and Kerns tests I think we're good to go for an initial release, adding a caveat to the release notes that the windows variant hasn't been tested on production. |
|
Note Reviews pausedIt looks like this branch is under active development. To avoid overwhelming you with review comments due to an influx of new commits, CodeRabbit has automatically paused this review. You can configure this behavior by changing the Use the following commands to manage reviews:
Use the checkboxes below for quick actions:
📝 WalkthroughWalkthroughAdds comprehensive Windows support across CI/release workflows, build/test jobs, a Windows System backend, Windows-aware frame execution and logging, OS-conditional dependencies, and cross-platform test and integration adjustments. Changes
Sequence DiagramsequenceDiagram
participant User
participant RunningFrame as RunningFrame
participant Builder as FrameCmdBuilder
participant Entrypoint as Entrypoint\n(.bat/.sh)
participant Process as Child\nProcess
participant Logger as Logger
User->>RunningFrame: run_inner(logger)
RunningFrame->>Builder: new(shell, entrypoint_path)
Builder->>Builder: prepare command string / entrypoint
Builder->>Entrypoint: write generated script (.bat / .sh)
Builder-->>RunningFrame: return Command + script_path
RunningFrame->>RunningFrame: setup_raw_file/stdout_stderr
RunningFrame->>Process: spawn(command)
RunningFrame->>Logger: start log piping
Process-->>RunningFrame: exit_status
RunningFrame->>RunningFrame: interprete_output(exit_status)
RunningFrame->>Logger: log completion
RunningFrame-->>User: return (exit_code, exit_signal)
Estimated code review effort🎯 4 (Complex) | ⏱️ ~75 minutes
🚥 Pre-merge checks | ✅ 3✅ Passed checks (3 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. ✨ Finishing Touches🧪 Generate unit tests (beta)
Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out. Comment |
There was a problem hiding this comment.
Actionable comments posted: 10
🧹 Nitpick comments (3)
.github/workflows/release-pipeline.yml (1)
133-135: Consider settingPROTOCenvironment variable for consistency.Same observation as in
packaging-pipeline.yml- the protoc installation doesn't explicitly set thePROTOCenvironment variable, unlikerust-pipeline.yml.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/release-pipeline.yml around lines 133 - 135, The Windows installation step "Install dependencies (Windows)" installs protoc via choco but doesn't set the PROTOC environment variable; update that step to export/set PROTOC to the installed protoc binary path (e.g., the choco install location) so downstream steps can reference $PROTOC (or %PROTOC%) consistently; change the "Install dependencies (Windows)" step that currently runs "choco install protoc -y" to also determine the installed protoc path and set the PROTOC environment variable (mirror how PROTOC is set in rust-pipeline.yml) so other steps use the canonical PROTOC symbol..github/workflows/packaging-pipeline.yml (1)
96-98: Consider settingPROTOCenvironment variable for consistency.The
rust-pipeline.ymlexplicitly setsPROTOC=C:\ProgramData\chocolatey\bin\protoc.exeafter installing protoc. This step only installs protoc but doesn't set the environment variable. While it may work if protoc is in PATH, explicitly settingPROTOCwould ensure consistency across pipelines.- name: Install dependencies (Windows) if: matrix.os == 'windows-2022' run: choco install protoc -y + + - name: Export PROTOC path (Windows) + if: matrix.os == 'windows-2022' + run: echo "PROTOC=C:\ProgramData\chocolatey\bin\protoc.exe" >> $env:GITHUB_ENV🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In @.github/workflows/packaging-pipeline.yml around lines 96 - 98, The Windows install step "Install dependencies (Windows)" only runs choco install protoc; update that step to also set the PROTOC environment variable explicitly to the installed binary (e.g., C:\ProgramData\chocolatey\bin\protoc.exe) so downstream steps see a consistent value. In the "Install dependencies (Windows)" job/step (the step name string), either add an env entry PROTOC: 'C:\ProgramData\chocolatey\bin\protoc.exe' or append the path to GITHUB_ENV (echoing PROTOC into the environment) immediately after installing protoc.rust/crates/rqd/tests/rqd_integration_tests.rs (1)
250-265: Free-port probing still leaves a bind race.These helpers release the socket before
dummy-cuebotoropenrqdbind to it, so another process can reclaim the port and make the suite flaky on busy runners. If binding to0is not possible for the test binaries, consider retrying the spawn/allocation path when startup fails with an address-in-use error.🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/tests/rqd_integration_tests.rs` around lines 250 - 265, The current get_free_port/get_two_free_ports helpers create a race by releasing ephemeral ports before test binaries bind; instead implement a robust retry-on-bind-failure strategy around the process spawn/allocation path (where dummy-cuebot or openrqd are started) rather than pre-probing ports: remove or stop relying on get_free_port/get_two_free_ports and change the test startup code that launches these processes to retry spawning (with backoff and a small max attempts) when the child fails to start with an address-in-use error (EADDRINUSE), e.g., wrap the spawn logic used for dummy-cuebot/openrqd in a loop that catches the bind error and retries. Ensure the retry loop logs attempts and eventually fails the test if the maximum retries are exhausted.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@docs/_docs/reference/rust-rqd.md`:
- Around line 324-328: Update the "Multi-platform support: Linux, MacOS and
Windows" bullet in the "Core Functionality" section to indicate Windows is
initial/experimental (e.g., "Multi-platform support: Linux, macOS and
initial/experimental Windows support") and add a brief caveats sub-bullet or
sentence referencing the current limitations: missing CPU-affinity binding and
lack of production validation on real Windows hosts, and that the remaining docs
(artifacts and service setup) currently only cover Linux/macOS; ensure the
change targets the "Core Functionality" heading and the existing bullet text so
readers see the softened claim and explicit caveats.
In `@rust/crates/opencue-proto/build.rs`:
- Around line 20-32: The build script resolves the actual protos directory into
the variable protos_dir (the PathBuf computed in the block assigning protos_dir)
but only emits cargo:rerun-if-changed for the original literal path; add an
additional cargo:rerun-if-changed directive for the resolved protos_dir
(converted to a string) after it is computed so Cargo watches the real directory
for additions/deletions (ensure you emit this before returning/using protos_dir
and use the same string form you would pass to
println!("cargo:rerun-if-changed={}", ...)).
In `@rust/crates/rqd/resources/test_scripts/memory_fork.cmd`:
- Around line 1-15: The batch script memory_fork.cmd uses caret continuations
and must be committed with CRLF line endings to avoid brittle behavior on
Windows; update the file to use CRLF endings (ensure your editor/git config
converts this file to CRLF) or add a .gitattributes entry (e.g., matching *.cmd)
that enforces text eol=crlf so memory_fork.cmd is always normalized on commit;
verify the file now contains CRLF line endings and re-run CI to confirm the
caret continuations work on Windows.
In `@rust/crates/rqd/src/frame/frame_cmd.rs`:
- Around line 163-188: The Windows build() implementation in frame_cmd.rs
ignores the configured shell and always replaces the process with
Command::new(&self.entrypoint_file_path); update build() (or related caller) to
honor self.shell: if self.shell is Some(shell_path) then create the Command
using that interpreter (e.g., Command::new(shell_path) with appropriate args to
run the generated batch/script file) and preserve end_cmd/script generation,
otherwise if the configured shell is present but not cmd.exe reject with an Err
explaining unsupported shell on Windows; ensure you set self.cmd to the new
Command and return (&mut self.cmd, script.clone()) so
runner.shell_path/self.shell is actually used.
- Around line 259-263: The Windows stub of FrameCmd::with_exit_file drops the
provided path which prevents recoveries from persisting exit codes; change the
Windows implementation in frame_cmd.rs so with_exit_file(&mut self,
exit_file_path: String) sets self.exit_file_path = Some(exit_file_path) and
returns self (matching the non-Windows implementation) so calls from
running_frame.rs (with_exit_file(...) and subsequent read_exit_file()) can find
the exit file during recovery; keep the existing #[cfg(target_os = "windows")]
but ensure the field name and types match the struct's exit_file_path.
In `@rust/crates/rqd/src/frame/running_frame.rs`:
- Around line 699-723: The Windows branch of run_inner currently ignores
self.config.run_as_user and will silently run frames as the rqd account; add a
guard in run_inner that checks self.config.run_as_user on Windows (cfg!(windows)
or target-specific branch) and returns an error (fail fast) if run_as_user is
enabled, so we don't proceed to build/spawn the command; update the error
message to reference the frame ID or entrypoint (use self.request or
self.entrypoint_file_path) and ensure the check is performed before building or
spawning the process (place it near the start of run_inner, before
FrameCmdBuilder::build() / cmd.spawn()).
In `@rust/crates/rqd/src/system/windows.rs`:
- Around line 456-458: kill_session_internal currently walks session_processes
(via collect_lineage) without refreshing the cached process list, so newly
spawned children can be missed; fix this by calling refresh_procs() (or the
method that repopulates session_processes) before collecting the
lineage/iterating processes in kill_session_internal (ensure any error from
refresh_procs() is handled or propagated), referencing kill_session_internal,
collect_lineage, and session_processes to locate where to insert the refresh.
- Around line 533-541: The function create_user_if_unexisting currently logs a
warning but returns Ok(uid) when the requested username differs from the current
Windows user; change it to return an Err instead so callers (e.g., the frame
launch path in frame/manager.rs around the code that checks this result) fail
fast. Specifically, in create_user_if_unexisting(&self, username: &str, uid:
u32, _gid: u32) -> Result<u32>, detect when current_user != username and return
a suitable error (with a clear message like "user switching not supported on
Windows: requested '<username>'") instead of Ok(uid); update any error
construction to use the crate's existing error type so the caller's existing
Result handling works. Ensure no other code treats the previous Ok(uid) case as
success without handling the new Err.
- Around line 629-635: The reboot function currently discards the ExitStatus and
treats any completed shutdown invocation as success; update the
Command::new("shutdown").args(["/r", "/t", "0"]).status() handling in reboot to
check the returned ExitStatus via status.success() and return an Err (using
into_diagnostic()/wrap_err or similar) when it is false so non-zero exit codes
(e.g., access denied) are propagated as failures instead of being mapped to
Ok(()). Ensure you still convert command errors to diagnostics and keep the
existing wrap_err message context ("Failed to reboot").
- Around line 221-228: The returned value currently uses SocketAddr::to_string()
which includes the port; update the branch guarded by use_ip_as_hostname to
return only the IP by replacing the final Ok(addr.to_string()) with
Ok(addr.ip().to_string()); keep the existing resolution logic (format!(...):443,
to_socket_addrs, addrs_iter, addr) and error handling unchanged so downstream
callers (e.g., jobhost/Docker hostname) receive a plain IP string.
---
Nitpick comments:
In @.github/workflows/packaging-pipeline.yml:
- Around line 96-98: The Windows install step "Install dependencies (Windows)"
only runs choco install protoc; update that step to also set the PROTOC
environment variable explicitly to the installed binary (e.g.,
C:\ProgramData\chocolatey\bin\protoc.exe) so downstream steps see a consistent
value. In the "Install dependencies (Windows)" job/step (the step name string),
either add an env entry PROTOC: 'C:\ProgramData\chocolatey\bin\protoc.exe' or
append the path to GITHUB_ENV (echoing PROTOC into the environment) immediately
after installing protoc.
In @.github/workflows/release-pipeline.yml:
- Around line 133-135: The Windows installation step "Install dependencies
(Windows)" installs protoc via choco but doesn't set the PROTOC environment
variable; update that step to export/set PROTOC to the installed protoc binary
path (e.g., the choco install location) so downstream steps can reference
$PROTOC (or %PROTOC%) consistently; change the "Install dependencies (Windows)"
step that currently runs "choco install protoc -y" to also determine the
installed protoc path and set the PROTOC environment variable (mirror how PROTOC
is set in rust-pipeline.yml) so other steps use the canonical PROTOC symbol.
In `@rust/crates/rqd/tests/rqd_integration_tests.rs`:
- Around line 250-265: The current get_free_port/get_two_free_ports helpers
create a race by releasing ephemeral ports before test binaries bind; instead
implement a robust retry-on-bind-failure strategy around the process
spawn/allocation path (where dummy-cuebot or openrqd are started) rather than
pre-probing ports: remove or stop relying on get_free_port/get_two_free_ports
and change the test startup code that launches these processes to retry spawning
(with backoff and a small max attempts) when the child fails to start with an
address-in-use error (EADDRINUSE), e.g., wrap the spawn logic used for
dummy-cuebot/openrqd in a loop that catches the bind error and retries. Ensure
the retry loop logs attempts and eventually fails the test if the maximum
retries are exhausted.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e56f0811-4147-4f9f-a33a-3d81221712fc
📒 Files selected for processing (20)
.github/workflows/packaging-pipeline.yml.github/workflows/release-pipeline.yml.github/workflows/rust-pipeline.ymldocs/_docs/reference/rust-rqd.mdrust/config/rqd.yamlrust/crates/dummy-cuebot/Cargo.tomlrust/crates/dummy-cuebot/src/main.rsrust/crates/opencue-proto/build.rsrust/crates/rqd/Cargo.tomlrust/crates/rqd/resources/test_scripts/memory_fork.cmdrust/crates/rqd/src/config/mod.rsrust/crates/rqd/src/frame/frame_cmd.rsrust/crates/rqd/src/frame/logging.rsrust/crates/rqd/src/frame/running_frame.rsrust/crates/rqd/src/servant/mod.rsrust/crates/rqd/src/system/machine.rsrust/crates/rqd/src/system/mod.rsrust/crates/rqd/src/system/windows.rsrust/crates/rqd/tests/rqd_integration_tests.rsrust/crates/scheduler/src/main.rs
There was a problem hiding this comment.
Actionable comments posted: 2
♻️ Duplicate comments (4)
rust/crates/rqd/src/system/windows.rs (4)
227-234:⚠️ Potential issue | 🟠 MajorUse
addr.ip().to_string()to return only the IP address.When
use_ip_as_hostnameis true,SocketAddr::to_string()returns the socket address including the port (e.g.,192.168.1.1:443). Downstream consumers expect a plain hostname/IP without a port.🐛 Proposed fix
let addr = addrs_iter .next() .ok_or_else(|| miette::miette!("Failed to find IP for {}", hostname))?; - Ok(addr.to_string()) + Ok(addr.ip().to_string())🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 227 - 234, When use_ip_as_hostname is true in the Windows host-resolution block, the code currently returns addr.to_string() (which includes the port); change it to return only the IP by calling addr.ip().to_string() instead. Locate the variables addrs_iter and addr in the block that constructs format!("{}:443", hostname) and replace the final Ok(addr.to_string()) with Ok(addr.ip().to_string()) so downstream consumers receive a plain IP/hostname without the port.
460-465:⚠️ Potential issue | 🟠 MajorRefresh the process cache before collecting lineage.
kill_session_internalwalkssession_processesviacollect_lineagewithout refreshing the cache first. Children spawned since the lastrefresh_procs()call will be invisible and survive the kill.🐛 Proposed fix
fn kill_session_internal(&self, session_pid: u32) -> Result<()> { + self.refresh_procs_cache(); let lineage = self.collect_lineage(session_pid);🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 460 - 465, kill_session_internal calls collect_lineage on session_pid using the cached session_processes but never refreshes the cache first, so newly spawned children can be missed; before calling collect_lineage in kill_session_internal, call refresh_procs() (or the appropriate cache-refresh method) and handle any errors it returns, then proceed to collect_lineage(session_pid) and the rest of the kill logic so the lineage reflects the latest process tree.
537-546:⚠️ Potential issue | 🟠 MajorFail fast when user switching is requested but unsupported.
When
usernamediffers from the current Windows user, this logs a warning but still returnsOk(uid). The frame manager treats this as success and launches the frame under the wrong account. Additionally, Windows usernames are case-insensitive, so the comparison should useeq_ignore_ascii_case.🐛 Proposed fix
fn create_user_if_unexisting(&self, username: &str, uid: u32, _gid: u32) -> Result<u32> { let current_user = std::env::var("USERNAME").unwrap_or_default(); - if !current_user.is_empty() && current_user != username { - warn!( - "User creation is not supported on Windows. Requested user '{}' will be ignored.", - username - ); + if !current_user.is_empty() && !current_user.eq_ignore_ascii_case(username) { + return Err(miette!( + "User switching is not supported on Windows. Requested user '{}' differs from current user '{}'.", + username, + current_user + )); } Ok(uid) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 537 - 546, The create_user_if_unexisting function currently warns but returns Ok(uid) when username differs, causing the process to run under the wrong account; change the logic to compare usernames using case-insensitive comparison (username.eq_ignore_ascii_case(¤t_user)) and if they differ return an Err(Result<u32> error) indicating user switching is unsupported on Windows (instead of Ok(uid)); also replace or accompany the warn! with an error-level message that includes both the requested username and the current user so callers can fail fast.
633-640:⚠️ Potential issue | 🟠 MajorCheck the
shutdowncommand exit status.The current code discards the
ExitStatus, treating all completed invocations as success—including access-denied or policy failures. Usestatus.success()to distinguish successful from failed shutdown attempts.🐛 Proposed fix
fn reboot(&self) -> Result<()> { - Command::new("shutdown") + let status = Command::new("shutdown") .args(["/r", "/t", "0"]) .status() - .map(|_| ()) .into_diagnostic() - .wrap_err("Failed to reboot") + .wrap_err("Failed to execute shutdown command")?; + + if status.success() { + Ok(()) + } else { + Err(miette!("shutdown command failed with exit code: {}", status)) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 633 - 640, The reboot function currently ignores the Command::status ExitStatus; update the reboot method (function reboot in windows.rs) to call Command::new("shutdown").args(["/r", "/t", "0"]).status(), check the returned ExitStatus with .success(), and return Ok(()) only when success() is true; otherwise convert the failure into an Err with a diagnostic (e.g., include the ExitStatus or its code in the error) so access-denied or policy failures are surfaced instead of treated as success.
🧹 Nitpick comments (1)
rust/crates/rqd/src/system/windows.rs (1)
23-23: Remove unusedDashSetimport.
DashSetis imported but never used in this file (it was likely left behind aftermonitored_sessionswas removed). This will trigger a compiler warning.🔧 Suggested fix
-use dashmap::{DashMap, DashSet}; +use dashmap::DashMap;🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` at line 23, The file imports DashSet but never uses it; remove the unused symbol from the import by changing the use statement that currently references DashMap and DashSet so it only imports DashMap (i.e., update the line containing use dashmap::{DashMap, DashSet}; to drop DashSet), ensuring no other references to DashSet (e.g., monitored_sessions) remain in functions or types like in windows.rs.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rust/crates/rqd/src/system/windows.rs`:
- Around line 560-562: Fix the typo in the debug log string in windows.rs:
change "Collect frame stats fo {}. rss: {}kb virtual: {}kb gpu: {}kb" to use
"for" instead of "fo" in the debug! call that references pid and session_data
(rss, virtual_memory, gpu_memory) so the log reads "Collect frame stats for
...".
- Around line 438-458: The reduce fallback sets the min start to u64::MAX when
all lineage entries were filtered out, producing a bogus start_time; after the
.reduce(...).unwrap_or((...)) step inspect the resulting min-start value (the
fourth tuple element) and if it equals u64::MAX treat it as “no live processes”
and return None instead of constructing a SessionData; update the caller/return
type accordingly (where SessionData is built) so collect_lineage/this function
returns Option<SessionData> when no live children remain and ensure any code
using start_time handles the Option.
---
Duplicate comments:
In `@rust/crates/rqd/src/system/windows.rs`:
- Around line 227-234: When use_ip_as_hostname is true in the Windows
host-resolution block, the code currently returns addr.to_string() (which
includes the port); change it to return only the IP by calling
addr.ip().to_string() instead. Locate the variables addrs_iter and addr in the
block that constructs format!("{}:443", hostname) and replace the final
Ok(addr.to_string()) with Ok(addr.ip().to_string()) so downstream consumers
receive a plain IP/hostname without the port.
- Around line 460-465: kill_session_internal calls collect_lineage on
session_pid using the cached session_processes but never refreshes the cache
first, so newly spawned children can be missed; before calling collect_lineage
in kill_session_internal, call refresh_procs() (or the appropriate cache-refresh
method) and handle any errors it returns, then proceed to
collect_lineage(session_pid) and the rest of the kill logic so the lineage
reflects the latest process tree.
- Around line 537-546: The create_user_if_unexisting function currently warns
but returns Ok(uid) when username differs, causing the process to run under the
wrong account; change the logic to compare usernames using case-insensitive
comparison (username.eq_ignore_ascii_case(¤t_user)) and if they differ
return an Err(Result<u32> error) indicating user switching is unsupported on
Windows (instead of Ok(uid)); also replace or accompany the warn! with an
error-level message that includes both the requested username and the current
user so callers can fail fast.
- Around line 633-640: The reboot function currently ignores the Command::status
ExitStatus; update the reboot method (function reboot in windows.rs) to call
Command::new("shutdown").args(["/r", "/t", "0"]).status(), check the returned
ExitStatus with .success(), and return Ok(()) only when success() is true;
otherwise convert the failure into an Err with a diagnostic (e.g., include the
ExitStatus or its code in the error) so access-denied or policy failures are
surfaced instead of treated as success.
---
Nitpick comments:
In `@rust/crates/rqd/src/system/windows.rs`:
- Line 23: The file imports DashSet but never uses it; remove the unused symbol
from the import by changing the use statement that currently references DashMap
and DashSet so it only imports DashMap (i.e., update the line containing use
dashmap::{DashMap, DashSet}; to drop DashSet), ensuring no other references to
DashSet (e.g., monitored_sessions) remain in functions or types like in
windows.rs.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 4bbe139e-84cb-4e73-bdfd-6ac62df89bae
📒 Files selected for processing (4)
.github/workflows/rust-pipeline.ymlrust/crates/rqd/src/system/macos.rsrust/crates/rqd/src/system/windows.rsrust/crates/rqd/tests/rqd_integration_tests.rs
💤 Files with no reviewable changes (1)
- rust/crates/rqd/src/system/macos.rs
🚧 Files skipped from review as they are similar to previous changes (2)
- rust/crates/rqd/tests/rqd_integration_tests.rs
- .github/workflows/rust-pipeline.yml
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Diego Tavares <dtavares@imageworks.com>
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
rust/crates/rqd/src/frame/frame_cmd.rs (1)
64-75: Windowsnew()is identical to Unix version — consider consolidating.The Windows implementation of
new()is byte-for-byte identical to the#[cfg(any(target_os = "linux", target_os = "macos"))]version (lines 51-62). This duplication increases maintenance burden.♻️ Suggested consolidation
Remove the platform-specific
#[cfg]attributes and keep a single implementation:- #[cfg(any(target_os = "linux", target_os = "macos"))] pub fn new(shell: &String, entrypoint_file_path: String) -> Self { let cmd = Command::new(shell); Self { cmd, shell: shell.clone(), exit_file_path: None, become_user: None, entrypoint_file_path, end_cmd: None, } } - - #[cfg(target_os = "windows")] - pub fn new(shell: &String, entrypoint_file_path: String) -> Self { - let cmd = Command::new(shell); - Self { - cmd, - shell: shell.clone(), - exit_file_path: None, - become_user: None, - entrypoint_file_path, - end_cmd: None, - } - }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/frame/frame_cmd.rs` around lines 64 - 75, The Windows-specific new() implementation is identical to the Unix one (function new in frame_cmd.rs), so remove the duplicated cfg-guarded block and keep a single platform-agnostic implementation: delete the #[cfg(target_os = "windows")] new(...) and the surrounding cfg attribute, or alternatively remove both #[cfg(...)] attributes and retain one canonical pub fn new(shell: &String, entrypoint_file_path: String) -> Self implementation; ensure the retained new() initializes cmd, shell, exit_file_path, become_user, entrypoint_file_path, and end_cmd exactly as in the duplicated versions.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rust/crates/rqd/src/frame/frame_cmd.rs`:
- Around line 187-191: The batch string currently interpolates
self.exit_file_path and cmd_str directly into script (the local variable script)
which lets Windows batch metacharacters (especially %) break execution; sanitize
before formatting: create sanitized_exit = exit_file_path.replace("%","%%") and
sanitized_cmd = cmd_str.replace("%","%%") (to preserve literal % in batch), then
wrap the exit path in double quotes when used in the redirection/echo (use
"sanitized_exit") to protect spaces/special chars; finally use these sanitized
variables in the format! that builds script (update the code that assigns script
where self.exit_file_path and cmd_str are referenced).
---
Nitpick comments:
In `@rust/crates/rqd/src/frame/frame_cmd.rs`:
- Around line 64-75: The Windows-specific new() implementation is identical to
the Unix one (function new in frame_cmd.rs), so remove the duplicated
cfg-guarded block and keep a single platform-agnostic implementation: delete the
#[cfg(target_os = "windows")] new(...) and the surrounding cfg attribute, or
alternatively remove both #[cfg(...)] attributes and retain one canonical pub fn
new(shell: &String, entrypoint_file_path: String) -> Self implementation; ensure
the retained new() initializes cmd, shell, exit_file_path, become_user,
entrypoint_file_path, and end_cmd exactly as in the duplicated versions.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: 29954fd6-19b6-412f-a7e2-e7b2ca89964b
📒 Files selected for processing (2)
rust/crates/rqd/src/frame/frame_cmd.rsrust/crates/rqd/src/frame/running_frame.rs
Apply fix from AcademySoftwareFoundation@bfc8a96 on the windows counterpart Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Diego Tavares <dtavares@imageworks.com>
Co-authored-by: coderabbitai[bot] <136622811+coderabbitai[bot]@users.noreply.github.com> Signed-off-by: Diego Tavares <dtavares@imageworks.com>
There was a problem hiding this comment.
Actionable comments posted: 3
♻️ Duplicate comments (2)
rust/crates/rqd/src/system/windows.rs (2)
634-640:⚠️ Potential issue | 🟠 MajorCheck the
shutdownexit status before returningOk(()).
.status()succeeding here only means the command launched..map(|_| ())turns access-denied or policy failures intoOk(()), so callers can think a reboot was accepted when Windows rejected it.Rust std::process::Command::status ExitStatus::success behavior for non-zero exit codesSuggested fix
fn reboot(&self) -> Result<()> { - Command::new("shutdown") + let status = Command::new("shutdown") .args(["/r", "/t", "0"]) .status() - .map(|_| ()) .into_diagnostic() - .wrap_err("Failed to reboot") + .wrap_err("Failed to reboot")?; + + if status.success() { + Ok(()) + } else { + Err(miette!("shutdown exited unsuccessfully: {}", status)) + } }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 634 - 640, The reboot function currently treats any successful spawn of Command::new("shutdown") as success; instead call .status(), inspect the returned ExitStatus with ExitStatus::success(), and only return Ok(()) when success is true; if not, convert the non-success status into an Err (using into_diagnostic() / wrap_err with the status or code included) so callers see that the shutdown command failed; update the logic in reboot that invokes Command::new("shutdown").args(["/r","/t","0"]).status() to check ExitStatus::success() and propagate a descriptive error when it is false.
538-546:⚠️ Potential issue | 🟠 MajorFail fast when
run_as_usercannot be honored.Returning
Ok(uid)after only logging a warning makesrun_as_userlook successful even though the frame will still launch as the current Windows account.Suggested fix
fn create_user_if_unexisting(&self, username: &str, uid: u32, _gid: u32) -> Result<u32> { let current_user = std::env::var("USERNAME").unwrap_or_default(); - if !current_user.is_empty() && current_user != username { + if !current_user.is_empty() && !current_user.eq_ignore_ascii_case(username) { warn!( - "User creation is not supported on Windows. Requested user '{}' will be ignored.", - username + "run_as_user is not supported on Windows: requested '{}', current '{}'", + username, + current_user ); + return Err(miette!("run_as_user is not supported on Windows")); } Ok(uid) }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@rust/crates/rqd/src/system/windows.rs` around lines 538 - 546, The function create_user_if_unexisting currently logs a warning when the requested username differs from the current Windows account but then returns Ok(uid), which falsely signals success; change the control flow so that when current_user is non-empty and != username the function returns an Err with a clear, descriptive error (e.g., "cannot honor run_as_user on Windows: requested user 'X' differs from current user 'Y'") instead of Ok(uid). Use the same error construction/helper pattern used elsewhere in this module (the crate's Result/Error type) so callers of create_user_if_unexisting and the run_as_user flow observe failure rather than a silent warning.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@rust/crates/rqd/src/system/windows.rs`:
- Around line 145-158: The current logic assumes logical_cpus % physical_cores
== 0 and computes hyperthreading_multiplier via integer division, which drops
extra logical CPUs on non-uniform SMT machines; update the calculation and/or
build_processor_structure to ensure every logical CPU is represented: compute
hyperthreading_multiplier as (logical_cpus + physical_cores - 1) /
physical_cores (ceiling) or change build_processor_structure(logical_cpus,
physical_cores, hyperthreading_multiplier) to iterate over 0..logical_cpus and
assign each logical CPU to a core (cores_by_phys_id / threads_by_core_unique_id)
so thread_id_lookup_table includes any remainder logical CPUs, and ensure
CoreStateManager sees the full logical_cpus capacity rather than truncating to
physical_cores * floor multiplier.
- Around line 101-137: The init function should normalize the configured
temp_path for Windows so read_temp_storage()/collect_stats() can match mount
points when MachineConfig::default leaves temp_path as "/tmp"; update
windows::init to detect Windows (identified_os or cfg!(windows)) and compute a
normalized_temp_path (convert "/tmp" to a Windows temp like from
std::env::temp_dir() or replace forward slashes/ensure a drive letter) then
store that normalized value into the runtime config used by read_temp_storage()
(e.g., set it on the cloned config passed into Self or add a
normalized_temp_path field on the returned struct); adjust references in
read_temp_storage() to use the normalized value so collect_stats() succeeds on
default Windows setups.
- Around line 348-366: collect_lineage currently always seeds root_pid causing
bogus single-item lineages for inactive roots; before pushing root_pid into the
initial stack, verify the root is active (e.g., check
self.session_processes.contains_key(&root_pid) or call the existing proc-active
check used elsewhere) and return an empty Vec if the root is not present/active;
apply the same root-presence check in get_proc_lineage and in
kill_session_internal so they treat inactive roots as None/failure instead of
fabricating a lineage—update the functions collect_lineage, get_proc_lineage,
and kill_session_internal to perform this pre-check and only build the
stack/lineage when the root is confirmed active.
---
Duplicate comments:
In `@rust/crates/rqd/src/system/windows.rs`:
- Around line 634-640: The reboot function currently treats any successful spawn
of Command::new("shutdown") as success; instead call .status(), inspect the
returned ExitStatus with ExitStatus::success(), and only return Ok(()) when
success is true; if not, convert the non-success status into an Err (using
into_diagnostic() / wrap_err with the status or code included) so callers see
that the shutdown command failed; update the logic in reboot that invokes
Command::new("shutdown").args(["/r","/t","0"]).status() to check
ExitStatus::success() and propagate a descriptive error when it is false.
- Around line 538-546: The function create_user_if_unexisting currently logs a
warning when the requested username differs from the current Windows account but
then returns Ok(uid), which falsely signals success; change the control flow so
that when current_user is non-empty and != username the function returns an Err
with a clear, descriptive error (e.g., "cannot honor run_as_user on Windows:
requested user 'X' differs from current user 'Y'") instead of Ok(uid). Use the
same error construction/helper pattern used elsewhere in this module (the
crate's Result/Error type) so callers of create_user_if_unexisting and the
run_as_user flow observe failure rather than a silent warning.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: defaults
Review profile: CHILL
Plan: Pro
Run ID: e2205572-0b6f-4fbb-8584-d7d8b216351d
📒 Files selected for processing (1)
rust/crates/rqd/src/system/windows.rs
| pub fn init(config: &MachineConfig, cpu_initial_info: ProcessorInfoData) -> Result<Self> { | ||
| let identified_os = config | ||
| .override_real_values | ||
| .clone() | ||
| .and_then(|c| c.os) | ||
| .unwrap_or_else(|| "windows".to_string()); | ||
|
|
||
| // Initialize sysinfo collector | ||
| let sysinfo = sysinfo::System::new_with_specifics( | ||
| RefreshKind::nothing().with_memory(MemoryRefreshKind::everything()), | ||
| ); | ||
| let total_memory = sysinfo.total_memory(); | ||
| let total_swap = sysinfo.total_swap(); | ||
|
|
||
| Ok(Self { | ||
| config: config.clone(), | ||
| static_info: MachineStaticInfo { | ||
| hostname: Self::get_hostname(config.use_ip_as_hostname)?, | ||
| total_memory, | ||
| total_swap, | ||
| num_sockets: cpu_initial_info.num_sockets, | ||
| cores_per_socket: cpu_initial_info.cores_per_socket, | ||
| hyperthreading_multiplier: cpu_initial_info.hyperthreading_multiplier, | ||
| boot_time_secs: sysinfo::System::boot_time(), | ||
| tags: Self::setup_tags(config), | ||
| }, | ||
| hardware_state: HardwareState::Up, | ||
| attributes: HashMap::from([ | ||
| ("SP_OS".to_string(), identified_os), | ||
| ( | ||
| "hyperthreadingMultiplier".to_string(), | ||
| cpu_initial_info.hyperthreading_multiplier.to_string(), | ||
| ), | ||
| ]), | ||
| sysinfo_system: Mutex::new(sysinfo::System::new()), | ||
| session_processes: DashMap::new(), | ||
| }) |
There was a problem hiding this comment.
Normalize temp_path for Windows during init.
read_temp_storage() uses self.config.temp_path verbatim, but MachineConfig::default still sets that field to /tmp in rust/crates/rqd/src/config/mod.rs. On a default Windows setup this never matches any mount point, so collect_stats() fails until the path is overridden manually.
Suggested fix
pub fn init(config: &MachineConfig, cpu_initial_info: ProcessorInfoData) -> Result<Self> {
+ let mut config = config.clone();
+ if config.temp_path.is_empty() || config.temp_path == "/tmp" {
+ config.temp_path = std::env::temp_dir().display().to_string();
+ }
+ let tags = Self::setup_tags(&config);
+
let identified_os = config
.override_real_values
.clone()
.and_then(|c| c.os)
.unwrap_or_else(|| "windows".to_string());
@@
- tags: Self::setup_tags(config),
+ tags,Also applies to: 286-303
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rust/crates/rqd/src/system/windows.rs` around lines 101 - 137, The init
function should normalize the configured temp_path for Windows so
read_temp_storage()/collect_stats() can match mount points when
MachineConfig::default leaves temp_path as "/tmp"; update windows::init to
detect Windows (identified_os or cfg!(windows)) and compute a
normalized_temp_path (convert "/tmp" to a Windows temp like from
std::env::temp_dir() or replace forward slashes/ensure a drive letter) then
store that normalized value into the runtime config used by read_temp_storage()
(e.g., set it on the cloned config passed into Self or add a
normalized_temp_path field on the returned struct); adjust references in
read_temp_storage() to use the normalized value so collect_stats() succeeds on
default Windows setups.
| let logical_cpus = sysinfo.cpus().len().max(1) as u32; | ||
| let physical_cores = sysinfo | ||
| .physical_core_count() | ||
| .unwrap_or(logical_cpus as usize) | ||
| .max(1) as u32; | ||
|
|
||
| let hyperthreading_multiplier = (logical_cpus / physical_cores).max(1); | ||
|
|
||
| let (threads_by_core_unique_id, cores_by_phys_id, thread_id_lookup_table) = | ||
| Self::build_processor_structure( | ||
| logical_cpus, | ||
| physical_cores, | ||
| hyperthreading_multiplier, | ||
| ); |
There was a problem hiding this comment.
Do not drop logical CPUs on non-uniform SMT hosts.
This assumes logical_cpus % physical_cores == 0. On a 20-logical / 14-physical machine the floored multiplier becomes 1, so build_processor_structure() only emits thread IDs 0..13; the remaining logical CPUs never enter thread_id_lookup_table, and CoreStateManager sees less capacity than the host actually has.
Minimum fallback until real Win32 topology is wired in
fn build_processor_structure(
logical_cpus: u32,
physical_cores: u32,
hyperthreading_multiplier: u32,
) -> (
@@
+ let remainder = logical_cpus % physical_cores;
+ let mut next_thread_id = 0;
let mut core_ids = Vec::with_capacity(physical_cores as usize);
for core_id in 0..physical_cores {
core_ids.push(core_id);
- let start_thread = core_id * hyperthreading_multiplier;
let mut threads = Vec::new();
- for offset in 0..hyperthreading_multiplier {
- let thread_id = start_thread + offset;
+ let threads_for_core = hyperthreading_multiplier + u32::from(core_id < remainder);
+ for _ in 0..threads_for_core {
+ let thread_id = next_thread_id;
+ next_thread_id += 1;
if thread_id < logical_cpus {
threads.push(thread_id);
thread_id_lookup_table.insert(thread_id, (0, core_id));
}
}Also applies to: 182-216
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rust/crates/rqd/src/system/windows.rs` around lines 145 - 158, The current
logic assumes logical_cpus % physical_cores == 0 and computes
hyperthreading_multiplier via integer division, which drops extra logical CPUs
on non-uniform SMT machines; update the calculation and/or
build_processor_structure to ensure every logical CPU is represented: compute
hyperthreading_multiplier as (logical_cpus + physical_cores - 1) /
physical_cores (ceiling) or change build_processor_structure(logical_cpus,
physical_cores, hyperthreading_multiplier) to iterate over 0..logical_cpus and
assign each logical CPU to a core (cores_by_phys_id / threads_by_core_unique_id)
so thread_id_lookup_table includes any remainder logical CPUs, and ensure
CoreStateManager sees the full logical_cpus capacity rather than truncating to
physical_cores * floor multiplier.
| fn collect_lineage(&self, root_pid: u32) -> Vec<u32> { | ||
| let mut stack = vec![root_pid]; | ||
| let mut visited = std::collections::HashSet::new(); | ||
| let mut lineage = Vec::new(); | ||
|
|
||
| while let Some(pid) = stack.pop() { | ||
| if !visited.insert(pid) { | ||
| continue; | ||
| } | ||
| lineage.push(pid); | ||
| if let Some(children) = self.session_processes.get(&pid) { | ||
| for child in children.iter().rev() { | ||
| stack.push(*child); | ||
| } | ||
| } | ||
| } | ||
|
|
||
| lineage | ||
| } |
There was a problem hiding this comment.
Do not fabricate lineage for inactive roots.
collect_lineage() always seeds root_pid, so lineage.is_empty() can never trigger. That lets kill_session_internal() report success for an already-dead session, and get_proc_lineage() return Some([pid]) even though SystemManager::get_proc_lineage is supposed to return None when the root PID is inactive. Please refresh and verify the root process before building the lineage in both places.
Also applies to: 460-493, 625-631
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@rust/crates/rqd/src/system/windows.rs` around lines 348 - 366,
collect_lineage currently always seeds root_pid causing bogus single-item
lineages for inactive roots; before pushing root_pid into the initial stack,
verify the root is active (e.g., check
self.session_processes.contains_key(&root_pid) or call the existing proc-active
check used elsewhere) and return an empty Vec if the root is not present/active;
apply the same root-presence check in get_proc_lineage and in
kill_session_internal so they treat inactive roots as None/failure instead of
fabricating a lineage—update the functions collect_lineage, get_proc_lineage,
and kill_session_internal to perform this pre-check and only build the
stack/lineage when the root is confirmed active.
5e5914b
into
AcademySoftwareFoundation:master
First take at implementing windows support for rqd. The goal was to have feature parity, but I
wasn't able to find a Windows tool to emulate taskset and allocate processes to threads. So this
stands as the only feature not available for windows.
Summary by CodeRabbit
New Features
CI/CD
Tests
Documentation
Refactor