Skip to content

fix(build): reject system LLVM outside compiler sysroot#426

Open
Perdixky wants to merge 1 commit intoclice-io:mainfrom
Perdixky:main
Open

fix(build): reject system LLVM outside compiler sysroot#426
Perdixky wants to merge 1 commit intoclice-io:mainfrom
Perdixky:main

Conversation

@Perdixky
Copy link
Copy Markdown
Contributor

@Perdixky Perdixky commented Apr 17, 2026

When building with pixi or NixOS, clang++ has a dedicated sysroot. System LLVM at /usr is outside it, causing ABI mismatches with the sandboxed libc/libstdc++. Detect this and fall back to bundled LLVM.

Summary by CodeRabbit

Release Notes

  • Bug Fixes

    • Improved LLVM installation validation to ensure compatibility with the active compiler's sysroot configuration, preventing potential mismatches during the build setup process.
  • Chores

    • Enhanced build system include directory handling for LLVM dependencies.

Copilot AI review requested due to automatic review settings April 17, 2026 05:45
@chatgpt-codex-connector
Copy link
Copy Markdown

You have reached your Codex usage limits for code reviews. You can see your limits in the Codex usage dashboard.

@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Apr 17, 2026

📝 Walkthrough

Walkthrough

Updated LLVM integration to validate compiler sysroot compatibility during setup. Modified CMake configuration to expose build-tree include directories for the LLVM interface library. Added sysroot detection and validation logic to ensure LLVM installation paths align with the active compiler's sandbox environment.

Changes

Cohort / File(s) Summary
CMake Include Paths
cmake/llvm.cmake
Extended llvm-libs interface target to include the build-tree include directory (${CMAKE_CURRENT_BINARY_DIR}/include) alongside the LLVM installation directory.
LLVM Setup Sysroot Validation
scripts/setup-llvm.py
Added sysroot detection from clang++ configuration and validation logic. Introduced compiler_sysroot() to extract sysroot from target config files, llvm_install_matches_compiler() to verify installation paths against sysroot bounds, and updated system_llvm_ok() to reject system LLVM installations outside the sysroot. Modified main() to enforce sysroot matching for provided install paths.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Poem

🐰 Hop hop, the paths align with care,
Sysroots checked with compiler's flair,
Build trees grow in proper place,
LLVM finds its rightful space,
No more sandbox escapes today! 🏗️✨

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

Check name Status Explanation Resolution
Docstring Coverage ⚠️ Warning Docstring coverage is 33.33% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (2 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately and specifically describes the main change: rejecting system LLVM when it's outside the compiler sysroot, which is the core objective of this PR.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

When building with pixi or NixOS, clang++ has a dedicated sysroot.
System LLVM at /usr is outside it, causing ABI mismatches with the
sandboxed libc/libstdc++. Detect this and fall back to bundled LLVM.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-Authored-By: perdixky <perdixky@users.noreply.github.com>
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR aims to prevent ABI mismatches when building in sandboxed toolchains (e.g., pixi/NixOS) by refusing to use an LLVM install that is incompatible with the active compiler’s sysroot, falling back to the bundled LLVM instead.

Changes:

  • Add clang++ sysroot detection and reject system/provided LLVM installs that don’t match.
  • Update system LLVM detection to filter out installs outside the compiler sysroot.
  • Extend LLVM include paths in CMake to also search generated/downloaded headers in the build directory.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
scripts/setup-llvm.py Detect compiler sysroot and gate system/provided LLVM usage based on sysroot compatibility.
cmake/llvm.cmake Add ${CMAKE_CURRENT_BINARY_DIR}/include to LLVM include directories to pick up generated/private headers.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread scripts/setup-llvm.py
Comment on lines +224 to +228
try:
prefix.resolve().relative_to(sysroot)
return True
except ValueError:
return False
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

llvm_install_matches_compiler() currently requires the LLVM install prefix to be inside the compiler sysroot. On toolchains where the sysroot is a nested directory (common for sandboxed toolchains), the LLVM prefix may legitimately live alongside the sysroot rather than under it, and this check would incorrectly reject a compatible toolchain LLVM and force a bundled download. Consider loosening the match criteria (e.g., treat sysroot as an indicator that host paths like /usr should be rejected, or compare against the toolchain prefix that contains the sysroot rather than the sysroot directory itself).

Copilot uses AI. Check for mistakes.
Comment thread scripts/setup-llvm.py
Comment on lines +206 to +209
for line in cfg.read_text(encoding="utf-8").splitlines():
if not line.startswith("--sysroot="):
continue
sysroot = Path(line.split("=", 1)[1]).resolve()
Copy link

Copilot AI Apr 17, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

compiler_sysroot() only recognizes lines that start exactly with --sysroot=. Clang config files commonly contain blank lines/comments and may include leading whitespace or use the --sysroot <path> form; in those cases this would fail to detect the active sysroot and could reintroduce the ABI-mismatch scenario this PR is trying to prevent. It would be more robust to strip whitespace, ignore comments, and accept both --sysroot= and --sysroot forms when parsing the cfg.

Suggested change
for line in cfg.read_text(encoding="utf-8").splitlines():
if not line.startswith("--sysroot="):
continue
sysroot = Path(line.split("=", 1)[1]).resolve()
pending_sysroot = False
for line in cfg.read_text(encoding="utf-8").splitlines():
line = line.strip()
if not line or line.startswith("#"):
continue
sysroot_value = None
if pending_sysroot:
sysroot_value = line
pending_sysroot = False
elif line.startswith("--sysroot="):
sysroot_value = line.split("=", 1)[1].strip()
elif line == "--sysroot":
pending_sysroot = True
continue
if not sysroot_value:
continue
sysroot = Path(sysroot_value).resolve()

Copilot uses AI. Check for mistakes.
Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 3

🧹 Nitpick comments (1)
scripts/setup-llvm.py (1)

201-213: Config-file parsing is fragile; consider a more forgiving parser.

A few corner cases not handled today:

  • startswith("--sysroot=") skips indented lines, so any --sysroot= not at column 0 is ignored. Strip the line first.
  • Comment lines (# ...) and @other.cfg includes in clang config files are silently skipped here; the latter means a sysroot set via an included fragment is invisible to this function.
  • Path(line.split("=", 1)[1]) does not strip trailing whitespace, so a config with a trailing space after the path turns into a non-existent Path("/foo ") and .exists() fails.
  • Only the per-triple <triple>.cfg is consulted. Clang also loads a generic clang++.cfg / clang.cfg from the binary dir when present; projects relying on the generic config (not uncommon) won't be detected.
♻️ Minimal hardening
-    try:
-        for line in cfg.read_text(encoding="utf-8").splitlines():
-            if not line.startswith("--sysroot="):
-                continue
-            sysroot = Path(line.split("=", 1)[1]).resolve()
-            if sysroot.exists():
-                return sysroot
-    except OSError:
-        return None
+    try:
+        for raw in cfg.read_text(encoding="utf-8").splitlines():
+            line = raw.strip()
+            if not line or line.startswith("#") or not line.startswith("--sysroot="):
+                continue
+            sysroot = Path(line.split("=", 1)[1].strip()).resolve()
+            if sysroot.exists():
+                return sysroot
+    except OSError:
+        return None
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-llvm.py` around lines 201 - 213, The cfg-reading logic that
looks up sysroot from Path(compiler).with_name(f"{triple}.cfg") is too brittle:
update the parser loop (the block that reads cfg.read_text(...).splitlines()) to
strip each line before processing, skip blank lines and comment lines starting
with '#' (after strip), ignore include directives that begin with '@' by
resolving and recursively parsing those referenced files (so included fragments
can set sysroot), and split on "=" then strip the RHS before constructing Path
to avoid trailing-whitespace failures; also attempt the fallback configs
clang++.cfg and clang.cfg in the same directory if the per-triple file yields no
sysroot, and preserve the existing OSError handling around file reads.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@scripts/setup-llvm.py`:
- Around line 353-358: When handling a user-provided path that doesn't yet exist
(the else branch setting needs_install and install_path to candidate), still run
llvm_install_matches_compiler(candidate) to ensure the candidate would be inside
the compiler sysroot; if llvm_install_matches_compiler(candidate) returns False,
set install_path back to install_root (and log that the provided path was
rejected for sysroot mismatch) so we fall back to the safe install location
instead of silently installing outside the sysroot. Update the logic around
install_path/needs_install (and use
candidate/install_root/llvm_install_matches_compiler) accordingly.
- Around line 187-215: The compiler_sysroot() function currently uses
shutil.which("clang++") which can differ from the compiler CMake uses; update
the script to accept a new CLI argument (e.g. --cxx-compiler) and prefer that
path when provided, falling back to shutil.which only if the argument is
missing; update any callers (e.g. cmake/llvm.cmake) to pass
${CMAKE_CXX_COMPILER} into the script; make sure compiler_sysroot() (and the
code that invokes it) uses the provided compiler path to derive the triple and
.cfg file instead of always calling shutil.which("clang++").
- Around line 182-215: The current compiler_sysroot() only looks for a
<triple>.cfg file and misses sysroot information injected by wrappers or flags
(e.g., pixi/Conda or NixOS cc-wrapper); update detection to (1) run the compiler
with a diagnostic dry-run to capture injected flags (invoke clang++ with
["-###","-c","-x","c++","-"] or similar) and parse that output for --sysroot= or
--sysroot, (2) if the compiler path is a script/wrapper (e.g., cc-wrapper.sh),
read the wrapper text and scan for --sysroot occurrences or redirected env-vars,
and (3) check common environment variables (e.g., SYSROOT, CONDA_BUILD_SYSROOT,
CT_SYSROOT) for a sysroot override; return the first valid existing Path and
ensure llvm_install_matches_compiler() uses this new compiler_sysroot() result
to correctly reject system LLVM outside the compiler sysroot.

---

Nitpick comments:
In `@scripts/setup-llvm.py`:
- Around line 201-213: The cfg-reading logic that looks up sysroot from
Path(compiler).with_name(f"{triple}.cfg") is too brittle: update the parser loop
(the block that reads cfg.read_text(...).splitlines()) to strip each line before
processing, skip blank lines and comment lines starting with '#' (after strip),
ignore include directives that begin with '@' by resolving and recursively
parsing those referenced files (so included fragments can set sysroot), and
split on "=" then strip the RHS before constructing Path to avoid
trailing-whitespace failures; also attempt the fallback configs clang++.cfg and
clang.cfg in the same directory if the per-triple file yields no sysroot, and
preserve the existing OSError handling around file reads.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: CHILL

Plan: Pro

Run ID: 053691ca-b975-4d98-9d70-986e3a878d53

📥 Commits

Reviewing files that changed from the base of the PR and between d42d9d5 and 3b993f6.

📒 Files selected for processing (2)
  • cmake/llvm.cmake
  • scripts/setup-llvm.py

Comment thread scripts/setup-llvm.py
Comment on lines +182 to +215
# When building with pixi or NixOS, the compiler is sandboxed inside its own
# sysroot. System LLVM at /usr is outside that sysroot, so linking against it
# pulls in libraries built for the host's libc/libstdc++ which may be
# ABI-incompatible with the ones inside the sandbox. Detect this and refuse to
# use any LLVM install that falls outside the compiler's sysroot.
def compiler_sysroot() -> Path | None:
"""Return the ``--sysroot`` from clang++'s per-target config, if any."""
compiler = shutil.which("clang++")
if not compiler:
return None

try:
triple = subprocess.check_output([compiler, "-dumpmachine"], text=True).strip()
except (subprocess.CalledProcessError, OSError):
return None

if not triple:
return None

cfg = Path(compiler).with_name(f"{triple}.cfg")
if not cfg.is_file():
return None

try:
for line in cfg.read_text(encoding="utf-8").splitlines():
if not line.startswith("--sysroot="):
continue
sysroot = Path(line.split("=", 1)[1]).resolve()
if sysroot.exists():
return sysroot
except OSError:
return None

return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🌐 Web query:

Does pixi (prefix.dev) generate a <triple>.cfg next to clang++ containing --sysroot, and how does nixpkgs stdenv configure clang's sysroot (wrapper vs config file)?

💡 Result:

No, pixi (prefix.dev) does not generate a <triple>.cfg file next to clang++. Pixi, built on the conda/rattler ecosystem, installs clang packages from conda-forge/prefix.dev channels like clang_osx-64 or clang_linux-64. These packages provide the clang++ binary directly without generating Clang driver config files (.cfg) containing --sysroot next to it. Clang config files are a native Clang/LLVM feature for default flags (e.g., x86_64-unknown-linux-gnu.cfg), loaded automatically by the driver, but no evidence shows pixi/rattler-build or conda clang packages creating them dynamically. Sysroot handling in conda/pixi clang occurs via explicit --sysroot flags (e.g., in cross-compilation via CONDA_BUILD_SYSROOT or compiler activation), environment variables like SDKROOT (macOS), or compiler activation scripts—not .cfg files. nixpkgs stdenv configures clang's sysroot primarily through a wrapper script (cc-wrapper.sh), not config files. stdenv wraps clang++ (often clang-unwrapped) in a bash wrapper that sets environment variables (NIX_CFLAGS_COMPILE, defaultTarget), adds flags like -nostdlibinc (Linux, to block /usr/include), and passes search paths (-L, -B) via nix-support/cc-{cflags,ldflags}. For cross-compilation, it adjusts paths (e.g., /${targetPlatform.config}) and sets machineFlags/defaultTarget. No .cfg files are generated; sysroot is handled via wrapper flags/env vars, preserving --sysroot user overrides.

Citations:


The detection mechanism doesn't work for either pixi or NixOS — the PR won't prevent the stated failure mode in either environment.

The implementation assumes both pixi and NixOS place a <triple>.cfg file next to clang++ with --sysroot directives. However:

  • Pixi: Does not generate .cfg files. It installs clang packages from conda-forge and handles sysroot via explicit --sysroot flags, environment variables, or compiler activation scripts.
  • NixOS: Does not use .cfg files. It wraps clang++ with a shell script (cc-wrapper.sh) that injects sysroot via compiler wrapper flags and environment variables.

In both cases, compiler_sysroot() returns None, llvm_install_matches_compiler() returns True, and system LLVM at /usr is accepted — the exact failure mode the PR claims to prevent.

🧰 Tools
🪛 Ruff (0.15.10)

[error] 194-194: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-llvm.py` around lines 182 - 215, The current compiler_sysroot()
only looks for a <triple>.cfg file and misses sysroot information injected by
wrappers or flags (e.g., pixi/Conda or NixOS cc-wrapper); update detection to
(1) run the compiler with a diagnostic dry-run to capture injected flags (invoke
clang++ with ["-###","-c","-x","c++","-"] or similar) and parse that output for
--sysroot= or --sysroot, (2) if the compiler path is a script/wrapper (e.g.,
cc-wrapper.sh), read the wrapper text and scan for --sysroot occurrences or
redirected env-vars, and (3) check common environment variables (e.g., SYSROOT,
CONDA_BUILD_SYSROOT, CT_SYSROOT) for a sysroot override; return the first valid
existing Path and ensure llvm_install_matches_compiler() uses this new
compiler_sysroot() result to correctly reject system LLVM outside the compiler
sysroot.

Comment thread scripts/setup-llvm.py
Comment on lines +187 to +215
def compiler_sysroot() -> Path | None:
"""Return the ``--sysroot`` from clang++'s per-target config, if any."""
compiler = shutil.which("clang++")
if not compiler:
return None

try:
triple = subprocess.check_output([compiler, "-dumpmachine"], text=True).strip()
except (subprocess.CalledProcessError, OSError):
return None

if not triple:
return None

cfg = Path(compiler).with_name(f"{triple}.cfg")
if not cfg.is_file():
return None

try:
for line in cfg.read_text(encoding="utf-8").splitlines():
if not line.startswith("--sysroot="):
continue
sysroot = Path(line.split("=", 1)[1]).resolve()
if sysroot.exists():
return sysroot
except OSError:
return None

return None
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Detected compiler may not be the one CMake uses.

shutil.which("clang++") discovers whatever clang++ appears first in PATH, not CMAKE_CXX_COMPILER. If the user configured CMake with a different compiler (e.g., CXX=/opt/llvm20/bin/clang++, a versioned binary like clang++-18, or gcc), the sysroot probed here will not reflect the actual build compiler. Consider passing the compiler path from cmake/llvm.cmake via a new argument (e.g. --cxx-compiler "${CMAKE_CXX_COMPILER}") and preferring that over which.

🧰 Tools
🪛 Ruff (0.15.10)

[error] 194-194: subprocess call: check for execution of untrusted input

(S603)

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-llvm.py` around lines 187 - 215, The compiler_sysroot()
function currently uses shutil.which("clang++") which can differ from the
compiler CMake uses; update the script to accept a new CLI argument (e.g.
--cxx-compiler) and prefer that path when provided, falling back to shutil.which
only if the argument is missing; update any callers (e.g. cmake/llvm.cmake) to
pass ${CMAKE_CXX_COMPILER} into the script; make sure compiler_sysroot() (and
the code that invokes it) uses the provided compiler path to derive the triple
and .cfg file instead of always calling shutil.which("clang++").

Comment thread scripts/setup-llvm.py
Comment on lines 353 to +358
else:
log(
f"Provided LLVM install path does not exist; will install to {candidate}"
)
needs_install = True
install_path = candidate
install_path = candidate
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟡 Minor

Sysroot check is skipped when --install-path does not exist yet.

When the user passes --install-path=/foo and /foo does not yet exist, the script proceeds to extract the bundled LLVM into that path without checking whether it falls inside the compiler sysroot. If it does not, the freshly-installed LLVM still sits outside the sandbox and you've reintroduced the very ABI mismatch this PR guards against — only now it's silently written to a user-specified location.

Consider applying llvm_install_matches_compiler to the candidate path regardless of whether it exists (and fall back to install_root when it doesn't match).

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@scripts/setup-llvm.py` around lines 353 - 358, When handling a user-provided
path that doesn't yet exist (the else branch setting needs_install and
install_path to candidate), still run llvm_install_matches_compiler(candidate)
to ensure the candidate would be inside the compiler sysroot; if
llvm_install_matches_compiler(candidate) returns False, set install_path back to
install_root (and log that the provided path was rejected for sysroot mismatch)
so we fall back to the safe install location instead of silently installing
outside the sysroot. Update the logic around install_path/needs_install (and use
candidate/install_root/llvm_install_matches_compiler) accordingly.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants