Commit 91bdc29
fix(security): sandbox DuckDB user queries via enable_external_access=false + allowed_directories (#442)
* fix(security): sandbox DuckDB user queries via enable_external_access=false + allowed_directories
Closes a critical file-read vulnerability: any token with permissions: [] could
read arbitrary local files via DuckDB's I/O function family (read_csv_auto,
read_json, read_text, read_blob, glob, parquet_metadata, parquet_schema, etc.).
The user-SQL denylist blocked only read_parquet/arc_partition_agg and RBAC
inspected FROM/JOIN clauses, so scalar table functions in the SELECT list
slipped past both layers. Combined with httpfs loading, the same primitive
reached SSRF (instance-metadata IPs), auth.db exfiltration, and cross-tenant
Parquet reads.
Replaces the maintenance-treadmill denylist with a structural fix at the
DuckDB layer:
- After every INSTALL/LOAD completes, SET GLOBAL allowed_directories to
Arc's storage roots (local + hot S3/Azure + cold-tier S3/Azure),
plus DuckDB spill, plus a dedicated import-upload subdir, plus the
compaction temp dir.
- SET GLOBAL enable_external_access = false (one-way at runtime).
- Verified by reading back the flag via QueryRowContext with a 5s
context-bounded budget.
Operator-facing changes:
- storage.local_path, database.temp_directory, compaction.temp_directory,
and arcx.storage_root are now resolved to absolute, forward-slash paths
before being passed to DuckDB. Defaults continue to work; non-absolute
operator values are normalised against the process working directory at
startup. Paths containing control bytes, Unicode formatting characters,
or line/paragraph separators (Cc, Cf, Zl, Zp) are rejected with a Fatal.
- System roots (/, /etc, /usr, /bin, /sbin, /boot, /proc, /sys, /dev,
/tmp, /var, /home, /root, /Users) are refused as storage.local_path
so a typo cannot neuter the sandbox.
- Multipart import uploads moved from os.TempDir() to a dedicated
arc-uploads subdirectory under database.temp_directory. The directory
is created 0o700, rejected if pre-staged as a symlink, and its
ancestor chain is resolved with EvalSymlinks (Warn + use resolved
path on mismatch, so Docker bind-mounts and /var → /private/var on
macOS continue to work).
- DELETE on S3-backed deployments and the profile-mode query temp file
now share the same allowlisted directory; both fail-closed at request
time when misconfigured.
- The arcx loader is now a one-shot LOAD + SET GLOBAL during
configureDatabase; the previous per-connection connInitFn relied on
the incorrect premise that DuckDB extension state is per-connection.
Tests added:
- TestSandbox (CVE reproduction + full I/O family + SSRF + COPY TO local
+ COPY TO s3:// outside allowlist + EXPORT DATABASE + INSTALL after
lockdown + cross-connection enforcement + range() remains callable +
lockdown is one-way).
- TestBuildAllowedDirectories (table-driven: hot/cold S3 full-URI dedup,
same-bucket-different-prefix, leading/interior-slash collapse,
parent-traversal '..' fallback, trailing-slash idempotence,
empty-config behavior).
- TestSandboxEmptyAllowlistLogsButDoesNotPanic.
The existing arcx tests confirm arcx_version() and SET GLOBAL
arcx.storage_root propagate database-wide across distinct concurrent
pool connections.
Reported by Alex Manson (@NeuroWinter, https://neurowinter.com/).
* fix(security): use normalized dbConfig.UploadDir + ToSlash profile paths for Windows portability
Addresses Gemini Code Assist review on PR #442:
- cmd/arc/main.go: pass `dbConfig.UploadDir` (already ToSlash-normalized
on line 397) to NewImportHandler/NewDeleteHandler instead of the local
`uploadDir` variable, which on Windows would contain backslashes from
filepath.Join + filepath.EvalSymlinks. The sandbox allowlist uses the
forward-slash form; the handlers must match or the OS layer opens a
path the sandbox didn't whitelist.
- internal/database/duckdb.go + duckdb_arrow.go: wrap profilePath with
filepath.ToSlash before PRAGMA profiling_output interpolation. The
path from os.CreateTemp uses native separators on Windows; without
ToSlash, the SQL value mismatches what's in allowed_directories.
No correctness change on Linux/macOS (filepath.ToSlash is a no-op there).
Closes the four Gemini-flagged Windows-portability gaps with no other
behavior change.
* fix(security): extend sandbox hardening to TempDirectory/Compaction + ArcxExtensionPath
Addresses Gemini Code Assist 2nd review on PR #442:
- TempDirectory and CompactionTempDirectory are also added to the DuckDB
sandbox allowlist; the system-root deny-list (\"/\", \"/etc\", \"/tmp\",
etc.) was previously only applied to LocalStorageRoot. Extended to all
three so a typo like `database.temp_directory = \"/etc\"` cannot
silently allowlist /etc.
- EvalSymlinks Warn-and-substitute was only applied to uploadDir. The
same mismatch can happen for the other three local-dir configs:
filepath.Abs does not resolve symlinks, so on macOS (/var → /private/var),
Docker bind-mounts, or K8s subPath the sandbox literal-string can
mismatch the kernel's resolved path. Now applied uniformly to
LocalStorageRoot, TempDirectory, and CompactionTempDirectory.
- ArcxExtensionPath is interpolated into a `LOAD '<path>'` statement
but was bypassing resolveAbsPath. Routed through it now (control-char
rejection + Abs + ToSlash) for symmetry with every other operator path.
Side effect: the three local directories must exist before EvalSymlinks
is called, so main.go creates them with 0o700 first. MkdirAll is
idempotent — configureDatabase's own MkdirAll(TempDirectory) becomes a
no-op on the second call.
* fix(security): ToSlash temp paths from os.MkdirTemp/CreateTemp in import + delete handlers
Addresses Gemini Code Assist 3rd review on PR #442:
On Windows, os.MkdirTemp and os.CreateTemp return native-separator
paths (backslashes). The DuckDB sandbox allowlist uses forward slashes
(main.go normalizes every entry via filepath.ToSlash). Without
ToSlash on the per-request temp paths:
- internal/api/import.go: read_csv/read_parquet on the uploaded file
would mismatch the allowlist prefix → permission error on every
import.
- internal/api/delete.go: COPY ... TO '<tempPath>' interpolates the
path verbatim into SQL → same mismatch → DELETE fails on S3
backends on Windows hosts.
ToSlash is a no-op on Linux/macOS (filepath.Separator == '/'), so no
behavior change there.
* fix(security): prefix-match deniedRoots so /etc/anything is rejected, not just exact /etc
Addresses Gemini Code Assist 5th review on PR #442.
The previous deny-list used exact-string equality, which caught
`local_path = "/etc"` but not `local_path = "/etc/arc-data"` — the
latter would still neuter the sandbox by allowlisting all of
`/etc/arc-data/` and (if any file landed there from another source)
exfiltrating it through Arc.
Replace the map with a prefix-anchored check that matches:
- exact root: `/etc`
- trailing-slash form: `/etc/`
- any subdirectory: `/etc/...`
Anchored on `root + "/"` so `/etcd-data` is NOT matched by `/etc` —
only true subdirectories or the bare directory.
Narrowed the deny-list to roots where ANY subdirectory is suspect
(/etc, /usr, /bin, /sbin, /boot, /proc, /sys, /dev, /root). Dropped
/var, /home, /tmp, /Users from deny because production deployments
legitimately put Arc data under /var/lib/arc, /home/arc-user/data,
or /Users/<dev>/arc-data — exact-match was already protecting
against the typo (`local_path = "/var"`) and prefix-match here
would block real deployments. Operator still owns those areas.
* fix(security): fail-fast in main.go when sandbox allowlist would be empty
Addresses Gemini Code Assist 6th review on PR #442.
Gemini argued the empty-allowlist branch should Fatal because it
disables all DuckDB file-based functionality. They're right that
production should never reach that state — every operator path
contributes to the allowlist, and main.go defaults TempDirectory to
"./.tmp" so the allowlist is always non-empty.
But the Fatal belongs in main.go, not in internal/database/sandbox.go.
The library function is also called by:
- TestSandboxEmptyAllowlistLogsButDoesNotPanic (which explicitly
asserts the Warn-and-continue semantics; deleting that test to
accommodate a Fatal would also delete the documentation of what
happens when a library caller passes a minimal Config).
- Programmatic embeddings that construct database.Config{...minimal...}
directly — round-1 review #1 flagged this and the test was written
for them.
The production guard in main.go fires AFTER all path resolution and
symlink handling completes, before database.New. main.go's "./.tmp"
fallback for TempDirectory makes this branch effectively unreachable
today; the guard is a safety net for a future refactor that drops the
fallback.
The Warn in sandbox.go remains so library callers still get a clear
signal.
---------
Co-authored-by: Ignacio Van Droogenbroeck <ignacio@vandroogenbroeck.net>1 parent 82fc080 commit 91bdc29
10 files changed
Lines changed: 1230 additions & 156 deletions
| Original file line number | Diff line number | Diff line change | |
|---|---|---|---|
| |||
2 | 2 | | |
3 | 3 | | |
4 | 4 | | |
| 5 | + | |
| 6 | + | |
| 7 | + | |
| 8 | + | |
| 9 | + | |
| 10 | + | |
| 11 | + | |
| 12 | + | |
| 13 | + | |
| 14 | + | |
| 15 | + | |
| 16 | + | |
| 17 | + | |
| 18 | + | |
| 19 | + | |
| 20 | + | |
| 21 | + | |
| 22 | + | |
| 23 | + | |
| 24 | + | |
| 25 | + | |
| 26 | + | |
| 27 | + | |
| 28 | + | |
| 29 | + | |
| 30 | + | |
| 31 | + | |
| 32 | + | |
| 33 | + | |
| 34 | + | |
| 35 | + | |
| 36 | + | |
| 37 | + | |
5 | 38 | | |
6 | 39 | | |
7 | 40 | | |
| |||
0 commit comments