Skip to content

Commit 91bdc29

Browse files
xe-nvdkIgnacio Van Droogenbroeck
andauthored
fix(security): sandbox DuckDB user queries via enable_external_access=false + allowed_directories (#442)
* fix(security): sandbox DuckDB user queries via enable_external_access=false + allowed_directories Closes a critical file-read vulnerability: any token with permissions: [] could read arbitrary local files via DuckDB's I/O function family (read_csv_auto, read_json, read_text, read_blob, glob, parquet_metadata, parquet_schema, etc.). The user-SQL denylist blocked only read_parquet/arc_partition_agg and RBAC inspected FROM/JOIN clauses, so scalar table functions in the SELECT list slipped past both layers. Combined with httpfs loading, the same primitive reached SSRF (instance-metadata IPs), auth.db exfiltration, and cross-tenant Parquet reads. Replaces the maintenance-treadmill denylist with a structural fix at the DuckDB layer: - After every INSTALL/LOAD completes, SET GLOBAL allowed_directories to Arc's storage roots (local + hot S3/Azure + cold-tier S3/Azure), plus DuckDB spill, plus a dedicated import-upload subdir, plus the compaction temp dir. - SET GLOBAL enable_external_access = false (one-way at runtime). - Verified by reading back the flag via QueryRowContext with a 5s context-bounded budget. Operator-facing changes: - storage.local_path, database.temp_directory, compaction.temp_directory, and arcx.storage_root are now resolved to absolute, forward-slash paths before being passed to DuckDB. Defaults continue to work; non-absolute operator values are normalised against the process working directory at startup. Paths containing control bytes, Unicode formatting characters, or line/paragraph separators (Cc, Cf, Zl, Zp) are rejected with a Fatal. - System roots (/, /etc, /usr, /bin, /sbin, /boot, /proc, /sys, /dev, /tmp, /var, /home, /root, /Users) are refused as storage.local_path so a typo cannot neuter the sandbox. - Multipart import uploads moved from os.TempDir() to a dedicated arc-uploads subdirectory under database.temp_directory. The directory is created 0o700, rejected if pre-staged as a symlink, and its ancestor chain is resolved with EvalSymlinks (Warn + use resolved path on mismatch, so Docker bind-mounts and /var → /private/var on macOS continue to work). - DELETE on S3-backed deployments and the profile-mode query temp file now share the same allowlisted directory; both fail-closed at request time when misconfigured. - The arcx loader is now a one-shot LOAD + SET GLOBAL during configureDatabase; the previous per-connection connInitFn relied on the incorrect premise that DuckDB extension state is per-connection. Tests added: - TestSandbox (CVE reproduction + full I/O family + SSRF + COPY TO local + COPY TO s3:// outside allowlist + EXPORT DATABASE + INSTALL after lockdown + cross-connection enforcement + range() remains callable + lockdown is one-way). - TestBuildAllowedDirectories (table-driven: hot/cold S3 full-URI dedup, same-bucket-different-prefix, leading/interior-slash collapse, parent-traversal '..' fallback, trailing-slash idempotence, empty-config behavior). - TestSandboxEmptyAllowlistLogsButDoesNotPanic. The existing arcx tests confirm arcx_version() and SET GLOBAL arcx.storage_root propagate database-wide across distinct concurrent pool connections. Reported by Alex Manson (@NeuroWinter, https://neurowinter.com/). * fix(security): use normalized dbConfig.UploadDir + ToSlash profile paths for Windows portability Addresses Gemini Code Assist review on PR #442: - cmd/arc/main.go: pass `dbConfig.UploadDir` (already ToSlash-normalized on line 397) to NewImportHandler/NewDeleteHandler instead of the local `uploadDir` variable, which on Windows would contain backslashes from filepath.Join + filepath.EvalSymlinks. The sandbox allowlist uses the forward-slash form; the handlers must match or the OS layer opens a path the sandbox didn't whitelist. - internal/database/duckdb.go + duckdb_arrow.go: wrap profilePath with filepath.ToSlash before PRAGMA profiling_output interpolation. The path from os.CreateTemp uses native separators on Windows; without ToSlash, the SQL value mismatches what's in allowed_directories. No correctness change on Linux/macOS (filepath.ToSlash is a no-op there). Closes the four Gemini-flagged Windows-portability gaps with no other behavior change. * fix(security): extend sandbox hardening to TempDirectory/Compaction + ArcxExtensionPath Addresses Gemini Code Assist 2nd review on PR #442: - TempDirectory and CompactionTempDirectory are also added to the DuckDB sandbox allowlist; the system-root deny-list (\"/\", \"/etc\", \"/tmp\", etc.) was previously only applied to LocalStorageRoot. Extended to all three so a typo like `database.temp_directory = \"/etc\"` cannot silently allowlist /etc. - EvalSymlinks Warn-and-substitute was only applied to uploadDir. The same mismatch can happen for the other three local-dir configs: filepath.Abs does not resolve symlinks, so on macOS (/var → /private/var), Docker bind-mounts, or K8s subPath the sandbox literal-string can mismatch the kernel's resolved path. Now applied uniformly to LocalStorageRoot, TempDirectory, and CompactionTempDirectory. - ArcxExtensionPath is interpolated into a `LOAD '<path>'` statement but was bypassing resolveAbsPath. Routed through it now (control-char rejection + Abs + ToSlash) for symmetry with every other operator path. Side effect: the three local directories must exist before EvalSymlinks is called, so main.go creates them with 0o700 first. MkdirAll is idempotent — configureDatabase's own MkdirAll(TempDirectory) becomes a no-op on the second call. * fix(security): ToSlash temp paths from os.MkdirTemp/CreateTemp in import + delete handlers Addresses Gemini Code Assist 3rd review on PR #442: On Windows, os.MkdirTemp and os.CreateTemp return native-separator paths (backslashes). The DuckDB sandbox allowlist uses forward slashes (main.go normalizes every entry via filepath.ToSlash). Without ToSlash on the per-request temp paths: - internal/api/import.go: read_csv/read_parquet on the uploaded file would mismatch the allowlist prefix → permission error on every import. - internal/api/delete.go: COPY ... TO '<tempPath>' interpolates the path verbatim into SQL → same mismatch → DELETE fails on S3 backends on Windows hosts. ToSlash is a no-op on Linux/macOS (filepath.Separator == '/'), so no behavior change there. * fix(security): prefix-match deniedRoots so /etc/anything is rejected, not just exact /etc Addresses Gemini Code Assist 5th review on PR #442. The previous deny-list used exact-string equality, which caught `local_path = "/etc"` but not `local_path = "/etc/arc-data"` — the latter would still neuter the sandbox by allowlisting all of `/etc/arc-data/` and (if any file landed there from another source) exfiltrating it through Arc. Replace the map with a prefix-anchored check that matches: - exact root: `/etc` - trailing-slash form: `/etc/` - any subdirectory: `/etc/...` Anchored on `root + "/"` so `/etcd-data` is NOT matched by `/etc` — only true subdirectories or the bare directory. Narrowed the deny-list to roots where ANY subdirectory is suspect (/etc, /usr, /bin, /sbin, /boot, /proc, /sys, /dev, /root). Dropped /var, /home, /tmp, /Users from deny because production deployments legitimately put Arc data under /var/lib/arc, /home/arc-user/data, or /Users/<dev>/arc-data — exact-match was already protecting against the typo (`local_path = "/var"`) and prefix-match here would block real deployments. Operator still owns those areas. * fix(security): fail-fast in main.go when sandbox allowlist would be empty Addresses Gemini Code Assist 6th review on PR #442. Gemini argued the empty-allowlist branch should Fatal because it disables all DuckDB file-based functionality. They're right that production should never reach that state — every operator path contributes to the allowlist, and main.go defaults TempDirectory to "./.tmp" so the allowlist is always non-empty. But the Fatal belongs in main.go, not in internal/database/sandbox.go. The library function is also called by: - TestSandboxEmptyAllowlistLogsButDoesNotPanic (which explicitly asserts the Warn-and-continue semantics; deleting that test to accommodate a Fatal would also delete the documentation of what happens when a library caller passes a minimal Config). - Programmatic embeddings that construct database.Config{...minimal...} directly — round-1 review #1 flagged this and the test was written for them. The production guard in main.go fires AFTER all path resolution and symlink handling completes, before database.New. main.go's "./.tmp" fallback for TempDirectory makes this branch effectively unreachable today; the guard is a safety net for a future refactor that drops the fallback. The Warn in sandbox.go remains so library callers still get a clear signal. --------- Co-authored-by: Ignacio Van Droogenbroeck <ignacio@vandroogenbroeck.net>
1 parent 82fc080 commit 91bdc29

10 files changed

Lines changed: 1230 additions & 156 deletions

File tree

RELEASE_NOTES_2026.06.1.md

Lines changed: 33 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -2,6 +2,39 @@
22

33
> **Status:** In progress. Entries are added as PRs land.
44
5+
## Security fixes
6+
7+
### DuckDB I/O sandbox closes arbitrary file-read by any authenticated caller
8+
9+
Reported by [Alex Manson](https://neurowinter.com/) ([@NeuroWinter](https://github.com/NeuroWinter)) — thank you for the detailed report and proof-of-concept.
10+
11+
An external security report against the v26.05.1 main branch found that any token holding even an empty `permissions: []` set could read arbitrary local files through DuckDB's I/O function family — `read_csv_auto`, `read_json`, `read_text`, `read_blob`, `glob`, `parquet_metadata`, `parquet_schema`, `read_xlsx`, etc. The existing user-SQL denylist blocked only `read_parquet(` and `arc_partition_agg(`, and RBAC table-level checks inspected `FROM`/`JOIN` clauses, so a scalar table function in the `SELECT` list slipped past both layers.
12+
13+
Impact on a deployment with auth enabled but RBAC not subscribed: a single-line POST to `/api/v1/query` returned the contents of `auth.db` (bcrypt hashes plus legacy SHA-256 rows), `arc.toml` (S3 secrets), TLS private keys, `/proc/self/environ`, cross-tenant Parquet files, and — when `httpfs` was loaded — instance-metadata IPs over SSRF.
14+
15+
26.06.1 replaces the denylist with a structural fix at the DuckDB layer: after every `INSTALL`/`LOAD` and every `SET GLOBAL` Arc itself needs has run, the startup path sets
16+
17+
```sql
18+
SET GLOBAL allowed_directories = ['<local_storage_root>/', '<temp_directory>/', '<upload_dir>/', '<compaction_temp>/', 's3://<hot_bucket>/<prefix>/', 's3://<cold_bucket>/<prefix>/', 'azure://<container>/', 'azure://<cold_container>/']
19+
SET GLOBAL enable_external_access = false
20+
```
21+
22+
After this flip DuckDB refuses to open any file outside the allowlist, refuses any further `INSTALL`/`LOAD`, and rejects attempts to re-enable external access at runtime. Already-loaded extensions (httpfs, azure, cache_httpfs, arcx) remain fully callable — `enable_external_access` is checked at extension-load time, file access at file-open time. The allowlist is enforced uniformly across every pool connection (DuckDB's `ExtensionManager` and `SET GLOBAL` propagate database-wide), so a per-connection regression is not possible. The flip is read back and verified at startup so a future DuckDB release silently rejecting it would surface as a startup error rather than a silently-broken sandbox.
23+
24+
A startup-time guard rejects operator-supplied paths that contain any Unicode control character (Cc — `\0`, `\n`, `\r`, `\t`, vertical tab, form feed, 0x7F), formatting character (Cf — LRM/RLM/LRE/RLE/PDF/LRO/RLO/LRI/RLI/FSI/PDI, ZWSP/ZWNJ/ZWJ, BOM, soft hyphen), or line/paragraph separator (Zl/Zp — U+2028/U+2029) before they reach the SQL literal — closing both newline injection and bidi-override log-spoofing surfaces. The upload directory used for multipart imports is created with mode `0700`, chmod'd back to `0700` if it already existed with looser permissions, and rejected at startup if a symlink has been pre-staged in its place.
25+
26+
**Operator-facing change — paths must resolve to absolute**: `storage.local_path`, `database.temp_directory`, `compaction.temp_directory`, and (when arcx is enabled) the arcx storage root are now all resolved to absolute, forward-slash paths before being passed to DuckDB's allowlist. The defaults (`./data/arc`, `./.tmp`, `./data/compaction`) continue to work — they're resolved against the process working directory at startup. Operators who set these to a non-absolute path in `arc.toml` will see the resolved absolute path in the startup log line `DuckDB external access locked down (sandbox active)`.
27+
28+
**Import upload directory moved**: multipart uploads landed in `os.TempDir()` (typically `/tmp` or `/var/folders/.../T/`) before 26.06.1. They now land in a dedicated `arc-uploads` subdirectory under the operator-configured temp directory — `${database.temp_directory}/arc-uploads`. Existing imports continue to work; deployments that scrape `/tmp` for monitoring no longer see Arc upload files.
29+
30+
**Compaction and DELETE on S3 backends use the same allowlisted staging area** so their `COPY ... TO` writes succeed under the sandbox. No operator action required — the wiring is internal.
31+
32+
**Profile-mode queries** write the JSON profiling output under `${database.temp_directory}` instead of `os.TempDir()` so `PRAGMA profiling_output` writes through an allowlisted path. Profile data was already silently empty in some DuckDB releases that route the profile writer through `OpenerFileSystem`; this fix is preemptive.
33+
34+
The arcx loader was simplified: the previous per-connection `connInitFn` (which executed `LOAD '<arcx-path>'` on every new pooled connection) has been replaced with a one-shot LOAD during `configureDatabase`, plus `SET GLOBAL "arcx.storage_root" = '<path>'`. Both propagate database-wide. The per-connection LOAD was a leftover from before we'd verified DuckDB's extension model; functionally identical, simpler to reason about.
35+
36+
Tests added: `TestSandbox` (CVE reproduction + full I/O family + SSRF + `COPY TO` local + `COPY TO 's3://...'` + `EXPORT DATABASE` outside allowlist + `INSTALL` after lockdown + cross-connection enforcement + `range()` remains callable + lockdown is one-way), `TestBuildAllowedDirectories` (12 table cases covering hot/cold S3 dedup, leading/interior-slash collapse, trailing-slash idempotence, empty-config behavior), `TestSandboxEmptyAllowlistLogsButDoesNotPanic`. The existing arcx tests confirm `arcx_version()` and `SET GLOBAL arcx.storage_root` propagate across 3–4 concurrent pool connections.
37+
538
## Bug fixes
639

740
### Parser no longer mis-resolves bare `time` column inside `EXTRACT(YEAR FROM time)`

0 commit comments

Comments
 (0)