Summary
While investigating multi-tenant security for duckdb-server-go (following up on #817), I found two bugs and an additional security consideration:
- --function-blocklist flag is not wired up: The flag is parsed and the validator exists and is tested, but it's never actually applied to queries
- exec command bypasses all validation: Schema validation via --schema-match-headers only applies to arrow and json commands, not exec
- Shared preagg schema exposes cross-tenant data: If multiple tenants share the default mosaic preagg schema, aggregated data leaks between tenants
Issue 1: Function blocklist is dead code
The --function-blocklist flag is parsed in main.go:40-42 and passed to options via query.WithFunctionBlocklist() at line 89:
// main.go:40-42
var functionBlocklist []string
if *functionBlocklistStr != "" {
	functionBlocklist = strings.Split(*functionBlocklistStr, ",")
}
// main.go:89
query.WithFunctionBlocklist(functionBlocklist),
The option is stored in Options.FunctionBlocklist (options.go:26), and a working validator exists (validator.go:208-259).
However, query.New() never reads FunctionBlocklist from the options, and the validator is never instantiated in production code.
The tests pass because they create the validator directly:
// validator_test.go:213-214
if len(tt.functionBlocklist) > 0 {
	validators = append(validators, newFunctionBlocklistValidator(tt.functionBlocklist))
}
Impact: Users who set --function-blocklist "read_parquet,iceberg_scan" expecting to block these functions are not protected.
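For illustration, here is a minimal, self-contained sketch of what the missing wiring could look like. The Validator type, buildValidators, and validateSQL are hypothetical names, not the project's API, and the regex check is a naive stand-in for the real validator in validator.go. The point is only the shape: options go in, validators come out, and every query path runs them.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Validator checks one SQL statement and returns an error to reject it.
// Hypothetical type: the real validator interface may look different.
type Validator func(sql string) error

// Options mirrors only the field discussed in this issue.
type Options struct {
	FunctionBlocklist []string
}

// newFunctionBlocklistValidator is a naive stand-in for the real validator.
// It rejects any statement where a blocked name is followed by "(".
// It does not parse SQL, so it is for illustration only.
func newFunctionBlocklistValidator(blocked []string) Validator {
	patterns := map[string]*regexp.Regexp{}
	for _, name := range blocked {
		name = strings.TrimSpace(name)
		if name == "" {
			continue
		}
		patterns[name] = regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(name) + `\s*\(`)
	}
	return func(sql string) error {
		for name, p := range patterns {
			if p.MatchString(sql) {
				return fmt.Errorf("function %q is blocked", name)
			}
		}
		return nil
	}
}

// buildValidators is the step that appears to be missing today: it turns
// the parsed options into the validators that query paths should run.
func buildValidators(opts Options) []Validator {
	var validators []Validator
	if len(opts.FunctionBlocklist) > 0 {
		validators = append(validators, newFunctionBlocklistValidator(opts.FunctionBlocklist))
	}
	return validators
}

// validateSQL runs every validator and returns the first rejection.
func validateSQL(sql string, validators []Validator) error {
	for _, v := range validators {
		if err := v(sql); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	opts := Options{FunctionBlocklist: strings.Split("read_parquet,iceberg_scan", ",")}
	validators := buildValidators(opts)
	fmt.Println(validateSQL("SELECT * FROM read_parquet('data.parquet')", validators))
	fmt.Println(validateSQL("SELECT 1", validators))
}
```

A regression test that goes through the same construction path as production (rather than calling newFunctionBlocklistValidator directly) would have caught the dead code.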
Issue 2: exec command bypasses schema validation
In server.go:260-268, schema validation is only applied to arrow and json commands:
switch *params.Type {
case CommandExec:
	return nil, false, s.db.Exec(ctx, *params.SQL) // No allowedSchemas!
case CommandArrow:
	return s.db.QueryArrow(ctx, *params.SQL, allowedSchemas, useCache)
case CommandJSON:
	return s.db.QueryJSON(ctx, *params.SQL, allowedSchemas, useCache)
}
And in query.go:186-193, Exec() runs queries directly without any validation:
func (db *DB) Exec(ctx context.Context, query string) error {
	_, err := db.db.ExecContext(ctx, query) // No ValidateSQL call
	if err != nil {
		return fmt.Errorf("query: failed to execute query: %w", err)
	}
	return nil
}
Impact: A malicious client can bypass schema isolation. Since exec doesn't return results, the attack vectors are data exfiltration via COPY/INSERT and destructive operations:
Data exfiltration via INSERT:
{"type": "exec", "sql": "INSERT INTO my_tenant.stolen SELECT * FROM other_tenant.sensitive_data"}
Data exfiltration via COPY:
{"type": "exec", "sql": "COPY (SELECT * FROM other_tenant.sensitive_data) TO '/tmp/exfil.parquet'"}
Destructive attacks:
{"type": "exec", "sql": "DROP TABLE other_tenant.data"}
{"type": "exec", "sql": "DELETE FROM other_tenant.bookings WHERE 1=1"}
{"type": "exec", "sql": "UPDATE other_tenant.bookings SET revenue = 0"}
Issue 3: Shared preagg schema exposes cross-tenant data
Even with schema validation fixed for exec, there's an additional multi-tenant concern with Mosaic's pre-aggregation feature.
Mosaic creates materialized tables in the configured preagg.schema (default: mosaic):
CREATE TABLE IF NOT EXISTS mosaic.preagg_<hash> AS
SELECT client_key, sum(revenue), count(*)
FROM client_abc.bookings
GROUP BY ...
If multiple tenants share the same mosaic schema:
- Tenant A's preagg tables contain their aggregated data (sums, counts, averages)
- Tenant B could query or DROP Tenant A's preagg tables
- Even aggregated data may be sensitive (total revenue, booking counts, etc.)
Impact: Cross-tenant data exposure via shared preagg tables, even when source data schemas are properly isolated.
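One way to picture the per-tenant alternative is a small helper that derives the preagg schema from the tenant and returns it together with the data schema as the allowed list. This is a sketch under assumptions: preaggSchemaFor and allowedSchemasFor are hypothetical helpers, the mosaic_ prefix follows the mosaic_client_abc example, and the tenant identifier is assumed to double as the data schema name (as with client_abc).

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// safeIdent restricts tenants to plain lowercase identifiers so the derived
// schema name can never carry SQL syntax.
var safeIdent = regexp.MustCompile(`^[a-z][a-z0-9_]*$`)

// preaggSchemaFor derives a per-tenant preagg schema name,
// e.g. "client_abc" becomes "mosaic_client_abc".
func preaggSchemaFor(tenant string) (string, error) {
	t := strings.ToLower(strings.TrimSpace(tenant))
	if !safeIdent.MatchString(t) {
		return "", fmt.Errorf("tenant %q is not a safe schema identifier", tenant)
	}
	return "mosaic_" + t, nil
}

// allowedSchemasFor returns the tenant's data schema plus its preagg schema,
// i.e. the list that schema validation would be given for this tenant.
func allowedSchemasFor(tenant string) ([]string, error) {
	preagg, err := preaggSchemaFor(tenant)
	if err != nil {
		return nil, err
	}
	dataSchema := strings.ToLower(strings.TrimSpace(tenant))
	return []string{dataSchema, preagg}, nil
}

func main() {
	schemas, err := allowedSchemasFor("client_abc")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(schemas)
}
```

With this layout, Tenant B's allowed list never contains Tenant A's preagg schema, so the same schema validation that protects source data also covers the materialized aggregates.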
Suggested Fix
- Wire up function blocklist: Store FunctionBlocklist in the DB struct and create the validator in WriteJSON/WriteArrow
- Apply validation to Exec: Pass allowedSchemas to Exec() and call ValidateSQL() with the same validators
- Per-tenant preagg schemas: For multi-tenant deployments, the Mosaic coordinator should be configured with a per-tenant preagg.schema (e.g., mosaic_client_abc instead of shared mosaic), and this schema should be included in the --schema-match-headers allowed list alongside the data schema
- Documentation: The multi-tenant security model should be documented, making clear that:
  - --schema-match-headers only protects arrow/json commands currently
  - exec requires additional protection for full isolation
  - Preagg schemas must be isolated per-tenant to prevent cross-tenant data exposure
Alternative: Full Isolation
It's worth noting that MotherDuck's approach to multi-tenancy is full isolation - separate Ducklings per tenant. This sidesteps all the complexity above but requires more infrastructure. For some use cases (like entities moving between tenants), shared infrastructure with proper validation may still be preferable.
Happy to submit a PR if this approach is acceptable.
Environment