
duckdb-server-go: --function-blocklist flag has no effect and exec command bypasses schema validation #958

@danielbodart


Summary

While investigating multi-tenant security for duckdb-server-go (following up on #817), I found two bugs and an additional security consideration:

  1. --function-blocklist flag is not wired up: the flag is parsed, and the validator exists and is tested, but it is never applied to queries.
  2. exec command bypasses all validation: schema validation via --schema-match-headers applies only to the arrow and json commands, not to exec.
  3. Shared preagg schema exposes cross-tenant data: if multiple tenants share the default mosaic preagg schema, aggregated data leaks between tenants.

Issue 1: Function blocklist is dead code

The --function-blocklist flag is parsed in main.go:40-42 and passed to options via query.WithFunctionBlocklist() at line 89:

// main.go:40-42
var functionBlocklist []string
if *functionBlocklistStr != "" {
    functionBlocklist = strings.Split(*functionBlocklistStr, ",")
}

// main.go:89
query.WithFunctionBlocklist(functionBlocklist),

The option is stored in Options.FunctionBlocklist (options.go:26), and a working validator exists (validator.go:208-259).

However, query.New() never reads FunctionBlocklist from the options, and the validator is never instantiated in production code.

The tests pass because they create the validator directly:

// validator_test.go:213-214
if len(tt.functionBlocklist) > 0 {
    validators = append(validators, newFunctionBlocklistValidator(tt.functionBlocklist))
}

Impact: Users who set --function-blocklist "read_parquet,iceberg_scan" expecting to block these functions are not protected; the flag is accepted but silently has no effect.
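A minimal sketch of the missing wiring, using simplified stand-ins: `Options`, `WithFunctionBlocklist`, `New` and `DB` mirror the names above but not their real signatures, and `Validator`, `validatorsFor` and `validate` are hypothetical. The blocklist validator here is a naive case-insensitive regex on `name(`, not the implementation in validator.go, and is easy to evade. The point is only that `New` must keep the list and the query path must build the validator from it:

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Validator rejects a SQL string by returning an error (hypothetical type).
type Validator func(sql string) error

// Options mirrors Options.FunctionBlocklist from options.go (simplified).
type Options struct {
	FunctionBlocklist []string
}

// Option follows the functional-option style of query.WithFunctionBlocklist.
type Option func(*Options)

func WithFunctionBlocklist(fns []string) Option {
	return func(o *Options) { o.FunctionBlocklist = fns }
}

// DB stores the blocklist so the query path can build validators from it.
type DB struct {
	functionBlocklist []string
}

// New applies the options and, unlike the current code, keeps the blocklist.
func New(opts ...Option) *DB {
	var o Options
	for _, opt := range opts {
		opt(&o)
	}
	return &DB{functionBlocklist: o.FunctionBlocklist}
}

// newFunctionBlocklistValidator is a naive stand-in, not the validator.go
// implementation: it rejects "name(" for any blocked name, ignoring case.
func newFunctionBlocklistValidator(blocked []string) Validator {
	var names []string
	for _, n := range blocked {
		// Skip blank entries, e.g. from "a,,b", which would otherwise match any call.
		if n = strings.TrimSpace(n); n != "" {
			names = append(names, regexp.QuoteMeta(n))
		}
	}
	if len(names) == 0 {
		return func(string) error { return nil }
	}
	re := regexp.MustCompile(`(?i)\b(` + strings.Join(names, "|") + `)\s*\(`)
	return func(sql string) error {
		if m := re.FindStringSubmatch(sql); m != nil {
			return fmt.Errorf("function %q is blocked", strings.ToLower(m[1]))
		}
		return nil
	}
}

// validatorsFor builds the per-request validators; the schema validator
// would be appended here as well but is omitted from this sketch.
func (db *DB) validatorsFor() []Validator {
	var vs []Validator
	if len(db.functionBlocklist) > 0 {
		vs = append(vs, newFunctionBlocklistValidator(db.functionBlocklist))
	}
	return vs
}

// validate runs each validator and returns the first rejection.
func (db *DB) validate(sql string) error {
	for _, v := range db.validatorsFor() {
		if err := v(sql); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db := New(WithFunctionBlocklist([]string{"read_parquet", "iceberg_scan"}))
	fmt.Println(db.validate("SELECT * FROM client_abc.bookings"))
	fmt.Println(db.validate("SELECT * FROM READ_PARQUET('s3://bucket/x.parquet')"))
}
```

Because the existing tests construct the validator directly, a test that goes through `New`, as `main` does here, is the kind that would have caught this bug.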

Issue 2: exec command bypasses schema validation

In server.go:260-268, schema validation is only applied to arrow and json commands:

switch *params.Type {
case CommandExec:
    return nil, false, s.db.Exec(ctx, *params.SQL)  // No allowedSchemas!

case CommandArrow:
    return s.db.QueryArrow(ctx, *params.SQL, allowedSchemas, useCache)

case CommandJSON:
    return s.db.QueryJSON(ctx, *params.SQL, allowedSchemas, useCache)
}

And in query.go:186-193, Exec() runs queries directly without any validation:

func (db *DB) Exec(ctx context.Context, query string) error {
    _, err := db.db.ExecContext(ctx, query)  // No ValidateSQL call
    if err != nil {
        return fmt.Errorf("query: failed to execute query: %w", err)
    }
    return nil
}

Impact: A malicious client can bypass schema isolation. Since exec doesn't return results, the main attack vectors are data exfiltration via INSERT or COPY, and destructive operations:

Data exfiltration via INSERT:

{"type": "exec", "sql": "INSERT INTO my_tenant.stolen SELECT * FROM other_tenant.sensitive_data"}

Data exfiltration via COPY:

{"type": "exec", "sql": "COPY (SELECT * FROM other_tenant.sensitive_data) TO '/tmp/exfil.parquet'"}

Destructive attacks:

{"type": "exec", "sql": "DROP TABLE other_tenant.data"}
{"type": "exec", "sql": "DELETE FROM other_tenant.bookings WHERE 1=1"}
{"type": "exec", "sql": "UPDATE other_tenant.bookings SET revenue = 0"}

Issue 3: Shared preagg schema exposes cross-tenant data

Even with schema validation fixed for exec, there's an additional multi-tenant concern with Mosaic's pre-aggregation feature.

Mosaic creates materialized tables in the configured preagg.schema (default: mosaic):

CREATE TABLE IF NOT EXISTS mosaic.preagg_<hash> AS
  SELECT client_key, sum(revenue), count(*)
  FROM client_abc.bookings
  GROUP BY ...

If multiple tenants share the same mosaic schema:

  • Tenant A's preagg tables contain their aggregated data (sums, counts, averages)
  • Tenant B could query or DROP Tenant A's preagg tables
  • Even aggregated data may be sensitive (total revenue, booking counts, etc.)

Impact: Cross-tenant data exposure via shared preagg tables, even when source data schemas are properly isolated.

Suggested Fix

  1. Wire up function blocklist: Store FunctionBlocklist in the DB struct and create the validator in WriteJSON/WriteArrow

  2. Apply validation to Exec: Pass allowedSchemas to Exec() and call ValidateSQL() with the same validators

  3. Per-tenant preagg schemas: For multi-tenant deployments, the Mosaic coordinator should be configured with a per-tenant preagg.schema (e.g., mosaic_client_abc instead of shared mosaic), and this schema should be included in the --schema-match-headers allowed list alongside the data schema

  4. Documentation: The multi-tenant security model should be documented, making clear that:

    • --schema-match-headers currently protects only the arrow and json commands
    • exec requires additional protection for full isolation
    • Preagg schemas must be isolated per-tenant to prevent cross-tenant data exposure
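For fix 3, a tenant's allow-list would hold both its data schema and its own preagg schema, and never the shared mosaic schema. A minimal sketch, assuming tenant keys are plain lowercase identifiers and the mosaic_<tenant> naming from item 3; `allowedSchemasFor` is a hypothetical helper, and how the list is carried through --schema-match-headers is not shown:

```go
package main

import (
	"fmt"
	"regexp"
)

// tenantKey restricts tenant keys to unquoted lowercase identifiers so they
// can be embedded in schema names safely (an assumption of this sketch).
var tenantKey = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// allowedSchemasFor returns the data schema and the per-tenant preagg schema
// ("mosaic_<tenant>"); the shared "mosaic" schema is deliberately absent.
func allowedSchemasFor(tenant string) ([]string, error) {
	if !tenantKey.MatchString(tenant) {
		return nil, fmt.Errorf("invalid tenant key %q", tenant)
	}
	return []string{tenant, "mosaic_" + tenant}, nil
}

func main() {
	schemas, err := allowedSchemasFor("client_abc")
	fmt.Println(schemas, err)
}
```

The second entry is the value the Mosaic coordinator's preagg.schema would be set to for that tenant, so its preagg tables land in a schema only that tenant is allowed to reach.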

Alternative: Full Isolation

It's worth noting that MotherDuck's approach to multi-tenancy is full isolation - separate Ducklings per tenant. This sidesteps all the complexity above but requires more infrastructure. For some use cases (like entities moving between tenants), shared infrastructure with proper validation may still be preferable.

Happy to submit a PR if this approach is acceptable.
