duckdb-server-go: --function-blocklist flag has no effect and exec command bypasses schema validation

## Summary

While investigating multi-tenant security for duckdb-server-go (following up on #817), I found two bugs and an additional security consideration:

1. **`--function-blocklist` flag is not wired up**: the flag is parsed and the validator exists and is tested, but it is never applied to queries.
2. **`exec` command bypasses all validation**: schema validation via `--schema-match-headers` only applies to the `arrow` and `json` commands, not to `exec`.
3. **Shared preagg schema exposes cross-tenant data**: if multiple tenants share the default `mosaic` preagg schema, aggregated data leaks between tenants.

## Issue 1: Function blocklist is dead code

The `--function-blocklist` flag is parsed in `main.go:40-42` and passed to options via `query.WithFunctionBlocklist()` at line 89:

```go
// main.go:40-42
var functionBlocklist []string
if *functionBlocklistStr != "" {
    functionBlocklist = strings.Split(*functionBlocklistStr, ",")
}

// main.go:89
query.WithFunctionBlocklist(functionBlocklist),
```

The option is stored in `Options.FunctionBlocklist` (`options.go:26`), and a working validator exists (`validator.go:208-259`).

**However**, `query.New()` never reads `FunctionBlocklist` from the options, and the validator is never instantiated in production code.

The tests pass because they create the validator directly:
```go
// validator_test.go:213-214
if len(tt.functionBlocklist) > 0 {
    validators = append(validators, newFunctionBlocklistValidator(tt.functionBlocklist))
}
```

**Impact**: Users who set `--function-blocklist "read_parquet,iceberg_scan"` expecting to block these functions are not protected.
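
A minimal sketch of the missing wiring, assuming simplified stand-ins for `Options` and `DB`. The real `newFunctionBlocklistValidator` from `validator.go` is not reproduced here: `sqlValidator`, `naiveBlocklistValidator`, and `validate` are hypothetical names, and the regex-based call detection is only for illustration (comments or quoted identifiers can fool it). The point is the data flow: `New()` reads `FunctionBlocklist`, builds a validator, and keeps it on the `DB` so every query path can run it.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Options is a simplified stand-in for Options in options.go.
type Options struct {
	FunctionBlocklist []string
}

// sqlValidator is a hypothetical validator type for this sketch.
type sqlValidator func(sql string) error

type blockedFunc struct {
	name string
	call *regexp.Regexp
}

// naiveBlocklistValidator rejects SQL that appears to call a blocked function.
// Regex matching is illustrative only; it is not a substitute for parsing.
func naiveBlocklistValidator(blocklist []string) sqlValidator {
	var blocked []blockedFunc
	for _, name := range blocklist {
		name = strings.TrimSpace(name) // main.go splits on "," without trimming
		if name == "" {
			continue
		}
		re := regexp.MustCompile(`(?i)\b` + regexp.QuoteMeta(name) + `\s*\(`)
		blocked = append(blocked, blockedFunc{name: name, call: re})
	}
	return func(sql string) error {
		for _, b := range blocked {
			if b.call.MatchString(sql) {
				return fmt.Errorf("function %q is blocked", b.name)
			}
		}
		return nil
	}
}

// DB keeps the validators built from Options so every query path can use them.
type DB struct {
	validators []sqlValidator
}

// New performs the step the issue reports as missing: it reads FunctionBlocklist.
func New(opts Options) *DB {
	db := &DB{}
	if len(opts.FunctionBlocklist) > 0 {
		db.validators = append(db.validators, naiveBlocklistValidator(opts.FunctionBlocklist))
	}
	return db
}

// validate runs every configured validator before a query is executed.
func (db *DB) validate(sql string) error {
	for _, v := range db.validators {
		if err := v(sql); err != nil {
			return err
		}
	}
	return nil
}

func main() {
	db := New(Options{FunctionBlocklist: strings.Split("read_parquet,iceberg_scan", ",")})
	fmt.Println(db.validate("SELECT * FROM read_parquet('s3://bucket/x.parquet')"))
	fmt.Println(db.validate("SELECT 42"))
}
```

The trimming step is deliberate: because `main.go` splits the flag value on `,` without trimming, `--function-blocklist "read_parquet, iceberg_scan"` yields the entry ` iceberg_scan` with a leading space, which an exact-name comparison would likely never match.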

## Issue 2: `exec` command bypasses schema validation

In `server.go:260-268`, schema validation is only applied to `arrow` and `json` commands:

```go
switch *params.Type {
case CommandExec:
    return nil, false, s.db.Exec(ctx, *params.SQL)  // No allowedSchemas!

case CommandArrow:
    return s.db.QueryArrow(ctx, *params.SQL, allowedSchemas, useCache)

case CommandJSON:
    return s.db.QueryJSON(ctx, *params.SQL, allowedSchemas, useCache)
}
```

And in `query.go:186-193`, `Exec()` runs queries directly without any validation:

```go
func (db *DB) Exec(ctx context.Context, query string) error {
    _, err := db.db.ExecContext(ctx, query)  // No ValidateSQL call
    if err != nil {
        return fmt.Errorf("query: failed to execute query: %w", err)
    }
    return nil
}
```

**Impact**: A malicious client can bypass schema isolation. Because `exec` returns no result set, the attack vectors are moving data out of another tenant's schema (via `INSERT` or `COPY`) and destructive operations:

**Data exfiltration via INSERT:**
```json
{"type": "exec", "sql": "INSERT INTO my_tenant.stolen SELECT * FROM other_tenant.sensitive_data"}
```

**Data exfiltration via COPY:**
```json
{"type": "exec", "sql": "COPY (SELECT * FROM other_tenant.sensitive_data) TO '/tmp/exfil.parquet'"}
```

**Destructive attacks:**
```json
{"type": "exec", "sql": "DROP TABLE other_tenant.data"}
{"type": "exec", "sql": "DELETE FROM other_tenant.bookings WHERE 1=1"}
{"type": "exec", "sql": "UPDATE other_tenant.bookings SET revenue = 0"}
```

## Issue 3: Shared preagg schema exposes cross-tenant data

Even with schema validation fixed for `exec`, there's an additional multi-tenant concern with Mosaic's pre-aggregation feature.

Mosaic creates materialized tables in the configured `preagg.schema` (default: `mosaic`):

```sql
CREATE TABLE IF NOT EXISTS mosaic.preagg_<hash> AS
  SELECT client_key, sum(revenue), count(*)
  FROM client_abc.bookings
  GROUP BY ...
```

**If multiple tenants share the same `mosaic` schema:**
- Tenant A's preagg tables contain their aggregated data (sums, counts, averages)
- Tenant B could query or DROP Tenant A's preagg tables
- Even aggregated data may be sensitive (total revenue, booking counts, etc.)

**Impact**: Cross-tenant data exposure via shared preagg tables, even when source data schemas are properly isolated.
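
One way to guarantee that a preagg schema is never shared is to derive it from the same validated tenant identifier as the data schema, and to put exactly those two schemas (and not the shared `mosaic`) in the allowed list. The `tenantSchemas` helper, the identifier rule, and the assumption that the tenant ID equals the data schema name (as with `client_abc` above) are hypothetical; the `mosaic_` prefix follows the example in the suggested fix.

```go
package main

import (
	"fmt"
	"regexp"
)

// tenantIdent is a conservative, hypothetical rule for tenant identifiers,
// so derived schema names never need quoting and cannot smuggle in SQL.
var tenantIdent = regexp.MustCompile(`^[a-z_][a-z0-9_]*$`)

// tenantSchemas returns the tenant's data schema and its private preagg schema
// (e.g. client_abc -> mosaic_client_abc). Both belong in the allowed list;
// the shared "mosaic" schema deliberately does not.
func tenantSchemas(tenant string) (data, preagg string, err error) {
	if !tenantIdent.MatchString(tenant) {
		return "", "", fmt.Errorf("invalid tenant identifier %q", tenant)
	}
	return tenant, "mosaic_" + tenant, nil
}

func main() {
	data, preagg, err := tenantSchemas("client_abc")
	if err != nil {
		panic(err)
	}
	// The coordinator for this tenant would be configured with preagg.schema = preagg.
	fmt.Println([]string{data, preagg})
}
```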

## Suggested Fix

1. **Wire up function blocklist**: Store `FunctionBlocklist` in the `DB` struct and create the validator in `WriteJSON`/`WriteArrow`

2. **Apply validation to `Exec`**: Pass `allowedSchemas` to `Exec()` and call `ValidateSQL()` with the same validators

3. **Per-tenant preagg schemas**: For multi-tenant deployments, the Mosaic coordinator should be configured with a per-tenant `preagg.schema` (e.g., `mosaic_client_abc` instead of shared `mosaic`), and this schema should be included in the `--schema-match-headers` allowed list alongside the data schema

4. **Documentation**: The multi-tenant security model should be documented, making clear that:
   - `--schema-match-headers` only protects `arrow`/`json` commands currently
   - `exec` requires additional protection for full isolation
   - Preagg schemas must be isolated per-tenant to prevent cross-tenant data exposure

## Alternative: Full Isolation

It's worth noting that MotherDuck's approach to multi-tenancy is full isolation: separate Ducklings per tenant. This sidesteps all the complexity above but requires more infrastructure. For some use cases (like entities moving between tenants), shared infrastructure with proper validation may still be preferable.

Happy to submit a PR if this approach is acceptable.

## Environment

- duckdb-server-go from mosaic main branch
- Related discussion: #817
