Overview
Implement automatic schema extraction from Parquet file metadata, eliminating the need for explicit schema parameters.
Current Behavior
The emulator requires an explicit schema parameter when loading Parquet files.
Location: server/handler.go:1073-1096
case "PARQUET":
reader := parquet.NewReader(bytes.NewReader(b))
// Requires schema to be provided externally
Expected Behavior
BigQuery automatically extracts the schema from "self-describing formats" like Parquet. Neither the --autodetect flag nor an explicit schema is needed; the schema is read directly from the Parquet file metadata.
Implementation Requirements
- Extract schema from Parquet file metadata when no explicit schema is provided
- Map Parquet types to BigQuery types:
  - Primitive types (INT32, INT64, FLOAT, DOUBLE, BOOLEAN, BYTE_ARRAY)
  - Complex types (nested structs → RECORD, arrays → REPEATED fields)
- Handle nested structures recursively, so that structs inside structs and arrays of structs become nested RECORD fields
- Preserve field names, nullability, and repetition information
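
The requirements above could be sketched roughly as follows. This is only an illustration under stated assumptions, not the emulator's actual code: ParquetNode and BQField are hypothetical stand-in types (a real implementation would walk the schema exposed by the Parquet reader in server/handler.go), and the target type names (INTEGER, FLOAT, BOOLEAN, BYTES, STRING) follow my reading of BigQuery's Parquet conversion table and should be checked against the documentation. The sketch maps plain repeated fields only and does not special-case Parquet's three-level LIST encoding.

```go
package main

import "fmt"

// ParquetNode is a hypothetical, library-independent view of one node in a
// Parquet schema tree. It is an assumption made for this sketch only.
type ParquetNode struct {
	Name       string
	Physical   string // "INT32", "INT64", "FLOAT", "DOUBLE", "BOOLEAN", "BYTE_ARRAY"; empty for groups
	IsString   bool   // true when a BYTE_ARRAY carries a STRING/UTF8 annotation
	Repetition string // "REQUIRED", "OPTIONAL" or "REPEATED"
	Children   []ParquetNode
}

// BQField is a hypothetical stand-in for a BigQuery field schema.
type BQField struct {
	Name   string
	Type   string
	Mode   string // "REQUIRED", "NULLABLE" or "REPEATED"
	Fields []BQField
}

// bqMode preserves nullability and repetition information.
func bqMode(repetition string) string {
	switch repetition {
	case "REQUIRED", "REPEATED":
		return repetition
	default:
		return "NULLABLE"
	}
}

// bqType maps the primitive types listed in the requirements.
func bqType(n ParquetNode) (string, error) {
	switch n.Physical {
	case "INT32", "INT64":
		return "INTEGER", nil
	case "FLOAT", "DOUBLE":
		return "FLOAT", nil
	case "BOOLEAN":
		return "BOOLEAN", nil
	case "BYTE_ARRAY":
		if n.IsString {
			return "STRING", nil
		}
		return "BYTES", nil
	}
	return "", fmt.Errorf("unsupported parquet type %q for field %q", n.Physical, n.Name)
}

// toBQField converts one node, recursing into groups to build RECORD fields.
func toBQField(n ParquetNode) (BQField, error) {
	f := BQField{Name: n.Name, Mode: bqMode(n.Repetition)}
	if len(n.Children) == 0 {
		t, err := bqType(n)
		if err != nil {
			return BQField{}, err
		}
		f.Type = t
		return f, nil
	}
	f.Type = "RECORD"
	for _, child := range n.Children {
		sub, err := toBQField(child)
		if err != nil {
			return BQField{}, err
		}
		f.Fields = append(f.Fields, sub)
	}
	return f, nil
}

func main() {
	node := ParquetNode{Name: "address", Repetition: "OPTIONAL", Children: []ParquetNode{
		{Name: "zip", Physical: "INT32", Repetition: "REQUIRED"},
		{Name: "tags", Physical: "BYTE_ARRAY", IsString: true, Repetition: "REPEATED"},
	}}
	field, err := toBQField(node)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("%+v\n", field)
}
```

Returning an error for unmapped physical types (rather than silently defaulting to STRING or BYTES) is a design choice of this sketch; it keeps gaps in the mapping visible while the remaining Parquet types are added.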
Test Cases
- Load Parquet file without schema parameter → should auto-detect from file metadata
- Parquet with complex types (nested structs, arrays) → should create appropriate RECORD fields
- Parquet with all primitive types → should map correctly to BigQuery types
- Verify schema matches what BigQuery would generate
Documentation Reference