Implement intelligent CSV header detection #64

@ohaibbq

Overview

Implement intelligent header detection that matches BigQuery's behavior instead of always treating row 0 as the header.

Current Behavior

The emulator always treats row 0 as the header, regardless of the content.

Location: server/handler.go:975-981

Expected Behavior

BigQuery uses intelligent detection: "If the first line contains only strings, and the other lines contain other data types, BigQuery assumes that the first row is a header row."

If the first row does not meet these criteria, BigQuery treats it as data and generates column names (e.g., col_0, col_1, etc.).

Implementation Requirements

  1. Check whether row 0 contains only string-like values (no obvious numbers, dates, etc.)
  2. Check whether any of rows 1+ contain non-string types (numbers, dates, timestamps, etc.)
  3. If both conditions are met → row 0 is header
  4. Otherwise → row 0 is data, generate column names like col_0, col_1, col_2, etc.
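The four steps above could be sketched roughly as follows. This is a minimal illustration, not the emulator's actual code: the names `looksTyped` and `detectHeader` are hypothetical, the set of recognized formats (numbers, booleans, a few date/timestamp layouts) is an assumption, and the all-string case simply falls through to step 4.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
	"time"
)

// looksTyped reports whether a cell parses as an obvious non-string value.
// The accepted formats are an assumption for illustration only. Note that
// strconv.ParseFloat also accepts spellings such as "NaN" and "Inf".
func looksTyped(cell string) bool {
	s := strings.TrimSpace(cell)
	if s == "" {
		return false // empty cells carry no type information
	}
	if _, err := strconv.ParseFloat(s, 64); err == nil {
		return true // integers and floats
	}
	if strings.EqualFold(s, "true") || strings.EqualFold(s, "false") {
		return true
	}
	for _, layout := range []string{"2006-01-02", "2006-01-02 15:04:05", time.RFC3339} {
		if _, err := time.Parse(layout, s); err == nil {
			return true
		}
	}
	return false
}

// detectHeader returns the column names and the data rows.
// Row 0 is used as the header only if it is all string-like AND
// some later row contains a typed value; otherwise col_N names are generated.
func detectHeader(rows [][]string) (names []string, data [][]string) {
	if len(rows) == 0 {
		return nil, nil
	}
	first := rows[0]

	firstAllStrings := true
	for _, c := range first {
		if looksTyped(c) {
			firstAllStrings = false
			break
		}
	}

	restHasTyped := false
	for _, row := range rows[1:] {
		for _, c := range row {
			if looksTyped(c) {
				restHasTyped = true
			}
		}
	}

	if firstAllStrings && restHasTyped {
		return first, rows[1:]
	}

	// Row 0 is data: generate col_0, col_1, ...
	names = make([]string, len(first))
	for i := range names {
		names[i] = fmt.Sprintf("col_%d", i)
	}
	return names, rows
}
```

With this sketch, a single-row file and an all-string file both get generated column names, which may or may not match BigQuery in every edge case.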

Test Cases

All-numeric CSV (no header):

1,2,3
4,5,6

→ Row 0 should be data, not header. Columns should be named col_0, col_1, col_2

String header + numeric data:

name,age,score
Alice,25,95.5

→ Row 0 should be header

All-string CSV:

Alice,Bob,Charlie
Dave,Eve,Frank

→ Ambiguous case - may need additional heuristics or user configuration
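One way to handle the ambiguous case is to let an explicit user setting take precedence over the heuristic. The sketch below is only a suggestion: `HeaderMode` and `useFirstRowAsHeader` are hypothetical names, and how such a setting would map onto the load job configuration is left open.

```go
package main

// HeaderMode is a hypothetical user-facing override for header detection.
type HeaderMode int

const (
	HeaderAuto    HeaderMode = iota // apply the heuristic
	HeaderPresent                   // caller asserts row 0 is a header
	HeaderAbsent                    // caller asserts row 0 is data
)

// useFirstRowAsHeader decides whether row 0 is a header.
// firstAllStrings and restHasTyped are the two conditions from the
// implementation requirements; they are consulted only in HeaderAuto mode.
func useFirstRowAsHeader(mode HeaderMode, firstAllStrings, restHasTyped bool) bool {
	switch mode {
	case HeaderPresent:
		return true
	case HeaderAbsent:
		return false
	default:
		return firstAllStrings && restHasTyped
	}
}
```

In auto mode the all-string CSV above yields `false` (row 0 is data), while a caller who knows the file has a header could pass `HeaderPresent` to force the other outcome.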

Documentation Reference
