
feat(tests): file destination contract tests (Step 2a of #364 follow-up) #594

Merged: masukai merged 1 commit into main from feat/destination-contract-tests-file-empty-batch on May 31, 2026

masukai (Contributor) commented on May 31, 2026

Summary

Step 2a of the destination contract framework started in #593. Mirrors the HTTP suite for destinations that write to disk — same three universal invariants, same parametrised shape; only the "no side effects on empty input" check differs (filesystem snapshot vs HTTP request log).

What ships

  • tests/contracts/test_destination_file_empty_batch.py — new
  • 9 tests in CI's minimal install: FileDestination in CSV / JSON / JSONL modes × 3 contracts
  • +3 conditional tests for ParquetDestination when [parquet] extras are present (pandas + pyarrow)

The three contracts (same as #593)

  1. isinstance(dest, Destination) — Protocol satisfaction.
  2. load([]) returns SyncResult(success=0, failed=0, skipped=0).
  3. load([]) leaves the filesystem untouched — no new files in tmp_path, not even a 0-byte placeholder. A 0-byte output would still indicate the destination opened a file handle before checking the record count — exactly the bug this contract catches.
```python
pre_state = set(tmp_path.iterdir())
dest.load([], config, opts)
post_state = set(tmp_path.iterdir())

new_paths = post_state - pre_state
assert new_paths == set(), (
    f"{destination_class.__name__} created {len(new_paths)} file(s) "
    f"on empty batch: {sorted(p.name for p in new_paths)}; "
    "destinations must short-circuit when there's nothing to write."
)
```
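Contracts 1 and 2 can be sketched in a self-contained way. The `Destination` Protocol, the `SyncResult` dataclass, and `NullDestination` below are simplified stand-ins assumed for illustration; the real drt definitions may carry more fields and a different `load` signature.

```python
from dataclasses import dataclass
from typing import Any, Protocol, runtime_checkable


@dataclass
class SyncResult:
    # Hypothetical stand-in: the three counters named in contract 2.
    success: int = 0
    failed: int = 0
    skipped: int = 0


@runtime_checkable
class Destination(Protocol):
    # Hypothetical stand-in for the real Protocol.
    def load(self, records: list, config: Any, opts: Any) -> SyncResult: ...


class NullDestination:
    """Toy destination that short-circuits on an empty batch."""

    def load(self, records, config, opts):
        if not records:
            return SyncResult()
        return SyncResult(success=len(records))


dest = NullDestination()
result = dest.load([], config=None, opts=None)

# Contract 1: structural (Protocol) satisfaction, no inheritance needed.
assert isinstance(dest, Destination)
# Contract 2: an empty batch yields an all-zero result.
assert result == SyncResult(success=0, failed=0, skipped=0)
```

Note that `isinstance` against a `runtime_checkable` Protocol only checks that the `load` attribute exists, not its signature, which is why contract 2 still has to call it.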

The conditional pattern (for Step 2b prep)

The parquet entry shows the pattern that the SQL suite will use for [bigquery] / [snowflake] / etc.:

```python
try:
    import pandas as _pd  # noqa: F401
    import pyarrow as _pa  # noqa: F401
    from drt.destinations.parquet import ParquetDestination
    from drt.config.models import ParquetDestinationConfig

    FILE_DESTINATIONS.append(
        pytest.param(ParquetDestination, lambda p: ..., id="parquet"),
    )
except ImportError:
    pass
```

CI's minimal install ([dev,mcp,duckdb]) skips it; a maintainer with [parquet] installed sees the extra coverage.
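If Step 2b ends up registering several optional destinations, the same guard could be factored into a small helper. This is only a sketch: `append_if_importable` and the module name `drt_hypothetical_missing_driver` are invented for illustration, and in the real suite the appended entries would presumably be `pytest.param(...)` objects rather than plain strings.

```python
import importlib

# Always-present cases (stand-ins for the CSV / JSON / JSONL entries).
CASES = ["csv", "json", "jsonl"]


def append_if_importable(cases, case_id, *module_names):
    """Append case_id only when every named module imports cleanly."""
    try:
        for name in module_names:
            importlib.import_module(name)
    except ImportError:
        return False  # extras missing: the case is simply not registered
    cases.append(case_id)
    return True


# "collections" is in the standard library, so this entry is always added.
added = append_if_importable(CASES, "always", "collections")
# Hypothetical module name standing in for an extra that is not installed.
missing = append_if_importable(CASES, "sql-extra", "drt_hypothetical_missing_driver")
```

The trade-off matches the try/except block above: a missing extra silently drops the case instead of reporting a skip, so coverage depends on what is installed locally.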

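Verification

pytest tests/contracts/ → 21 passed (12 from the #593 HTTP suite + 9 from this PR). make lint clean (115 source files).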

Why "no 0-byte file" matters

Several destinations call os.makedirs(...) or open a file handle as the first I/O step inside load(). If they don't short-circuit on empty input first, a 0-byte CSV (or worse, a header-only one) appears on disk for what should be a no-op. A sync window with no rows would then silently mislead downstream pipelines that watch for any new file, because they would pick up the empty output as if it were a completed sync. The current FileDestination / ParquetDestination implementations do short-circuit (if not records: return SyncResult()); this PR locks that behaviour in so it can't regress.
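The failure mode can be reproduced with two toy writers. This is an illustrative sketch, not the drt implementation: `eager_write`, `guarded_write`, and `new_files_after` are hypothetical names, and the directory snapshot mirrors the contract's check.

```python
import csv
import tempfile
from pathlib import Path


def eager_write(records, path):
    """Buggy shape: opens the file before checking for records."""
    with open(path, "w", newline="") as fh:
        if not records:
            return 0  # too late, the file already exists on disk
        writer = csv.DictWriter(fh, fieldnames=list(records[0]))
        writer.writeheader()
        writer.writerows(records)
    return len(records)


def guarded_write(records, path):
    """Short-circuits first, so an empty batch touches nothing."""
    if not records:
        return 0
    return eager_write(records, path)


def new_files_after(write, directory):
    """Snapshot the directory around an empty-batch write."""
    before = set(directory.iterdir())
    write([], directory / "out.csv")
    return {p.name for p in set(directory.iterdir()) - before}


with tempfile.TemporaryDirectory() as tmp_a, tempfile.TemporaryDirectory() as tmp_b:
    eager_new = new_files_after(eager_write, Path(tmp_a))
    guarded_new = new_files_after(guarded_write, Path(tmp_b))
    eager_size = (Path(tmp_a) / "out.csv").stat().st_size
```

Here `eager_new` contains the stray `out.csv` (0 bytes), while `guarded_new` is empty, which is the condition the third contract asserts.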

What's deferred (Step 2b)

  • SQL destinations (Postgres / MySQL / ClickHouse / Snowflake) — each uses a different DB driver (psycopg2 / pymysql / clickhouse-connect / snowflake-connector-python), so a single in-memory substitute isn't viable. The path is per-destination connection mocking via unittest.mock. Own PR.
  • StagedDestination Protocol (Salesforce Bulk, Amazon Marketing Cloud) — different load shape (stage + finalize) so different contract surface.
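For the SQL bullet above, per-destination connection mocking might look roughly like the following. Everything here is an assumption made for illustration: `ToySqlDestination` is not a drt class, and the connection factory is injected to keep the sketch self-contained, whereas the real suite would more likely patch each driver's connect function with `unittest.mock.patch`.

```python
from unittest.mock import MagicMock


class ToySqlDestination:
    """Hypothetical destination; `connect` stands in for a driver's connect()."""

    def __init__(self, connect):
        self._connect = connect

    def load(self, records):
        if not records:
            return 0  # short-circuit before any connection is opened
        conn = self._connect()
        cur = conn.cursor()
        cur.executemany(
            "INSERT INTO t (id) VALUES (%s)",
            [(r["id"],) for r in records],
        )
        conn.commit()
        return len(records)


connect = MagicMock(name="connect")
dest = ToySqlDestination(connect)

# Empty batch: the "no side effects" check becomes "connect was never called".
empty_count = dest.load([])
calls_after_empty = connect.call_count

# Non-empty batch: exactly one connection, one executemany, one commit.
loaded = dest.load([{"id": 1}, {"id": 2}])
```

The mock's call log plays the same role as the directory snapshot for file destinations and the request log for HTTP destinations.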

🤖 Generated with Claude Code

…low-up)

Mirrors PR #593's HTTP destination contract suite for destinations
that write to disk. Same three universal invariants, same parametrised
shape — only the "no side effects on empty input" check differs (a
directory snapshot instead of an httpserver request log).

What ships:

- tests/contracts/test_destination_file_empty_batch.py
- Parametrised over FileDestination in CSV / JSON / JSONL modes
  (9 tests = 3 formats × 3 contracts).
- Parquet destination is appended conditionally — guarded by a
  `try / import pandas, pyarrow / append` block so the suite still
  collects when the [parquet] extras aren't installed. Adds 3 more
  tests when those deps are present. Demonstrates the extension
  pattern for destinations that depend on optional extras (relevant
  for the SQL suite in Step 2b: bigquery / snowflake / etc.).

The three contracts:

1. isinstance(dest, Destination) — Protocol satisfaction.
2. load([]) returns SyncResult(success=0, failed=0, skipped=0).
3. load([]) leaves the filesystem untouched — no new files in tmp_path,
   not even a 0-byte placeholder. A 0-byte output would still indicate
   the destination opened a file handle before checking the record
   count, which is the bug the contract is designed to catch.

Verified: pytest tests/contracts/ → 21 passed (12 from #593's HTTP
suite + 9 from this PR's file suite). make lint clean (115 source files).

The Step 2b SQL suite (Postgres / MySQL / ClickHouse / Snowflake)
needs a DB mock harness and is deferred to its own PR — those
destinations use psycopg2 / pymysql / clickhouse-connect specifically,
so a single in-memory substitute isn't viable. Likely path is
per-destination connection mocking via unittest.mock.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
codecov Bot commented on May 31, 2026

Codecov Report

✅ All modified and coverable lines are covered by tests.

masukai merged commit 7f86c83 into main on May 31, 2026
8 checks passed
masukai deleted the feat/destination-contract-tests-file-empty-batch branch on May 31, 2026 at 13:35
github-actions Bot locked and limited conversation to collaborators on May 31, 2026