fix: strip trailing slash from FilesystemClient.dataset_path and get_table_dir#3867

Open
mattiasthalen wants to merge 5 commits into dlt-hub:devel from mattiasthalen:fix/3866-filesystem-trailing-slash-onelake

Conversation

@mattiasthalen
Contributor

Description

FilesystemClient.dataset_path and FilesystemClient.get_table_dir both return paths ending with a trailing separator. On most backends this is benign: BlobClient.exists gets a 404, which is normalized to False. OneLake (Microsoft Fabric), however, answers the same request with 403 ClientAuthenticationError, which aborts the load at initialize_storage and truncate_tables.

This PR strips the trailing separator from both methods and adds regression tests that assert neither returned path ends with /.
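
A minimal sketch of the shape change, not the actual dlt code. It assumes the path is built with a posixpath-style join; `dataset_path_old` and `dataset_path_new` are hypothetical stand-ins for the property before and after this PR, and the bucket and dataset names are made up.

```python
import posixpath


def dataset_path_old(bucket_path: str, dataset_name: str) -> str:
    # Hypothetical "before" shape: a trailing empty segment makes
    # join() append the separator to the result.
    return posixpath.join(bucket_path, dataset_name, "")


def dataset_path_new(bucket_path: str, dataset_name: str) -> str:
    # Hypothetical "after" shape: without the empty segment the
    # result ends with the dataset name itself.
    return posixpath.join(bucket_path, dataset_name)


old = dataset_path_old("bucket", "my_dataset")
new = dataset_path_new("bucket", "my_dataset")
```

Here `old` ends with `/` and `new` does not, which is the invariant the new regression tests assert.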

Related Issues

Additional Context

OneLake (Microsoft Fabric) responds with 403 ClientAuthenticationError
when BlobClient.exists targets a blob name ending in /. That kills
FilesystemClient.initialize_storage at the very first fs.isdir call
on self.dataset_path. Non-OneLake backends silently treat it as False
and hit the same latent defect, just non-fatally. Strip the empty
segment from the pathlib.join so dataset_path never ends in /.

Refs dlt-hub#3866
@mattiasthalen marked this pull request as ready for review on April 15, 2026, 11:28
Black wants the multi-line assert in test_dataset_path_has_no_trailing_separator
reformatted into a single-line assert. Apply the formatter's output so
`make format-check` passes in CI.

Refs dlt-hub#3866
@mattiasthalen marked this pull request as draft on April 15, 2026, 11:35
Same OneLake 403 root cause as the previous commit on dataset_path,
one level deeper. FilesystemClient.truncate_tables calls
fs.exists(table_dir) for each entry from get_table_dirs(...), which
on OneLake 403s on every table once dataset_path is already fixed.
Drop the trailing pathlib.sep so get_table_dir returns a path shape
that BlobClient.exists accepts.

Refs dlt-hub#3866
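
For illustration only, dropping a trailing separator can be sketched as below. `strip_trailing_sep` is a hypothetical helper and not a function in dlt; `/` is assumed as the separator, and the example path is made up.

```python
def strip_trailing_sep(path: str, sep: str = "/") -> str:
    """Return `path` without trailing separators (hypothetical helper)."""
    stripped = path.rstrip(sep)
    # A path made only of separators (e.g. the root "/") is kept as-is,
    # otherwise stripping would turn it into an empty string.
    return stripped if stripped else path


table_dir = strip_trailing_sep("bucket/my_dataset/my_table/")
```

A path that already has no trailing separator passes through unchanged, so the helper is safe to apply more than once.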
…nt paths

Tasks 2 and 3 of this PR (dlt-hub#3867) stripped the trailing separator from
`FilesystemClient.dataset_path` and `FilesystemClient.get_table_dir`.
The pre-existing `test_trailing_separators` hardcoded the old shape
(trailing /) in its parameterized assertions. Flip those seven
assertions to the corrected shape.

Also drop the stale "ending with separator" phrase from
`get_table_dir`'s docstring — same invariant flip, land together.

`get_table_prefix` is untouched and still preserves its trailing
separator for folder-style layouts; the two assertions on that
method stay as-is.

Refs dlt-hub#3866
Task 2 of this PR (dlt-hub#3867) stripped the trailing separator from
`FilesystemClient.dataset_path`. The pre-existing
`test_destination_config_in_name` assertion at line 218 was
`endswith(dataset_name + pathlib.sep)`, which encoded the old shape.
Replace with `endswith(dataset_name)` and drop the now-unused
`pathlib` local variable (and its `type: ignore` comment).

Caught by `make test-common-p`, not surfaced by the filesystem
test module run in Task 4 because this test lives under
`tests/destinations/`.

Refs dlt-hub#3866
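
A small sketch of why the old assertion had to change. It assumes a posixpath-style separator; `dataset_name` and the bucket name are illustrative values, not taken from the test suite.

```python
import posixpath

dataset_name = "my_dataset"  # illustrative value
# New shape of dataset_path: no trailing separator.
new_path = posixpath.join("bucket", dataset_name)

# Old assertion shape: expects the separator after the dataset name.
old_assertion_holds = new_path.endswith(dataset_name + posixpath.sep)
# New assertion shape: expects the path to end with the dataset name.
new_assertion_holds = new_path.endswith(dataset_name)
```

Against the corrected path, `old_assertion_holds` is `False` and `new_assertion_holds` is `True`.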
@mattiasthalen
Contributor Author

Verification summary

All local verification complete on fix/3866-filesystem-trailing-slash-onelake @ b5aedc02:

  • tests/load/filesystem/test_filesystem_client.py: all tests green, including 2 new regression tests (test_dataset_path_has_no_trailing_separator, test_get_table_dir_has_no_trailing_separator) and flipped test_trailing_separators expectations.
  • make test-common-p: 5571 passed, 60 skipped, 1 env failure (test_all_examples[load_github] — GitHub API rate limit, unrelated).
  • make test-load-local-p (duckdb + filesystem, memory + file drivers, sqlalchemy 2.0.18): 1244 passed, 194 skipped, 1 env error (test_stage_loading.py: libodbc.so.2 missing in devcontainer, unrelated).
  • make format: clean.
  • make lint: exit 0 (mypy, ruff, flake8, bandit, docstrings, lockfile, deps all clean).

Ready for review.

Development

Successfully merging this pull request may close these issues.

FilesystemClient trailing slash causes OneLake 403 at initialize_storage and truncate_tables