FilesystemClient trailing slash causes OneLake 403 at initialize_storage and truncate_tables #3866

@mattiasthalen

Description

dlt version

1.25.0 (current devel, verified against commit 9636de5e)

Describe the problem

FilesystemClient.dataset_path and FilesystemClient.get_table_dir both return paths with a trailing separator. On most backends this is benign: for a blob name ending in /, BlobClient.exists gets a 404 ResourceNotFoundError, and adlfs._exists normalizes that to False. On OneLake (Microsoft Fabric), the same request returns a 403 ClientAuthenticationError instead, which propagates all the way up and aborts the load at two separate points:

  1. FilesystemClient.initialize_storage calls self.fs_client.isdir(self.dataset_path) as the first step, before any data is written — fatal 403 on the trailing-slash dataset_path.
  2. FilesystemClient.truncate_tables iterates get_table_dirs(...) and calls self.fs_client.exists(table_dir) per table — fatal 403 on each trailing-separator table_dir.
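Why a 404 is harmless but a 403 is fatal can be sketched with a simplified model. The exception classes below are hypothetical stand-ins named after the Azure SDK errors mentioned above, and the two probe functions are fake backends that treat every slash-free path as existing. This is not adlfs or Azure SDK code; it only mirrors the "normalize not-found to False, let everything else propagate" behavior the issue describes.

```python
from typing import Callable


class ResourceNotFoundError(Exception):
    """Hypothetical stand-in for the 404 a regular Blob endpoint returns."""


class ClientAuthenticationError(Exception):
    """Hypothetical stand-in for the 403 OneLake returns for the same request."""


def blob_probe(path: str) -> bool:
    # Regular Azure Blob model: a name ending in "/" is simply not found.
    if path.endswith("/"):
        raise ResourceNotFoundError(path)
    return True


def onelake_probe(path: str) -> bool:
    # OneLake model: the same name is refused with an auth error instead.
    if path.endswith("/"):
        raise ClientAuthenticationError(path)
    return True


def exists_like_adlfs(probe: Callable[[str], bool], path: str) -> bool:
    # Only "not found" is turned into False; any other error propagates,
    # which is how a 403 can surface in initialize_storage or truncate_tables.
    try:
        return probe(path)
    except ResourceNotFoundError:
        return False
```

With this model, exists_like_adlfs(blob_probe, "ws/lh/Files/_dlt_stage/demo/") quietly returns False, while the same call with onelake_probe raises ClientAuthenticationError. Without the trailing slash both backends return True.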

Source lines on devel @ 9636de5e:

  • dlt/destinations/impl/filesystem/filesystem.py:591 has return self.pathlib.join(self.bucket_path, self.dataset_name, ""), which forces a trailing slash.
  • dlt/destinations/impl/filesystem/filesystem.py:882 has table_dir: str = self.pathlib.dirname(table_prefix) + self.pathlib.sep, which forces a trailing separator.
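The two line patterns above can be reproduced in isolation. This sketch assumes that for remote buckets such as abfss:// the client's self.pathlib behaves like Python's posixpath module; the bucket path, dataset name and table prefix are hypothetical examples, not values taken from dlt.

```python
import posixpath

# Assumption: self.pathlib behaves like posixpath for remote buckets.
pathlib = posixpath

# Hypothetical example inputs.
bucket_path = "ws/lh/Files/_dlt_stage"
dataset_name = "demo"
table_prefix = "ws/lh/Files/_dlt_stage/demo/events/"

# Line 591 pattern: joining an empty last segment appends a separator.
dataset_path = pathlib.join(bucket_path, dataset_name, "")

# Line 882 pattern: dirname() drops everything after the last separator,
# then the separator is appended again.
table_dir = pathlib.dirname(table_prefix) + pathlib.sep

print(dataset_path)  # ws/lh/Files/_dlt_stage/demo/
print(table_dir)     # ws/lh/Files/_dlt_stage/demo/events/
```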

Both are latent defects in generic FilesystemClient. OneLake just makes them visibly fatal by responding 403 instead of 404.

Expected behavior

dataset_path and get_table_dir return paths without a trailing separator. All downstream consumers that need to append further segments already do so via pathlib.join(...), which produces the same result whether or not the parent ends in /.
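The claim that join-based consumers are unaffected can be checked directly, again assuming posixpath semantics and hypothetical paths. Note that this only covers code that goes through join; a consumer that concatenated strings directly would behave differently.

```python
import posixpath

parent_with_sep = "ws/lh/Files/_dlt_stage/demo/"
parent_without_sep = "ws/lh/Files/_dlt_stage/demo"

# join() inserts a separator only when the left side does not already end
# in one, so the child path is identical in both cases.
child_a = posixpath.join(parent_with_sep, "events")
child_b = posixpath.join(parent_without_sep, "events")
print(child_a == child_b)  # True
```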

Proposed fix, two lines:

# FilesystemClient.dataset_path
return self.pathlib.join(self.bucket_path, self.dataset_name)

# FilesystemClient.get_table_dir
table_dir: str = self.pathlib.dirname(table_prefix)
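For illustration, the proposed two-line fix can be expressed as free functions. These are hypothetical stand-ins for the two FilesystemClient members (the real ones read bucket_path, dataset_name and pathlib from self), and posixpath is again assumed as the default pathlib.

```python
import posixpath
from types import ModuleType


def dataset_path(bucket_path: str, dataset_name: str, pathlib: ModuleType = posixpath) -> str:
    # Proposed form: no empty trailing segment, so no trailing separator.
    return pathlib.join(bucket_path, dataset_name)


def get_table_dir(table_prefix: str, pathlib: ModuleType = posixpath) -> str:
    # Proposed form: dirname() only, without re-appending pathlib.sep.
    return pathlib.dirname(table_prefix)
```

For example, dataset_path("ws/lh/Files/_dlt_stage", "demo") and dataset_path("ws/lh/Files/_dlt_stage/", "demo") both give "ws/lh/Files/_dlt_stage/demo", and get_table_dir("ws/lh/Files/_dlt_stage/demo/events/") gives "ws/lh/Files/_dlt_stage/demo/events".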

Regression tests should assert neither returned path ends with /, using an in-memory filesystem fixture so no real Azure/OneLake credentials are needed.
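One possible shape for such tests is sketched below. It swaps in a hand-rolled, hypothetical test double instead of a library in-memory filesystem: a permissive in-memory filesystem may accept trailing-slash paths, whereas this double mimics OneLake by raising on them (PermissionError stands in for the 403). The bucket path and table prefix are hypothetical, and the tests exercise the expressions from the proposed fix rather than a real FilesystemClient.

```python
import posixpath
import unittest


class StrictTrailingSlashFS:
    """Hypothetical in-memory double: a path ending in "/" raises instead of
    being reported as missing, mimicking OneLake's 403."""

    def __init__(self) -> None:
        self.dirs: set[str] = set()

    def _reject_trailing_sep(self, path: str) -> None:
        if path.endswith("/"):
            raise PermissionError(f"403 for trailing-separator path {path!r}")

    def makedirs(self, path: str) -> None:
        self._reject_trailing_sep(path)
        self.dirs.add(path)

    def isdir(self, path: str) -> bool:
        self._reject_trailing_sep(path)
        return path in self.dirs

    # Directories-only double, so exists() and isdir() coincide.
    exists = isdir


BUCKET = "bucket/_dlt_stage"
TABLE_PREFIX = "bucket/_dlt_stage/demo/events/"


class TrailingSeparatorRegressionTest(unittest.TestCase):
    def test_fixed_paths_have_no_trailing_separator(self) -> None:
        # The two expressions from the proposed fix.
        dataset_path = posixpath.join(BUCKET, "demo")
        table_dir = posixpath.dirname(TABLE_PREFIX)
        self.assertFalse(dataset_path.endswith("/"))
        self.assertFalse(table_dir.endswith("/"))

    def test_strict_fs_accepts_fixed_dataset_path(self) -> None:
        fs = StrictTrailingSlashFS()
        dataset_path = posixpath.join(BUCKET, "demo")
        fs.makedirs(dataset_path)
        self.assertTrue(fs.isdir(dataset_path))

    def test_strict_fs_rejects_old_dataset_path(self) -> None:
        # The current line 591 pattern, which the double must reject.
        fs = StrictTrailingSlashFS()
        with self.assertRaises(PermissionError):
            fs.isdir(posixpath.join(BUCKET, "demo", ""))
```

Run it with python -m unittest or pytest; both collect unittest.TestCase subclasses.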

Steps to reproduce

Minimal shape-only reproduction using the same adlfs.AzureBlobFileSystem instance against OneLake:

# Against OneLake, same fs instance:
fs.isdir("<ws-guid>/<lh-guid>/Files/_dlt_stage/demo/")  # → raises 403 ClientAuthenticationError
fs.isdir("<ws-guid>/<lh-guid>/Files/_dlt_stage/demo")   # → True
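Since the two calls above differ only by the trailing "/", a caller-side normalization is enough to reach the form OneLake answers. The helper below is hypothetical and not part of dlt or adlfs; it accepts any object with an isdir(path) method, and the real fix still belongs in FilesystemClient as proposed.

```python
from typing import Protocol


class SupportsIsdir(Protocol):
    def isdir(self, path: str) -> bool: ...


def isdir_without_trailing_sep(fs: SupportsIsdir, path: str) -> bool:
    # Hypothetical guard: probe the slash-free form of the path. If stripping
    # would leave an empty string (e.g. a bare "/"), keep the original path.
    stripped = path.rstrip("/") or path
    return fs.isdir(stripped)
```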

End-to-end repro using a Fabric notebook kernel (requires a Fabric tenant in which the interactive user has Warehouse and Lakehouse grants):

  1. Configure DESTINATION__FILESYSTEM__BUCKET_URL pointing to abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage.
  2. Configure a dlt pipeline with any destination that uses staging="filesystem" (e.g. fabric).
  3. pipeline.run(...) — fails at initialize_storage with 403 ClientAuthenticationError on the fs.isdir(self.dataset_path) call, before any data is uploaded.
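The three steps above might look roughly like this in a notebook cell. It is a sketch under assumptions: dlt is installed in the Fabric kernel, credentials for the primary destination are configured separately (not shown), destination="fabric" follows the issue's example, and pipeline_name and table_name are hypothetical. The placeholders are left exactly as in the issue. dlt is imported inside the function so the module can be loaded without dlt present.

```python
import os

# Placeholders kept as in the issue; substitute the real workspace and lakehouse GUIDs.
BUCKET_URL = "abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage"


def run_repro() -> None:
    import dlt  # deferred import; requires dlt in the notebook kernel

    # Step 1: point filesystem staging at OneLake.
    os.environ["DESTINATION__FILESYSTEM__BUCKET_URL"] = BUCKET_URL

    # Step 2: a destination that stages through the filesystem destination.
    pipeline = dlt.pipeline(
        pipeline_name="onelake_trailing_slash_repro",  # hypothetical name
        destination="fabric",
        staging="filesystem",
        dataset_name="demo",
    )

    # Step 3: a minimal static payload. Per the issue, this fails in
    # initialize_storage with a 403 before any data is uploaded.
    pipeline.run([{"id": 1}], table_name="demo_table")  # hypothetical table name
```

Call run_repro() from the notebook once the placeholders are filled in.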

Operating system

Linux

Runtime environment

Other

Python version

3.11

dlt data source

N/A. This is a FilesystemClient path-construction bug triggered by any resource; it reproduces with a minimal static resource.

dlt destination

Filesystem & buckets

Other deployment details

Microsoft Fabric Python notebook kernel, OneLake-backed filesystem staging, Fabric Warehouse as the primary destination. The filesystem destination ships a custom OnelakeFileSystem (in fsspec_wrapper.trident.core) registered as the abfss:// handler, which is the code path that surfaces the 403 instead of a 404. Non-OneLake Azure Blob users still hit the latent bug but observe it as silent False from isdir/exists.

Additional information

Surfaced as part of OSS-41 (Fabric notebook user-identity auth). Splitting this bugfix out as its own ticket per CONTRIBUTING — fix/ branches require a ticket, and this is a pure bug with zero API surface that can land independently of the Fabric feature work. The Fabric feature PR will rebase on this once merged.

I have a validated reproduction and fix in a personal sandbox — happy to open the PR against devel with regression tests.

Metadata

Labels: destination (Issue with a specific destination)
Type: No type
Projects: Status In Progress
Milestone: No milestone
Relationships: None yet
Development: No branches or pull requests