dlt version
1.25.0 (current devel, verified against commit 9636de5e)
Describe the problem
FilesystemClient.dataset_path and FilesystemClient.get_table_dir both return paths with a trailing separator. This is benign on most backends: BlobClient.exists responds with 404 ResourceNotFoundError for a blob name ending in /, and adlfs._exists normalizes that to False. On OneLake (Microsoft Fabric), the same request returns 403 ClientAuthenticationError, which propagates uncaught and aborts the load at two call sites:
FilesystemClient.initialize_storage calls self.fs_client.isdir(self.dataset_path) as the first step, before any data is written — fatal 403 on the trailing-slash dataset_path.
FilesystemClient.truncate_tables iterates get_table_dirs(...) and calls self.fs_client.exists(table_dir) per table — fatal 403 on each trailing-separator table_dir.
Source lines on devel @ 9636de5e:
dlt/destinations/impl/filesystem/filesystem.py:591 — return self.pathlib.join(self.bucket_path, self.dataset_name, "") forces a trailing slash.
dlt/destinations/impl/filesystem/filesystem.py:882 — table_dir: str = self.pathlib.dirname(table_prefix) + self.pathlib.sep forces a trailing separator.
Both are latent defects in generic FilesystemClient. OneLake just makes them visibly fatal by responding 403 instead of 404.
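The effect of the two expressions can be checked in isolation. This is a minimal sketch, assuming self.pathlib behaves like Python's posixpath for bucket-style paths; the bucket, dataset, and table prefix values are made up for illustration and are not taken from dlt.

```python
import posixpath

# Hypothetical values standing in for bucket_path, dataset_name and a table prefix.
bucket_path = "ws/lh/Files/_dlt_stage"
dataset_name = "demo"
table_prefix = "ws/lh/Files/_dlt_stage/demo/items/"

# Joining with an empty final segment appends a trailing separator.
dataset_path = posixpath.join(bucket_path, dataset_name, "")
print(dataset_path)  # ws/lh/Files/_dlt_stage/demo/

# dirname() drops the trailing separator; adding sep puts it back.
table_dir = posixpath.dirname(table_prefix) + posixpath.sep
print(table_dir)  # ws/lh/Files/_dlt_stage/demo/items/
```

Both results end in /, which is the shape that OneLake rejects with 403 in this report.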
Expected behavior
dataset_path and get_table_dir return paths without a trailing separator. All downstream consumers that need to append further segments already do so via pathlib.join(...), where whether the parent ends in / is irrelevant.
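The claim that join-based consumers are unaffected can be illustrated with a short sketch. It assumes posixpath semantics for self.pathlib; the parent path and the file name are hypothetical.

```python
import posixpath

# Hypothetical parent paths, with and without a trailing separator.
with_sep = "ws/lh/Files/_dlt_stage/demo/"
without_sep = "ws/lh/Files/_dlt_stage/demo"

# join() inserts a separator only when the parent does not already end in one,
# so both parents yield the same child path.
joined_a = posixpath.join(with_sep, "items", "load.parquet")
joined_b = posixpath.join(without_sep, "items", "load.parquet")

assert joined_a == joined_b
print(joined_b)  # ws/lh/Files/_dlt_stage/demo/items/load.parquet
```

Any consumer that builds child paths by string concatenation instead of join would not have this property and would need to be checked separately.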
Proposed fix, two lines:
# FilesystemClient.dataset_path
return self.pathlib.join(self.bucket_path, self.dataset_name)
# FilesystemClient.get_table_dir
table_dir: str = self.pathlib.dirname(table_prefix)
Regression tests should assert neither returned path ends with /, using an in-memory filesystem fixture so no real Azure/OneLake credentials are needed.
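One possible shape for such a test is sketched below. It does not import dlt: FakePaths is a hypothetical stand-in that mirrors only the two proposed expressions, and StrictFs is a hypothetical fake that simulates the OneLake behavior described above by raising on any path ending in /. A real test would exercise FilesystemClient itself against an in-memory filesystem.

```python
import posixpath


class FakePaths:
    """Hypothetical stand-in mirroring the two proposed path expressions."""

    pathlib = posixpath

    def __init__(self, bucket_path, dataset_name):
        self.bucket_path = bucket_path
        self.dataset_name = dataset_name

    @property
    def dataset_path(self):
        return self.pathlib.join(self.bucket_path, self.dataset_name)

    def get_table_dir(self, table_prefix):
        # Simplified signature: takes the prefix directly (an assumption).
        return self.pathlib.dirname(table_prefix)


class StrictFs:
    """Hypothetical fake filesystem that rejects trailing separators."""

    def __init__(self, dirs):
        self.dirs = set(dirs)

    def isdir(self, path):
        if path.endswith("/"):
            raise PermissionError("simulated 403 on trailing separator")
        return path in self.dirs


def test_paths_have_no_trailing_separator():
    paths = FakePaths("bucket/_dlt_stage", "demo")
    fs = StrictFs({"bucket/_dlt_stage/demo"})

    table_dir = paths.get_table_dir("bucket/_dlt_stage/demo/items/")

    assert not paths.dataset_path.endswith("/")
    assert not table_dir.endswith("/")
    # The strict fake accepts the fixed path instead of raising.
    assert fs.isdir(paths.dataset_path) is True


test_paths_have_no_trailing_separator()
```

Using a fake that raises, rather than one that returns False, keeps the test sensitive to the failure mode that only strict backends expose.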
Steps to reproduce
Minimal shape-only reproduction using the same adlfs.AzureBlobFileSystem instance against OneLake:
# Against OneLake, same fs instance:
fs.isdir("<ws-guid>/<lh-guid>/Files/_dlt_stage/demo/") # → raises 403 ClientAuthenticationError
fs.isdir("<ws-guid>/<lh-guid>/Files/_dlt_stage/demo") # → True
End-to-end repro using a Fabric notebook kernel (requires Fabric tenant with Warehouse + Lakehouse grants as the interactive user):
- Configure DESTINATION__FILESYSTEM__BUCKET_URL pointing to abfss://<ws-guid>@onelake.dfs.fabric.microsoft.com/<lh-guid>/Files/_dlt_stage.
- Configure a dlt pipeline with any destination that uses staging="filesystem" (e.g. fabric).
- Call pipeline.run(...). It fails at initialize_storage with 403 ClientAuthenticationError on the fs.isdir(self.dataset_path) call, before any data is uploaded.
Operating system
Linux
Runtime environment
Other
Python version
3.11
dlt data source
N/A — this is a FilesystemClient path-construction bug triggered by any resource. Repros with a minimal static resource.
dlt destination
Filesystem & buckets
Other deployment details
Microsoft Fabric Python notebook kernel, OneLake-backed filesystem staging, Fabric Warehouse as the primary destination. The filesystem destination ships a custom OnelakeFileSystem (in fsspec_wrapper.trident.core) registered as the abfss:// handler, which is the code path that surfaces the 403 instead of a 404. Non-OneLake Azure Blob users still hit the latent bug but observe it as silent False from isdir/exists.
Additional information
Surfaced as part of OSS-41 (Fabric notebook user-identity auth). Splitting this bugfix out as its own ticket per CONTRIBUTING — fix/ branches require a ticket, and this is a pure bug with zero API surface that can land independently of the Fabric feature work. The Fabric feature PR will rebase on this once merged.
I have a validated reproduction and fix in a personal sandbox — happy to open the PR against devel with regression tests.