Feature description
Make the fabric destination usable from inside a Microsoft Fabric Python notebook under an interactive user identity — i.e. the case where notebookutils.credentials.getToken(...) is the canonical way to obtain AAD tokens for both the Fabric Warehouse TDS endpoint and OneLake staging, and no Service Principal is (or should be) available.
Today, dlt[fabric]==1.24.0 hardcodes Service Principal authentication on both the warehouse side and the staging side, and two unrelated trailing-slash defects in FilesystemClient cause OneLake-backed staging to return 403 ClientAuthenticationError (instead of 404 ResourceNotFoundError) during initialize_storage. Together these make the destination unreachable from a notebook without either a proper Service Principal or a non-trivial pile of runtime monkey patches.
I have a working end-to-end fix validated against a live Fabric tenant with a two-run scd2 smoke test, and I'm opening one PR to land it.
Are you a dlt user?
Yes, I use dlt in a Microsoft Fabric Python notebook to load data into a Fabric Warehouse with OneLake-backed staging.
Use case
Running dlt.pipeline(destination="fabric", staging="filesystem").run(...) from inside a Fabric notebook is the natural entry point for many Fabric users — notebooks are how Fabric exposes interactive Python development, and they come with a blessed auth path via notebookutils.credentials.getToken(...). A Fabric user already has a workspace identity and Warehouse/Lakehouse grants tied to that identity; asking them to additionally provision a Service Principal, grant it roles, and store its secret just to use dlt is a high bar that they will normally not clear.
The blocking failure chain I hit in this environment, in the order each problem fires during a single pipeline.run(...):
1. FabricCredentials.get_odbc_dsn_dict (dlt/destinations/impl/fabric/configuration.py:55-72) unconditionally emits AUTHENTICATION=ActiveDirectoryServicePrincipal and reads UID/PWD from azure_client_id/azure_client_secret. There is no code path for passing a pre-fetched AAD bearer token to pyodbc via SQL_COPT_SS_ACCESS_TOKEN (1256). The on_partial fallback to DefaultAzureCredential is also unusable in this environment — Microsoft Learn explicitly states that DefaultAzureCredential is not supported inside Fabric notebooks, and nothing plumbs a TokenCredential token into pyodbc.connect anyway because FabricSqlClient inherits PyOdbcMsSqlClient.open_connection, which calls pyodbc.connect(dsn, timeout=...) with no attrs_before hook.
2. FabricCopyFileLoadJob._ensure_fabric_token_initialized (dlt/destinations/impl/fabric/fabric.py:62-139) is triggered whenever staging_credentials is an AzureServicePrincipalCredentialsWithoutDefaults instance — which the config resolver picks any time the azure_* fields are populated, even with dummy strings. The helper builds a real ClientSecretCredential(tenant_id, client_id, client_secret) from those fields and hits https://api.fabric.microsoft.com/.default before every load, failing with ClientAuthenticationError: 'Forbidden' long before any data moves. The helper's own comment documents that it is a Fabric SP rate-limiting workaround, which is not needed under interactive user identity — the subsequent COPY INTO for OneLake does not emit a CREDENTIAL clause and relies on the Fabric Warehouse's workspace identity.
3. OneLake staging ignores Fabric's built-in credential provider when any credential is supplied (dlt/common/configuration/specs/azure_credentials.py:118-125, 162-187 + dlt/common/storages/fsspec_filesystem.py:95, 191). Fabric notebooks ship a custom OnelakeFileSystem (in fsspec_wrapper.trident.core) that is registered as the abfss:// protocol handler. Its __init__ calls a built-in make_credential() helper when no credential kwarg is supplied, producing a PandasCredential tied to the notebook user identity. This is the supported OneLake auth path for interactive execution. dlt's default AzureServicePrincipalCredentials.to_adlfs_credentials returns SP kwargs that, in this case, authenticate as a non-functional SP; passing an explicit TokenCredential via credential= works in direct adlfs preflight calls but still returns 403 inside dlt's full load path. The only shape that works end-to-end is to omit credential entirely and let OnelakeFileSystem.make_credential() run.
4. FilesystemClient.dataset_path trailing slash (dlt/destinations/impl/filesystem/filesystem.py:587-591) forces the result to end in / via self.pathlib.join(self.bucket_path, self.dataset_name, ""). FilesystemClient.initialize_storage then calls self.fs_client.isdir(self.dataset_path), which reaches adlfs._exists → BlobClient.exists(version_id=...) for a blob name ending in /. OneLake returns 403 ClientAuthenticationError for the invalid shape (other Azure Blob endpoints return 404), and the load fails at step=load. Reproduction, same fs instance and same path minus one character: fs.isdir("<ws>/<lh>/Files/_dlt_stage/demo/") → 403, fs.isdir("<ws>/<lh>/Files/_dlt_stage/demo") → True. A standalone sketch of this reproduction follows the list.
5. FilesystemClient.get_table_dir trailing slash (dlt/destinations/impl/filesystem/filesystem.py:874-883) has the same bug one level deeper: table_dir = self.pathlib.dirname(table_prefix) + self.pathlib.sep. FilesystemClient.truncate_tables (line 721) iterates these paths and calls self.fs_client.exists(table_dir), hitting the same 403 on every table once dataset_path is fixed.
Problems (4) and (5) are latent FilesystemClient defects that can be triggered outside the Fabric case on any backend where BlobClient.exists distinguishes foo/ from foo on the existence path — Fabric/OneLake just makes them visibly fatal because it responds 403 instead of 404.
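For reference, a standalone version of the reproduction in (4), assuming a Fabric notebook session where the abfss:// protocol resolves to the registered OnelakeFileSystem; the filesystem kwargs follow adlfs naming conventions and may differ for OnelakeFileSystem, and the workspace/lakehouse names are placeholders:

```python
import fsspec

# In a Fabric notebook, "abfss" resolves to the Fabric-registered OnelakeFileSystem;
# with no credential kwarg, its __init__ falls through to make_credential() and
# authenticates as the notebook user.
fs = fsspec.filesystem(
    "abfss",
    account_name="onelake",
    account_host="onelake.dfs.fabric.microsoft.com",
)

fs.isdir("<ws>/<lh>/Files/_dlt_stage/demo")   # True
fs.isdir("<ws>/<lh>/Files/_dlt_stage/demo/")  # raises ClientAuthenticationError (403)
```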
Proposed solution
One PR that addresses all five problems together:
1. Fabric warehouse token auth. Add access_token: Optional[TSecretStrValue] to FabricCredentials. When set, get_odbc_dsn_dict returns only DRIVER/SERVER/DATABASE/Encrypt/TrustServerCertificate/LongAsMax (no AUTHENTICATION/UID/PWD), and FabricSqlClient.open_connection is overridden to pack the token into the little-endian UTF-16 struct the ODBC driver expects and pass it via pyodbc.connect(dsn, attrs_before={1256: token_struct}, timeout=...). The inherited datetimeoffset output converter and autocommit behavior are preserved. The SP path is untouched when no access_token is set. (Token packing is sketched after this list.)
2. Gate the Fabric SP session warmup. Change FabricCopyFileLoadJob._ensure_fabric_token_initialized to short-circuit when the credentials object lacks a real SP secret (e.g. credentials.azure_client_secret is None / empty, or the token-auth mode from (1) is active). Interactive-identity loads skip the warmup and rely on the workspace identity for OneLake reads, the way COPY INTO already expects. (The gate is sketched after this list.)
3. Notebook-identity OneLake staging. Teach AzureCredentialsWithoutDefaults.to_adlfs_credentials and AzureServicePrincipalCredentialsWithoutDefaults.to_adlfs_credentials to return only {account_name, account_host} (no credential, no SP fields) when the configuration is in notebook-identity mode and the bucket host is a *.fabric.microsoft.com endpoint. With no credential kwarg, the Fabric-registered OnelakeFileSystem.__init__ falls through to its make_credential() helper and uses the notebook user identity for every OneLake request. (Sketched after this list.)
4. Strip trailing slash from FilesystemClient.dataset_path. Change the property to self.pathlib.join(self.bucket_path, self.dataset_name). All other consumers use pathlib.join(dataset_path, ...) to append further segments, where a trailing slash is irrelevant.
5. Strip trailing slash from FilesystemClient.get_table_dir. Change table_dir = self.pathlib.dirname(table_prefix) + self.pathlib.sep to table_dir = self.pathlib.dirname(table_prefix). Same reasoning as (4).
Validation accompanying the PR:
- 30 mocked pytest cases exercising every patched seam under fake pyodbc/notebookutils/jwt/adlfs — covering the struct layout for SQL_COPT_SS_ACCESS_TOKEN, the DSN fields, the to_adlfs_credentials output for all four Azure credentials classes in the hierarchy (both with- and without-defaults variants), the dataset_path/get_table_dir shapes, and the warmup no-op.
- Full end-to-end run against a live Fabric tenant: dlt.pipeline(destination="fabric", staging="filesystem") with merge/scd2 disposition, two sequential runs; the verification query returns the expected three-row history (id=1 unchanged active, id=2 original version closed at t2, new active version from t2). The shape of this test is sketched after the list.
- Validated against devel at commit fa1e403ef3368fca80354085e55e86e11b514e8e.
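The smoke test has roughly this shape; pipeline, resource, and column names are illustrative, and the scd2 validity columns use dlt defaults:

```python
import dlt

@dlt.resource(name="demo", write_disposition={"disposition": "merge", "strategy": "scd2"})
def demo(rows):
    yield from rows

pipeline = dlt.pipeline(
    pipeline_name="fabric_smoke",
    destination="fabric",
    staging="filesystem",
    dataset_name="demo",
)

# Run 1: initial state; run 2: id=2 changes, so the second load should leave
# id=1 active and unchanged, close the original id=2 version, and open a new
# active id=2 version (the expected three-row history).
pipeline.run(demo([{"id": 1, "v": "a"}, {"id": 2, "v": "b"}]))
pipeline.run(demo([{"id": 1, "v": "a"}, {"id": 2, "v": "b2"}]))
```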
Related issues
None found in a search of the dlt-hub/dlt tracker for "Fabric notebook", "OneLake 403", "SQL_COPT_SS_ACCESS_TOKEN", or "dataset_path trailing". Happy to cross-link anything maintainers point at.