(feat) implement explicit joins with local and foreign datasets

followup: #3403
requires: #3746 (multischema datasets) and PR #3590 (magic joins)

## Introduction

PR #3590 introduced `Relation.join()` that auto-discovers join conditions from dlt schema references (parent/child relationships). This ticket extends it with an explicit `on` parameter, enabling:
- Explicit join conditions between same-dataset tables (no schema reference chain needed)
- Cross-dataset joins where `other` is a Relation from a different Dataset

A cross-dataset join introduces a foreign schema into the query. We use the local/foreign schema mechanism from #3746 to register it, making the foreign dataset's tables visible to the qualify and bind steps.

## Signature

```python
TJoinType = Literal["left", "right", "inner", "full"]


# Overload 1: Magic join — auto-discovers ON from schema references
@overload
def join(
    self,
    other: Union[str, Self],
    *,
    kind: TJoinType = "inner",
    alias: Optional[str] = None,
) -> Self: ...

# Overload 2: Explicit ON condition
@overload
def join(
    self,
    other: Union[str, Self],
    on: Union[str, sge.Expression],
    *,
    kind: TJoinType = "inner",
    alias: Optional[str] = None,
) -> Self: ...

# Implementation
def join(
    self,
    other: Union[str, Self],
    on: Union[str, sge.Expression, None] = None,
    *,
    kind: TJoinType = "inner",
    alias: Optional[str] = None,
) -> Self:
    """Join this relation with another table or relation.

    When ``on`` is omitted, join conditions are discovered automatically
    from the dlt schema's reference chain (parent/child relationships
    created during loading). Both sides must be base-table relations
    from the same dataset.

    When ``on`` is provided, any two relations can be joined — including
    relations from different datasets on the same physical destination
    (cross-dataset join). The ``on`` condition uses logical column names
    from the dlt schema.

    Args:
        other: Table name (str) or Relation to join with. When a Relation
            from a different Dataset is passed, ``on`` is required.
        on: Join condition as SQL string or SQLGlot expression. Disables
            auto-discovery from schema references when provided.
            Examples: ``"users.id = orders.user_id"``,
            ``sge.EQ(this=sge.column("id", "users"),
            expression=sge.column("user_id", "orders"))``
        kind: SQL join type — ``"inner"``, ``"left"``, ``"right"``,
            or ``"full"``. Defaults to ``"inner"``.
        alias: Column projection prefix for the joined table's columns.
            Columns from ``other`` appear as ``{alias}__{column}``.
            Defaults to the target table name.

    Returns:
        New Relation with the join applied and the joined table's
        columns appended to the projection.

    Raises:
        ValueError: If ``on`` is omitted and schema references between
            the two tables cannot be resolved, or if ``other`` is from
            a different dataset and ``on`` is not provided.

    Example:
        >>> # Magic join via schema references (same dataset)
        >>> users = dataset.table("users")
        >>> joined = users.join("users__orders", kind="left")

        >>> # Explicit join (same or cross-dataset)
        >>> customers = dataset1.table("customers")
        >>> orders = dataset2.table("orders")
        >>> joined = customers.join(
        ...     orders,
        ...     on="customers.customer_id = orders.customer_id",
        ...     kind="left",
        ... )
    """
```

## Requirements

### Explicit join behavior
- When `on` is provided: parse the string into a SQLGlot expression (if needed), build the JOIN node with the explicit ON condition. No schema reference lookup.
- When `on` is omitted: existing magic join behavior from PR #3590 (unchanged).
- Column projection uses the same `{alias}__{column}` convention as magic joins.

### Cross-dataset join via Relation
- When `other` is a `Relation` from a different `Dataset`, `on` is required.
- Automatically register the foreign schema: `self._dataset._add_foreign_schema(other._dataset.dataset_name, other._dataset.schema)`. This makes foreign tables visible to the qualify step.

### Dataset-qualified string in `other`
- Accept `"warehouse_data.users"` as `other` — parse into dataset_name + table_name.
- Validate that the dataset is known (local or already registered as foreign).
- If not known, raise `ValueError` with guidance to pass a `Relation` instead (so the schema can be auto-registered).
- We recommend users pass a `Relation` rather than a qualified string — it carries the schema needed for automatic registration.

## Query binder limitations for cross-dataset joins

Cross-dataset joins expose three limitations in the current qualify/bind pipeline that must be addressed:

### 1. `create_sqlglot_schema` is single-dataset

**File:** `dlt/dataset/lineage.py` — `create_sqlglot_schema()` (line 28)

Currently wraps all tables under a single `{dataset_name: {table: columns}}`. Tables from foreign datasets are invisible to `qualify()`, so column references to foreign tables cannot be resolved.

### 2. `bind_query` uses a single `expand_table_name` callback with no dataset context

**File:** `dlt/common/libs/sqlglot.py` — `bind_query()` (line 947)

The `expand_table_name` callback has signature `Callable[[str], List[str]]` — it receives only the table name. It is bound (via `functools.partial`) to the sql_client's own `dataset_name`. When `bind_query` encounters a foreign table node whose `node.db` differs from the local dataset, it still expands using the local dataset name — or skips expansion entirely if the table isn't in the SQLGlot schema.

### 3. `make_qualified_table_name_path` is hardcoded to `self.dataset_name`

**File:** `dlt/destinations/sql_client.py` — `make_qualified_table_name_path()` (line 226)

Always uses `self.dataset_name` (line 235). There is no way to override the dataset for a foreign table. Similarly, `catalog_name()` is per-sql_client, so if two datasets live in different catalogs the path cannot be constructed correctly.

## Tests

### Unit tests (`tests/dataset/`)
- Explicit `on` join between two tables in the same dataset
- Explicit `on` join with `other` as a Relation from a different Dataset (cross-dataset)
- `"dataset.table"` string parsing in `other`
- Foreign schema auto-registered when cross-dataset Relation is joined
- Column projection with explicit `alias`
- Error cases: `on` omitted with cross-dataset Relation, unknown dataset string

### Integration tests with duckdb
- Two pipelines loading to two datasets on same duckdb
- Cross-dataset join produces correct SQL and returns correct data

### Smoke tests on all destinations (CI matrix)
- Same-dataset explicit join works on each destination

### Cross-catalog tests (ducklake or similar)
These are expected to initially fail — the goal is to document binder limitations:
- Join across catalogs where dataset name on each catalog is different
- Join across catalogs where dataset name is the same on both catalogs (tests that the binder distinguishes by catalog, not just dataset_name)

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

(feat) implement explicit joins with local and foreign datasets #3747

Introduction

Signature

Requirements

Explicit join behavior

Cross-dataset join via Relation

Dataset-qualified string in `other`

Query binder limitations for cross-dataset joins

1. `create_sqlglot_schema` is single-dataset

2. `bind_query` uses a single `expand_table_name` callback with no dataset context

3. `make_qualified_table_name_path` is hardcoded to `self.dataset_name`

Tests

Unit tests (`tests/dataset/`)

Integration tests with duckdb

Smoke tests on all destinations (CI matrix)

Cross-catalog tests (ducklake or similar)

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

(feat) implement explicit joins with local and foreign datasets #3747

Description

Introduction

Signature

Requirements

Explicit join behavior

Cross-dataset join via Relation

Dataset-qualified string in other

Query binder limitations for cross-dataset joins

1. create_sqlglot_schema is single-dataset

2. bind_query uses a single expand_table_name callback with no dataset context

3. make_qualified_table_name_path is hardcoded to self.dataset_name

Tests

Unit tests (tests/dataset/)

Integration tests with duckdb

Smoke tests on all destinations (CI matrix)

Cross-catalog tests (ducklake or similar)

Metadata

Metadata

Assignees

Labels

Type

Projects

Milestone

Relationships

Development

Issue actions

Dataset-qualified string in `other`

1. `create_sqlglot_schema` is single-dataset

2. `bind_query` uses a single `expand_table_name` callback with no dataset context

3. `make_qualified_table_name_path` is hardcoded to `self.dataset_name`

Unit tests (`tests/dataset/`)