bug: join on float("nan") behaves differently #3554

@FBruzzesi

Spotted in https://github.com/narwhals-dev/narwhals/actions/runs/24605500399/job/71950954205?pr=3552

Repro:

def test_inner_join_nan(constructor: Constructor) -> None:
    data = {
        "a": [0, 0, 0],
        "b": [0, 0, 0],
        "c": [0.0, 0.0, float("nan")],
    }
    join_cols = ["a", "c"]
    frame = from_native_lazy(constructor(data))

    result = frame.join(frame, on=join_cols, how="inner").sort("c", nulls_last=True).collect()

    zero_cols = ("a", "b", "b_right")
    for col in zero_cols:
        assert (result.get_column(col) == 0).all()

    assert result.get_column("c").is_nan().sum() == 1
    # NOTE: polars results in the following:
    """
    expected = {
        "a": [0, 0, 0, 0, 0],
        "b": [0, 0, 0, 0, 0],
        "c": [0., 0., 0., 0., float("nan")],
        "b_right": [0, 0, 0, 0, 0],
    }
    How can we sort the data to use:
    assert_equal_data(result, expected)
    """

This test fails for all pandas-like backends, dask, and duckdb:

$ pytest tests/frame/join_test.py -k inner_join_nan --all-cpu-constructors

FAILED tests/frame/join_test.py::test_inner_join_nan[pandas] - AssertionError: assert np.int64(0) == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[pandas[nullable]] - AssertionError: assert np.int64(0) == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[modin[pyarrow]] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[pandas[pyarrow]] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[duckdb] - AssertionError: assert 0 == 1
FAILED tests/frame/join_test.py::test_inner_join_nan[dask] - AssertionError: assert np.int64(0) == 1
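
The row counts above are consistent with two different notions of key equality. Under IEEE 754 comparison, `float("nan") != float("nan")`, so the NaN row matches nothing and the join yields 4 rows with no NaN. If NaN keys are treated as equal to each other, the NaN row matches itself and the join yields 5 rows with one NaN, as in the polars output quoted in the repro. The following is a minimal pure-Python sketch of that difference; `inner_self_join` and its `nan_equal` flag are hypothetical names for illustration only and do not reflect how narwhals or any backend implements joins.

```python
import math


def keys_equal(left, right, nan_equal):
    """Compare two key tuples element-wise, optionally treating NaN == NaN."""
    for x, y in zip(left, right):
        if math.isnan(x) and math.isnan(y):
            if not nan_equal:
                return False  # IEEE 754: NaN never equals NaN
            continue  # NaN-equal semantics: treat the pair as a match
        if x != y:
            return False
    return True


def inner_self_join(rows, on, nan_equal):
    """Naive nested-loop inner self-join (hypothetical helper, illustration only)."""
    out = []
    for left in rows:
        for right in rows:
            lkey = tuple(left[k] for k in on)
            rkey = tuple(right[k] for k in on)
            if keys_equal(lkey, rkey, nan_equal):
                out.append({**left, "b_right": right["b"]})
    return out


# Same data as the repro, stored row-wise.
rows = [
    {"a": 0, "b": 0, "c": 0.0},
    {"a": 0, "b": 0, "c": 0.0},
    {"a": 0, "b": 0, "c": float("nan")},
]

ieee_result = inner_self_join(rows, on=["a", "c"], nan_equal=False)
nan_equal_result = inner_self_join(rows, on=["a", "c"], nan_equal=True)

ieee_nan_count = sum(math.isnan(r["c"]) for r in ieee_result)
nan_equal_nan_count = sum(math.isnan(r["c"]) for r in nan_equal_result)

print(len(ieee_result), ieee_nan_count)            # 4 0
print(len(nan_equal_result), nan_equal_nan_count)  # 5 1
```

Whichever semantics is chosen as the expected behaviour, the test assertion on the NaN count (`== 1` versus `== 0`) encodes that choice.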

I think for:

Labels: bug: incorrect result, dask, duckdb, pandas-like
